WizardLM is a large language model with excellent conversational capability. It is a 7-billion-parameter LLaMA model fine-tuned with a novel data-generation method.
As of writing, WizardLM is considered to be one of the best 7B LLaMA models.
In this post, you will learn
- What WizardLM is
- How to install and run WizardLM on Mac
- How to install and run WizardLM on Windows
What is WizardLM?
WizardLM is a model released in April 2023. It was fine-tuned on a large set of instruction-following conversations of varying difficulty. The novelty of this model is its use of a large language model to generate the training data automatically.
Training
The WizardLM model was trained on 70k machine-generated instructions produced by a new method called Evol-Instruct. The method produces instructions with varying levels of difficulty.
Evol-Instruct expands an instruction using one of these five operations:
- Add constraints
- Deepening
- Concretizing
- Increase reasoning steps
- Complicate input
These operations were applied sequentially to an initial instruction to make it more complex.
The expansions of instructions and responses are generated with ChatGPT.
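As a rough sketch of how such a pipeline might look in code (the operation templates, function names, and wording below are illustrative paraphrases, not the paper's exact prompts):

```python
import random

# Paraphrased Evol-Instruct operations (illustrative, not the paper's exact templates)
OPERATIONS = {
    "add_constraints": "Add one more constraint or requirement to the instruction below.",
    "deepening": "Increase the depth and breadth of the topic the instruction asks about.",
    "concretizing": "Replace general concepts in the instruction with more specific ones.",
    "increase_reasoning": "Rewrite the instruction so it explicitly requires multi-step reasoning.",
    "complicate_input": "Add a piece of input data (e.g. a table or code) the instruction must use.",
}

def build_evolution_prompt(instruction: str, operation: str) -> str:
    """Wrap an instruction in a meta-prompt asking an LLM (e.g. ChatGPT)
    to rewrite it into a more complex version."""
    return (
        "I want you to act as a prompt rewriter.\n"
        f"{OPERATIONS[operation]}\n"
        f"#Instruction#: {instruction}\n"
        "#Rewritten Instruction#:"
    )

def evolve(instruction: str, rounds: int = 4) -> list[str]:
    """Apply randomly chosen operations sequentially. In the real pipeline,
    each meta-prompt would be sent to the LLM, and its reply would replace
    `instruction` before the next round."""
    prompts = []
    for _ in range(rounds):
        op = random.choice(list(OPERATIONS))
        prompts.append(build_evolution_prompt(instruction, op))
    return prompts
```

Each generated meta-prompt goes to the data-generating LLM; the rewritten instruction it returns becomes the seed for the next, progressively harder round.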
Performance
The authors compared the performance of WizardLM with Alpaca 7B, Vicuna 7B, and ChatGPT. Ten human evaluators blindly judged the models' responses in five areas: relevance, knowledge, reasoning, calculation, and accuracy.
The results are:
- WizardLM significantly outperforms Alpaca and Vicuna.
- The instructions generated by Evol-Instruct are superior to the ShareGPT conversations used to train Vicuna.
- ChatGPT is better overall, but WizardLM excels in high-complexity questions.

Ongoing development
There’s an ongoing effort to further train the WizardLM model with 300k instructions.
Install and run WizardLM on Mac
There are two ways to run WizardLM on Mac: (1) llama.cpp and (2) text-generation-webui. You will see instructions for both.
llama.cpp (Mac)
We will use model weights from this repository. The following instructions are for installing the q4_0 4-bit quantization; several other quantized WizardLM models are available in the same repository.
Make sure you have installed llama.cpp before proceeding.
Step 1. Open Terminal App. Navigate to the llama.cpp directory. Create the model directory.
mkdir models/TheBloke_wizardLM-7B-GGML
Step 2. Download the model weights.
wget https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggml.q4_0.bin -P models/TheBloke_wizardLM-7B-GGML
Step 3. Run the WizardLM model.
The model follows this dialog format:
Human: Hello.
AI: Hi.
So setting the reverse prompt to Human: works beautifully: generation pauses and hands control back to you each time the model prints Human:.
The command for running WizardLM with llama.cpp is
./main -m models/TheBloke_wizardLM-7B-GGML/wizardLM-7B.ggml.q4_0.bin -t 4 -c 2048 -n 2048 --color -i --reverse-prompt 'Human:' -p 'Human:'
Here, -m selects the model file, -t 4 sets the number of CPU threads, -c 2048 the context size, -n 2048 the maximum number of tokens to generate, and -i enables interactive mode.

Text-generation-webui (Mac)
The following steps install the WizardLM model in text-generation-webui.
The following steps assume you followed this guide to install text-generation-webui. Improvise if otherwise!
Step 1. Open the Terminal App. Navigate to the text-generation-webui directory. Create the model directory.
mkdir models/TheBloke_wizardLM-7B-GGML
Step 2. Download the model weights.
wget https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggml.q4_0.bin -P models/TheBloke_wizardLM-7B-GGML
Step 3. Start text-generation-webui.
Activate the virtual environment.
source venv/bin/activate
Start the webui.
python server.py --chat
Follow the local URL to start the webui.
Step 4. Navigate to the Model page. Select the TheBloke_wizardLM-7B-GGML model in the model dropdown menu.

You should see a confirmation message at the bottom right.
Now WizardLM is ready to talk to you on the text-generation page!

Install and run WizardLM on Windows
You should have text-generation-webui installed on your Windows PC. Follow this guide to install if not.
We will use the 4-bit GPTQ model from this repository. It needs a CUDA GPU to run.
Step 1. Start text-generation-webui normally.
Step 2. Navigate to the Model page. In the Download custom model or LoRA text box, enter
TheBloke/wizardLM-7B-GPTQ
Press the Download button.

Wait for the download to complete.
Step 3. Click the refresh icon next to the Model dropdown menu.
Step 4. In the Model dropdown menu, select TheBloke_wizardLM-7B-GPTQ. Ignore the error message.
Step 5. Fill in the following values in the GPTQ parameters section.
- wbits: 4
- Groupsize: 128
- model_type: llama
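For intuition about these values: wbits: 4 means each weight is stored as a 4-bit integer, and Groupsize: 128 means every group of 128 weights shares one quantization scale. Here is a minimal round-to-nearest sketch of group quantization (an illustration only, not the actual GPTQ algorithm):

```python
import numpy as np

def quantize_group(w: np.ndarray, wbits: int = 4):
    """Quantize one group of weights to unsigned `wbits`-bit integers
    with a shared scale and offset (simple round-to-nearest, not GPTQ)."""
    qmax = 2**wbits - 1                      # 15 for wbits=4
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_group(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    """Map the integers back to approximate float weights."""
    return q.astype(np.float32) * scale + lo

# Quantize a 4096-wide weight row in groups of 128 (as Groupsize: 128 would).
weights = np.random.randn(4096).astype(np.float32)
groups = weights.reshape(-1, 128)
recon = np.concatenate([dequantize_group(*quantize_group(g)) for g in groups])
# Each weight now takes 4 bits, plus one scale/offset pair per 128 weights.
```

Smaller group sizes track the weight distribution more closely at the cost of storing more scales; 128 is a common middle ground.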

Step 6. Click Save settings for this model so that you don't need to enter these values the next time you use this model.
The automatic parameter loading only takes effect after you restart the GUI. You don't need to restart now.
Step 7. Click Reload the model.
Now you can talk to WizardLM on the text-generation page.

If the model outputs gibberish, the GUI may be loading the wrong file. Delete the file wizardLM-7B-GPTQ-4bit.latest.act-order.safetensors in the models\TheBloke_wizardLM-7B-GPTQ folder to ensure the correct file is loaded. For example, in Command Prompt from the text-generation-webui folder:
del models\TheBloke_wizardLM-7B-GPTQ\wizardLM-7B-GPTQ-4bit.latest.act-order.safetensors
Comments
Why no Linux tutorials…
Of course, the macOS tutorials will work with some tweaking, but still…
I suppose Linux users can figure things out without much hand-holding, like yourself.
I still get gibberish even after doing the trick at the end of the tutorial.
On Mac, exactly like in your guide
Hi, I followed my guide below and reinstalled everything from scratch, but couldn't reproduce your error message.
https://agi-sphere.com/install-textgen-webui-mac/
Your error message comes from trying to load a GPTQ model file on a Mac. GPTQ only works on CUDA GPUs, so the model cannot load. Mac users should use the following GGML file instead.
https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggml.q4_0.bin
On Mac, exactly as in your guide.
Unfortunately, I get ModuleNotFoundError: No module named 'llama_inference_offload', even though I've tried downloading various models.
Are you using text-generation-webui on Mac or Windows? It has something to do with your installation.