
WizardLM: 7B LLaMA model with magical performance

WizardLM is a large language model with excellent conversational capability. It is a 7-billion-parameter LLaMA model fine-tuned on data produced by a novel data generation method.

As of writing, WizardLM is considered to be one of the best 7B LLaMA models.

In this post, you will learn

  • What WizardLM is
  • How to install and run WizardLM on Mac
  • How to install and run WizardLM on Windows

What is WizardLM?

WizardLM is a model released in April 2023. It was fine-tuned on a large number of instruction-following conversations of varying difficulty. The novelty of this model is that it uses a large language model to generate the training data automatically.


The WizardLM model was trained on 70k computer-generated instructions produced by a new method called Evol-Instruct. The method produces instructions with varying levels of difficulty.

Evol-Instruct expands a prompt with these five operations:

  • Add constraints
  • Deepening
  • Concretizing
  • Increase reasoning steps
  • Complicate input

These operations were applied sequentially to an initial instruction to make it more complex.
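
To make the sequential idea concrete, here is a minimal sketch of the chaining, not the authors' implementation. The function names evolve_once and evolve_chain are hypothetical, and evolve_once is only a stand-in for the ChatGPT call: it tags the instruction with the operation name instead of rewriting it, so the flow can be followed offline.

```shell
# evolve_once OPERATION INSTRUCTION
# Hypothetical stand-in for a chat-model call. A real version would ask the
# model to rewrite INSTRUCTION using OPERATION; this one only appends a tag.
evolve_once() {
  printf '%s [%s]' "$2" "$1"
}

# evolve_chain INSTRUCTION OPERATION...
# Applies the operations one after another, feeding each result into the next.
evolve_chain() {
  current="$1"
  shift
  for operation in "$@"; do
    current="$(evolve_once "$operation" "$current")"
  done
  printf '%s\n' "$current"
}

# Example: run a seed instruction through the five operations listed above.
evolve_chain "Explain photosynthesis." \
  "Add constraints" "Deepening" "Concretizing" \
  "Increase reasoning steps" "Complicate input"
```

Each pass takes the previous output as its input, which is how a simple seed instruction can grow into a more complex one.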

Both the expanded instructions and their responses are generated with ChatGPT.


The authors compared the performance of WizardLM with Alpaca 7B, Vicuna 7B, and ChatGPT. Ten people blindly judged the responses of WizardLM and the other models in five areas: relevance, knowledge, reasoning, calculation, and accuracy.

The results are:

  • WizardLM significantly outperforms Alpaca and Vicuna.
  • The instructions generated by Evol-Instruct are superior to ShareGPT (used by Vicuna).
  • ChatGPT is better overall, but WizardLM excels in high-complexity questions.
Performance of WizardLM

Ongoing development

There’s an ongoing effort to further train the WizardLM model with 300k instructions.

Install and run WizardLM on Mac

There are two ways to run WizardLM on Mac: (1) llama.cpp and (2) text-generation-webui. You will see instructions for both.

llama.cpp (Mac)

We will use model weights from this repository. The following instructions install the q4_0 4-bit quantized model. Several other quantized WizardLM models are available in the same repository.

Make sure you have installed llama.cpp before proceeding.

Step 1. Open the Terminal App. Navigate to the llama.cpp directory. Create the model directory.

mkdir models/TheBloke_wizardLM-7B-GGML

Step 2. Download the model weights.

wget -P models/TheBloke_wizardLM-7B-GGML
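
Before moving on, it can help to confirm that the download landed where Step 3 expects it. The check below is a small sketch: the helper name model_ready is made up for illustration, and the file name is taken from the run command used later in this section.

```shell
MODEL_FILE="models/TheBloke_wizardLM-7B-GGML/wizardLM-7B.ggml.q4_0.bin"

# model_ready FILE: succeed only if FILE exists and is not empty.
model_ready() {
  [ -s "$1" ]
}

if model_ready "$MODEL_FILE"; then
  echo "Found $MODEL_FILE"
  ls -lh "$MODEL_FILE"   # show the size as a sanity check
else
  echo "Missing or empty: $MODEL_FILE"
fi
```

Run it from the llama.cpp directory, since the path is relative.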

Step 3. Run the WizardLM model.

The model uses the following dialog format.

Human: Hello.
AI: Hi.

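So setting the reverse prompt to Human: works beautifully: in interactive mode, llama.cpp stops generating and returns control to you whenever the model prints it.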
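
If you want to seed a conversation with earlier turns, the same format can be assembled by a small helper. This is only a sketch of the dialog layout shown above; the function name format_dialog is hypothetical and not part of llama.cpp.

```shell
# format_dialog MESSAGE...
# Arguments alternate between speakers, starting with Human.
# The output ends with the next speaker's tag so the model continues from there.
format_dialog() {
  speaker="Human"
  for message in "$@"; do
    printf '%s: %s\n' "$speaker" "$message"
    if [ "$speaker" = "Human" ]; then
      speaker="AI"
    else
      speaker="Human"
    fi
  done
  printf '%s:' "$speaker"
}

# Example: two finished turns plus a new question for the model to answer.
format_dialog "Hello." "Hi." "What is a llama?"
echo
```

The resulting text could be passed as the initial prompt in place of a bare Human: tag.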

The command for running WizardLM with llama.cpp is

./main -m models/TheBloke_wizardLM-7B-GGML/wizardLM-7B.ggml.q4_0.bin -t 4 -c 2048 -n 2048 --color -i --reverse-prompt 'Human:' -p 'Human:'
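
The command packs several settings into one line. The sketch below spells them out and prints the command rather than running it, so you can adjust the thread count or context size first. The function name wizardlm_cmd is an illustrative assumption; the flags themselves are the ones used in the command above.

```shell
MODEL="models/TheBloke_wizardLM-7B-GGML/wizardLM-7B.ggml.q4_0.bin"

# wizardlm_cmd [THREADS] [CONTEXT]: print the llama.cpp command line.
wizardlm_cmd() {
  threads="${1:-4}"      # -t: number of CPU threads
  context="${2:-2048}"   # -c: context window size, in tokens
  # -n caps the number of tokens generated; --color colors the output;
  # -i starts interactive mode; --reverse-prompt returns control to you
  # when the model prints Human:; -p is the initial prompt.
  echo "./main -m $MODEL -t $threads -c $context -n 2048 --color -i --reverse-prompt 'Human:' -p 'Human:'"
}

# Dry run: show the default command and a variant with 8 threads.
wizardlm_cmd
wizardlm_cmd 8
```

On a Mac, sysctl -n hw.physicalcpu reports the number of physical cores, which is a reasonable starting point for the thread count.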

Text-generation-webui (Mac)

The following steps install the WizardLM model in text-generation-webui. They assume you installed text-generation-webui by following this guide. Adapt them if your setup differs.

Step 1. Open the Terminal App. Navigate to the text-generation-webui directory. Create the model directory.

mkdir models/TheBloke_wizardLM-7B-GGML

Step 2. Download the model weights.

wget -P models/TheBloke_wizardLM-7B-GGML

Step 3. Start text-generation-webui.

Activate the virtual environment.

source venv/bin/activate

Start the webui.

python --chat

Follow the local URL to start the webui.

Step 4. Navigate to the Model page. Select the TheBloke_wizardLM-7B-GGML model in the Model dropdown menu.

You should see a confirmation message at the bottom right.

Now WizardLM is ready to talk to you on the text-generation page!

Install and run WizardLM on Windows

You should have text-generation-webui installed on your Windows PC. If not, follow this guide to install it.

We will use the 4-bit GPTQ model from this repository. It needs to run on a GPU.

Step 1. Start text-generation-webui normally.

Step 2. Navigate to the Model page. In the Download custom model or LoRA text box, enter


Press the Download button.

Wait for the download to complete.

Step 3. Click the refresh icon next to the Model dropdown menu.

Step 4. In the Model dropdown menu, select TheBloke_wizardLM-7B-GPTQ. Ignore the error message.

Step 5. Fill in the following values in the GPTQ parameters section.

  • wbits: 4
  • Groupsize: 128
  • model_type: llama

Step 6. Click Save settings for this model, so that you don’t need to put in these values next time you use this model.

The automatic parameter loading only takes effect after you restart the GUI. You don’t need to restart now.

Step 7. Click Reload the model.

Now you can talk to WizardLM on the text-generation page.

If the model outputs gibberish, the GUI may be loading the wrong file.

Delete the file wizardLM-7B-GPTQ-4bit.latest.act-order.safetensors in the models\TheBloke_wizardLM-7B-GPTQ folder so that the correct file is loaded.
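
To see whether more than one candidate file is present before deleting anything, you can count the .safetensors files in the model folder. The sketch below assumes a Unix-like shell on Windows, such as Git Bash or WSL, started in the text-generation-webui directory; the helper name count_checkpoints is made up for illustration.

```shell
# count_checkpoints DIR: print how many .safetensors files DIR contains.
count_checkpoints() {
  dir="$1"
  count=0
  for f in "$dir"/*.safetensors; do
    # When nothing matches, the pattern stays unexpanded; skip it.
    [ -e "$f" ] || continue
    count=$((count + 1))
  done
  echo "$count"
}

# More than one file means the GUI has to choose between them.
count_checkpoints "models/TheBloke_wizardLM-7B-GPTQ"
```

In Command Prompt, dir models\TheBloke_wizardLM-7B-GPTQ\*.safetensors lists the same files.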

8 thoughts on “WizardLM: 7B LLaMA model with magical performance”

  1. Unfortunately I get ModuleNotFoundError: No module named ‘llama_inference_offload’, even though I’ve tried downloading it from various models
