Vicuna is one of the best language models for running "ChatGPT" locally. It is purportedly 90%* as good as ChatGPT (GPT-3.5). (I will explain what that asterisk means in the next section.) What's more, its small size allows you to run the model on your own PC with reasonable resources.
In this post, you will learn
- What the Vicuna model is.
- How to install and use Vicuna models in llama.cpp on Mac.
- How to install and use Vicuna models in text-generation-webui on Mac.
What is the Vicuna model?
The Vicuna model is trained by fine-tuning LLaMA, the foundational model released by Meta (Facebook). Like ChatGPT, it excels at conversational-style interactions.
It comes in two sizes:
- 7B
- 13B
The larger 13B model is better but requires more resources to run.
Training
The model was trained by a team of researchers from UC Berkeley, CMU, Stanford, and UC San Diego.
It was trained on user-contributed ChatGPT conversations, so you can expect its behavior to mimic ChatGPT's. To be precise, it was trained on 70,000 ChatGPT conversations that users shared on ShareGPT.com.
It cost only $140 to train the 7B model and $300 to train the 13B model.
Performance
The authors used an interesting method to evaluate the model's performance: using GPT-4 as the judge. They asked GPT-4 to generate challenging questions and let Vicuna and some of the other top language models answer them.
They then asked GPT-4 to evaluate the quality of the answers on different aspects, such as helpfulness and accuracy.
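To make the setup concrete, here is a minimal sketch of what a single judging call could look like, assuming access to the OpenAI chat completions API. The prompt wording and the <angle-bracket> placeholders are my own illustration, not the authors' exact setup:
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{
      "role": "user",
      "content": "Question: <question>\n\nAnswer A: <Vicuna answer>\n\nAnswer B: <ChatGPT answer>\n\nRate the helpfulness and accuracy of each answer on a scale of 1 to 10, and explain your reasoning."
    }]
  }'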
Here's the result of comparing Vicuna with LLaMA, Alpaca, Bard, and ChatGPT. In the eyes of GPT-4, Vicuna is almost as good as ChatGPT, beating LLaMA and Alpaca by a large margin.


Of course, we need to take this result with a grain of salt. GPT-4 has not been proven to be a good judge of model performance. At least for now, the most reliable judges are still humans.
Installing and using Vicuna model
Two popular ways to run local language models on Mac are
- llama.cpp
- text-generation-webui
You will find instructions for both below.
Each set of instructions covers both the 7B and 13B models. If your machine can run both with reasonable speed, use the 13B model because it is better. Fall back to the 7B model if the 13B model is intolerably slow.
llama.cpp
llama.cpp was developed for running LLaMA language models on MacBooks. Vicuna is a fine-tuned LLaMA model (that is, the architecture is the same, but the weights differ slightly), so it runs in llama.cpp as well.
The following instructions assume you have installed llama.cpp by following this tutorial.
Installing Vicuna models on llama.cpp
You will need model files in ggml format.
The following steps install both the 7B and 13B models. Skip the corresponding steps if you only want one of them.
Step 1. Open the Terminal App. Go to the llama.cpp folder.
cd ~/llama.cpp
(Adjust accordingly if you installed it in a different folder.)
Step 2 (7B model): Create a new folder for the 7B model.
mkdir models/chharlesonfire_ggml-vicuna-7b-4bit
Step 3 (7B model): Download the 7B model.
wget https://huggingface.co/chharlesonfire/ggml-vicuna-7b-4bit/resolve/main/ggml-vicuna-7b-q4_0.bin -P models/chharlesonfire_ggml-vicuna-7b-4bit
Step 4 (13B model): Create a new folder for the 13B model.
mkdir models/eachadea_ggml-vicuna-13b-1.1
Step 5 (13B model): Download the 13B model.
wget https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-old-vic13b-q4_0.bin -P models/eachadea_ggml-vicuna-13b-1.1
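Optionally, verify that the downloads completed. Each folder should contain a single .bin file (roughly 4 GB for the 7B model and 8 GB for the 13B model):
ls -lh models/chharlesonfire_ggml-vicuna-7b-4bit models/eachadea_ggml-vicuna-13b-1.1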

Running Vicuna model in llama.cpp
The trick to getting sensible results from Vicuna models is to recognize that the model uses the following dialog format.
### Human: Are you ok?
### Assistant: Yes I am fine!
So you need to use the reverse prompt ### Human: to stop the model in interactive mode.
Run the following command to start chatting with the Vicuna 7B model.
./main -m models/chharlesonfire_ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin -t 4 -c 2048 -n 2048 --color -i --reverse-prompt '### Human:' -p '### Human:'
Run the following command to start chatting with the Vicuna 13B model.
./main -m models/eachadea_ggml-vicuna-13b-1.1/ggml-old-vic13b-q4_0.bin -t 4 -c 2048 -n 2048 --color -i --reverse-prompt '### Human:' -p '### Human:'
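For reference, here is what the flags in these commands do, annotated as shell comments. The thread count is just a starting point; set -t to match the number of CPU cores on your machine:
# -m : path to the model weights file
# -t 4 : number of CPU threads to use
# -c 2048 : size of the context window, in tokens
# -n 2048 : maximum number of tokens to generate
# --color : color the output to distinguish your input from the model's
# -i : run in interactive mode
# --reverse-prompt '### Human:' : return control to you when this string is generated
# -p '### Human:' : the initial prompt that starts the conversation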
Now you can talk to Vicuna at the ### Human: prompt.
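You can also seed the conversation with a longer preamble by putting it in a text file and passing it with llama.cpp's -f flag in place of -p. A small sketch, where the file name and the preamble wording are my own example:
cat > chat-with-vicuna.txt << 'EOF'
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, polite answers.
### Human: Hello!
### Assistant: Hello! How can I help you today?
### Human:
EOF
Then replace -p '### Human:' with -f chat-with-vicuna.txt in the commands above.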

Press Ctrl+C once to interrupt Vicuna and say something.
Press Ctrl+C again to exit.
text-generation-webui
text-generation-webui is a nice user interface for running Vicuna models. On Mac, where no CUDA GPU is available, it uses llama.cpp under the hood.
It uses the same model weights but the installation and setup are a bit different. Let’s walk through them.
Installing Vicuna models
The following instructions assume you followed this guide to install the webui.
Step 1. Open the Terminal App. Go to your text-generation-webui installation folder.
cd ~/text-generation-webui
Step 2. Activate the virtual environment.
source venv/bin/activate
Step 3. Download the 7B model.
python download-model.py chharlesonfire/ggml-vicuna-7b-4bit
Step 4. Download the 13B model.
mkdir models/eachadea_ggml-vicuna-13b-1.1; wget https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vic13b-q4_0.bin -P models/eachadea_ggml-vicuna-13b-1.1
Running Vicuna models
Follow these steps to run Vicuna models in text-generation-webui.
Step 1. Activate the virtual environment if you have not done so already.
source venv/bin/activate
Step 2. Start the webui in chat mode.
python server.py --chat --threads 4
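Optionally, you can have the webui load a model at startup so you can skip the model selection step below. The --model flag takes the name of a folder under models/; check python server.py --help to confirm the exact flags your version supports:
python server.py --chat --threads 4 --model eachadea_ggml-vicuna-13b-1.1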
Open the local URL in a browser to start using the webui.
Step 3. Navigate to the Model page. Select the 7B or 13B model.

Step 4. Chat with Vicuna on the text generation page.

Additional Vicuna models
Vicuna-1.1-13B Free
Vicuna-1.1-13B Free is fine-tuned on an unfiltered ShareGPT dataset.
4-bit quantized ggml weights for Mac: vicuna-13b-free-V4.3-q4_0.bin
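If you want to try it in llama.cpp, the setup follows the same pattern as the models above. A minimal sketch, assuming you copy the actual download URL from the model's Hugging Face page (<repo-url> below is a placeholder, not a real address):
mkdir models/vicuna-13b-free
wget <repo-url>/vicuna-13b-free-V4.3-q4_0.bin -P models/vicuna-13b-free
Then point the -m argument of ./main at the downloaded .bin file, using the same flags as the commands above.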