Vicuna is one of the best language models for running a “ChatGPT”-like assistant locally. It is purportedly 90%* as good as ChatGPT (GPT-3.5). (I will explain what that means in the next section.) What’s more, its small size allows you to run the model on your own PC with reasonable resources.
In this post, you will learn
- What the Vicuna model is.
- How to install and use Vicuna models in llama.cpp on Mac.
- How to install and use Vicuna models in text-generation-webui on Mac.
What is the Vicuna model?
It comes in two sizes: 7B and 13B parameters. The larger 13B model is better but requires more resources to run.
The model was trained by a team of researchers from UC Berkeley, CMU, Stanford, and UC San Diego.
It was trained on user-contributed ChatGPT conversations, so you can expect its behavior to mimic ChatGPT’s. Specifically, it was trained on 70,000 ChatGPT conversations that users shared on ShareGPT.com.
It cost only $140 to train the 7B model and $300 to train the 13B model.
The authors used an interesting method to evaluate the model’s performance: using GPT-4 as the judge. They asked GPT-4 to generate some challenging questions and let Vicuna and some of the other best language models answer them. They then asked GPT-4 to evaluate the quality of the answers in different aspects, such as helpfulness and accuracy.
Of course, we need to take this result with a grain of salt. GPT-4 has not been proven to be a good judge of model performance. At least for the time being, the most reliable judges are still humans.
Installing and using Vicuna model
Two popular ways to run local language models on a Mac are llama.cpp and text-generation-webui. You will find instructions for both below.
You will be given instructions to install both the 7B and 13B models. If your machine can run both with reasonable speed, you should run the 13B model because it is better. Fall back to the 7B model if the speed of the 13B model is intolerable.
llama.cpp was developed for running LLaMA language models on MacBooks. Vicuna is a fine-tuned LLaMA model (that is, the architecture is the same, but the weights are slightly different), so it runs in llama.cpp as well.
The following instruction assumes you have installed llama.cpp by following this tutorial.
Installing Vicuna models on llama.cpp
You will need the model files in GGML format.
The following steps install both 7B and 13B models. Skip certain steps if you only want one.
Step 1. Open the Terminal app and go to the llama.cpp folder. (Adjust the path accordingly if you have installed it in a different folder.)
Step 2 (7B model): Create a new folder for the 7B model.
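The folder name must match the destination passed to wget with the -P flag in the next step. Assuming you are in the llama.cpp folder with its standard models/ directory, a matching command is:

```shell
# -p makes this a no-op if the folder already exists
mkdir -p models/chharlesonfire_ggml-vicuna-7b-4bit
```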
Step 3 (7B model): Download the 7B model.
wget https://huggingface.co/chharlesonfire/ggml-vicuna-7b-4bit/resolve/main/ggml-vicuna-7b-q4_0.bin -P models/chharlesonfire_ggml-vicuna-7b-4bit
Step 4 (13B model): Create a new folder for the 13B model.
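As with the 7B model, the folder name must match the wget -P destination in the following step:

```shell
# -p makes this a no-op if the folder already exists
mkdir -p models/eachadea_ggml-vicuna-13b-1.1
```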
Step 5 (13B model): Download the 13B model.
wget https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-old-vic13b-q4_0.bin -P models/eachadea_ggml-vicuna-13b-1.1
Running Vicuna models in llama.cpp
The trick to getting sensible results from Vicuna models is to recognize that the model uses the following dialog format.
### Human: Are you ok? ### Assistant: Yes I am fine!
So you need to use the reverse prompt ### Human: to stop the model in interactive mode.
Run the following command to start chatting with the Vicuna 7B model.
./main -m models/chharlesonfire_ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin -t 4 -c 2048 -n 2048 --color -i --reverse-prompt '### Human:' -p '### Human:'
Run the following command to start chatting with the Vicuna 13B model.
./main -m models/eachadea_ggml-vicuna-13b-1.1/ggml-old-vic13b-q4_0.bin -t 4 -c 2048 -n 2048 --color -i --reverse-prompt '### Human:' -p '### Human:'
Now you can talk to Vicuna at the ### Human: prompt.
Press Ctrl+C once to interrupt Vicuna and say something.
Press Ctrl+C again to exit.
text-generation-webui is a nice user interface for using Vicuna models. It uses llama.cpp under the hood on Mac, where no GPU is available.
It uses the same model weights but the installation and setup are a bit different. Let’s walk through them.
Installing Vicuna models in text-generation-webui
The following instructions assume you followed this guide to install the webui.
Step 1. Open the Terminal app and go to your text-generation-webui installation folder.
Step 2. Activate the virtual environment.
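How you activate the environment depends on how you installed the webui; the environment name below is an assumption, not taken from the installation guide. With a plain Python venv inside the installation folder, it would look like this (if you installed with conda, use conda activate with your environment's name instead):

```shell
# Create the venv first if it does not exist yet (no-op otherwise),
# then activate it in the current shell session.
python3 -m venv venv
source venv/bin/activate
```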
Step 3. Download the 7B model.
python download-model.py chharlesonfire/ggml-vicuna-7b-4bit
Step 4. Download the 13B model.
mkdir models/eachadea_ggml-vicuna-13b-1.1; wget https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vic13b-q4_0.bin -P models/eachadea_ggml-vicuna-13b-1.1
Running Vicuna models
Follow these steps to run Vicuna models in text-generation-webui.
Step 1. Activate the virtual environment if you have not done so already.
Step 2. Start the webui in chat mode
python server.py --chat --threads 4
Open the local URL printed in the Terminal in a browser to start using the webui.
Step 3. Navigate to the Model page. Select the 7B or 13B model.
Step 4. Chat with Vicuna on the text generation page.
Additional Vicuna models
Vicuna-1.1-13B Free is fine-tuned with an unfiltered ShareGPT dataset.
ggml weight for Mac: vicuna-13b-free-V4.3-q4_0.bin