
Local ChatGPT on Mac – How to install Vicuna language model (2 ways)

Vicuna is one of the best language models for running “ChatGPT” locally. It is purportedly 90%* as good as ChatGPT (GPT-3.5). (I will explain what that asterisk means in the next section.) What’s more, its small size allows you to run the model on your own machine with reasonable resources.

In this post, you will learn

  • What the Vicuna model is
  • How to install and use Vicuna models in llama.cpp on Mac
  • How to install and use Vicuna models in text-generation-webui on Mac

What is the Vicuna model?

The Vicuna model is trained by fine-tuning LLaMA, the foundational model released by Meta (Facebook). Like ChatGPT, it excels at conversational-style interactions.

It comes in two sizes:

  • 7B
  • 13B

The larger 13B model is better but requires more resources to run.


The model was trained by a team of researchers from UC Berkeley, CMU, Stanford, and UC San Diego.

It was trained with user-contributed ChatGPT conversations, so you can expect its behavior to mimic ChatGPT’s. Specifically, it was trained on 70,000 ChatGPT conversations that users shared on ShareGPT.

It cost only $140 to train the 7B model and $300 to train the 13B model.


The authors used an interesting method to evaluate the model’s performance: using GPT-4 as the judge. They asked GPT-4 to generate challenging questions and had Vicuna and several other leading language models answer them.

They then asked GPT-4 to evaluate the quality of the answers on different aspects, such as helpfulness and accuracy.

Here’s the result of comparing Vicuna with LLaMA, Alpaca, Bard, and ChatGPT. In the eyes of GPT-4, Vicuna is almost as good as ChatGPT, beating LLaMA and Alpaca by a large margin.

(Figure: GPT-4’s judgment and the relative scores of various models. Vicuna is almost as good as ChatGPT. Source: Vicuna model page)

Of course, we should take this result with a grain of salt. GPT-4 has not been proven to be a good judge of model performance. At least for the time being, the best judges are still humans.

Installing and using the Vicuna model

Two popular ways to run local language models on Mac are

  1. llama.cpp
  2. text-generation-webui

You will find the instructions for both.

You will find instructions for installing both the 7B and 13B models. If your machine can run both at a reasonable speed, use the 13B model because it is better.

Fall back to the 7B model if the 13B model’s speed is intolerable.
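If you are not sure which model your machine can handle, this rough sketch checks your total RAM. (The thresholds are my own approximations, not from the model authors: the 4-bit 7B model wants very roughly 4 GB of free memory and the 13B roughly 8 GB.)

```shell
# Read total RAM so you can decide which model to try first.
if sysctl -n hw.memsize >/dev/null 2>&1; then
  # macOS: total memory in bytes
  mem_bytes=$(sysctl -n hw.memsize)
else
  # Fallback for Linux: /proc/meminfo reports MemTotal in kB
  mem_bytes=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) * 1024 ))
fi
mem_gb=$(( mem_bytes / 1024 / 1024 / 1024 ))
# 16 GB is a comfortable cutoff (an assumption) for the 13B model.
if [ "$mem_gb" -ge 16 ]; then pick="13B"; else pick="7B"; fi
echo "Total RAM: ${mem_gb} GB - try the ${pick} model first"
```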


llama.cpp is developed for running LLaMA language models on MacBooks. Vicuna is a fine-tuned LLaMA model (that is, the architecture is the same but the weights are different), so it works in llama.cpp without modification.

The following instruction assumes you have installed llama.cpp by following this tutorial.

Installing Vicuna models on llama.cpp

You will need model files in ggml format.

The following steps install both 7B and 13B models. Skip certain steps if you only want one.

Step 1. Open the Terminal App. Go to the llama.cpp folder.

cd ~/llama.cpp

(Adjust accordingly if you have installed in a different folder)

Step 2 (7B model): Create a new folder for the 7B model.

mkdir models/chharlesonfire_ggml-vicuna-7b-4bit

Step 3 (7B model): Download the 7B model.

wget -P models/chharlesonfire_ggml-vicuna-7b-4bit

Step 4 (13B model): Create a new folder for the 13B model.

mkdir models/eachadea_ggml-vicuna-13b-1.1

Step 5 (13B model): Download the 13B model.

wget -P models/eachadea_ggml-vicuna-13b-1.1
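After the downloads finish, a quick sanity check (using the folder paths from the steps above) confirms the model folders are in place before you try to run anything:

```shell
# Check that the model folders created above exist; note any that are missing.
status=""
for d in models/chharlesonfire_ggml-vicuna-7b-4bit models/eachadea_ggml-vicuna-13b-1.1; do
  if [ -d "$d" ]; then
    status="$status found:$d"
  else
    status="$status missing:$d"
  fi
done
echo "$status"
```

A `missing:` entry means you should re-run the matching mkdir/wget step.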

Running Vicuna model in llama.cpp

The trick to getting sensible results from Vicuna models is to recognize that the model uses the following dialog format.

### Human: Are you ok?
### Assistant: Yes I am fine!

So you need to use the reverse prompt ### Human: to stop the model in the interactive mode.
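To see why the reverse prompt matters, here is a minimal sketch that builds one turn in the exact format above. In interactive mode, llama.cpp stops generating when the model emits the `### Human:` marker and hands control back to you.

```shell
# Build one dialog turn in Vicuna's training format.
# The '### Human:' / '### Assistant:' markers delimit the two speakers.
PROMPT=$(printf '### Human: %s\n### Assistant:' "Are you ok?")
echo "$PROMPT"
```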

Run the following command to start chatting with the Vicuna 7B model.

./main -m models/chharlesonfire_ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin -t 4 -c 2048 -n 2048 --color -i --reverse-prompt '### Human:' -p '### Human:'

Run the following command to start chatting with the Vicuna 13B model.

./main -m models/eachadea_ggml-vicuna-13b-1.1/ggml-vic13b-q4_0.bin -t 4 -c 2048 -n 2048 --color -i --reverse-prompt '### Human:' -p '### Human:'

Now you can talk to Vicuna at the ### Human: prompt.


Press Ctrl+C once to interrupt Vicuna and say something.

Press Ctrl+C again to exit.
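Since the launch command is long, you may want to wrap it in a small launcher script. The name `vicuna.sh` is just an example; the default model path and flags are the ones used above.

```shell
# Write a launcher (vicuna.sh) that defaults to the 7B model but
# accepts an alternative model path as its first argument.
cat > vicuna.sh <<'EOF'
#!/bin/sh
MODEL="${1:-models/chharlesonfire_ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin}"
exec ./main -m "$MODEL" -t 4 -c 2048 -n 2048 --color -i \
  --reverse-prompt '### Human:' -p '### Human:'
EOF
chmod +x vicuna.sh
```

Run `./vicuna.sh` for the 7B model, or pass the 13B model’s path as the first argument.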


text-generation-webui is a nice user interface for using Vicuna models. On Mac, where no CUDA GPU is available, it uses llama.cpp under the hood.

It uses the same model weights but the installation and setup are a bit different. Let’s walk through them.

Installing Vicuna models

The following instructions assume you followed this guide to install the webui.

Step 1. Open Terminal App. Go to your text-generation-webui installation folder.

cd ~/text-generation-webui

Step 2. Activate the virtual environment.

source venv/bin/activate
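If you are unsure whether activation worked, this quick check reads the `VIRTUAL_ENV` variable, which the activate script exports:

```shell
# VIRTUAL_ENV is set by venv's activate script; empty means not active.
if [ -n "$VIRTUAL_ENV" ]; then
  msg="venv active: $VIRTUAL_ENV"
else
  msg="venv not active - run: source venv/bin/activate"
fi
echo "$msg"
```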

Step 3. Download the 7B model.

python download-model.py chharlesonfire/ggml-vicuna-7b-4bit

Step 4. Download the 13B model.

mkdir models/eachadea_ggml-vicuna-13b-1.1; wget -P models/eachadea_ggml-vicuna-13b-1.1

Running Vicuna models

Follow these steps to run Vicuna models in text-generation-webui.

Step 1. Activate the virtual environment if you have not done so already.

source venv/bin/activate 

Step 2. Start the webui in chat mode.

python server.py --chat --threads 4

Open the local URL printed in the terminal in a browser to start using the webui.

Step 3. Navigate to the Model page. Select the 7B or 13B model.

Step 4. Chat with Vicuna on the text generation page.

Additional Vicuna models

Vicuna-1.1-13B Free

Vicuna-1.1-13B Free is fine-tuned with an unfiltered ShareGPT dataset.

4-bit quantized ggml weight for Mac: vicuna-13b-free-V4.3-q4_0.bin
