gpt4-x-alpaca is a fine-tuned LLaMA model with 13 billion parameters. Many users have reported excellent conversational results. In this post, you will learn:
- What gpt4-x-alpaca is
- How to run it on Mac
- How to run it on Windows
What is gpt4-x-alpaca?
gpt4-x-alpaca is a 13B LLaMA model that can follow instructions like answering questions.
gpt4-x-alpaca’s HuggingFace page states that it is based on the Alpaca 13B model, fine-tuned with GPT4 responses for 3 epochs.
That’s all the information I can find! This seems to be a community effort.
Users generally have good things to say about its performance in community discussions.
Well, at least the performance should be similar to Vicuna 13B.
Install and run gpt4-x-alpaca on Mac
There are two ways to run gpt4-x-alpaca on Mac: (1) llama.cpp and (2) text-generation-webui. You will see instructions for both.
llama.cpp (Mac)
llama.cpp is a command line program for running LLaMA models. Make sure you have installed llama.cpp before proceeding.
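If you have not installed it yet, a typical setup (assuming you have git and Apple’s command line developer tools) looks like this:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make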
We will use model weights from this repository. The following instructions are for installing the q4_1 4-bit quantized version. There are several other quantized gpt4-x-alpaca models available in the repository.
Step 1. Open the Terminal app. Navigate to the llama.cpp directory. Create the model folder.
mkdir models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g
Step 2. Download the model weights to the newly created folder.
wget https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g/resolve/main/gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g/ggml-model-q4_1.bin -P models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g
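Note that macOS does not ship with wget. If the command is not found, install it with Homebrew (brew install wget) or use curl instead:
curl -L https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g/resolve/main/gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g/ggml-model-q4_1.bin -o models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin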
Step 3. Run the model.
Since it is an Alpaca model, make sure to use the --instruct parameter to turn on instruct mode. This mode inserts ### Instruction: at the beginning and ### Response: at the end of your prompt. Otherwise, you will get some nonsense.
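For reference, instruct mode wraps your input in the Alpaca prompt template, which looks roughly like this (the question is only a placeholder):
### Instruction:
What is the capital of France?

### Response: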
./main -m models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin -t 4 -c 2048 -n 2048 --color -i --instruct
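In this command, -m points to the model file, -t 4 uses four CPU threads, -c 2048 sets the context length, -n 2048 caps the number of generated tokens, --color highlights your input, and -i keeps the session interactive. You may want to adjust -t to match the number of CPU cores on your machine.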

text-generation-webui (Mac)
The same repository can be used for text-generation-webui because it calls llama.cpp under the hood.
Make sure you have installed text-generation-webui before proceeding.
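If it is not installed yet, a minimal manual setup (a sketch assuming git and Python are available; follow the full installation guide for details) looks like this:
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
python -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt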
Step 1. Open the Terminal app. Navigate to the text-generation-webui directory. Create the model folder.
mkdir models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g
Step 2. Download the model.
wget https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g/resolve/main/gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g/ggml-model-q4_1.bin -P models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g
Step 3. Open text-generation-webui normally.
If you followed the installation guide, activate the virtual environment first.
source ./venv/bin/activate
You should see the label venv in front of the command prompt. Then start the server in chat mode.
python server.py --chat
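Once the server starts, open the local address printed in the Terminal in your browser (Gradio serves at http://127.0.0.1:7860 by default).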
Step 4. Go to the Model page. In the Model dropdown menu, select anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g.

You should see a confirmation message at the bottom right of the page saying the model was loaded successfully.
Now you can chat with gpt4-x-alpaca on the text-generation page.
Install and run gpt4-x-alpaca on Windows
We will use the 4-bit GPTQ model from this repository.
System requirements
You should have text-generation-webui installed on your Windows PC. Follow this guide to install.
You should have a Windows PC with an NVIDIA GPU, since we will use the CUDA version of the model.
Step-by-step guide (Windows with GPU)
Step 1. Start text-generation-webui normally.
Step 2. Navigate to the Model page. In the Download custom model or LoRA text box, enter
anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g
Press the Download button. Wait for the download to complete.
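If the in-app download fails or stalls, an alternative (a sketch assuming git and git-lfs are installed) is to clone the repository directly into the models folder:
git lfs install
git clone https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g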

Step 3. Click the refresh icon next to the Model dropdown menu.
Step 4. Delete the following file and folder in the model’s folder. (We will only use the CUDA version.)
- gpt-x-alpaca-13b-native-4bit-128g.pt (file)
- gpt4-x-alpaca-13b-native-4bit-128g (folder)
Step 5. In the Model dropdown menu, select anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g. Ignore the error message.
Step 6. Fill in the following values in the GPTQ parameters section.
- wbits: 4
- Groupsize: 128
- model_type: llama
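If you prefer to set these at startup instead, text-generation-webui accepts the same settings as command-line flags (an optional shortcut, not required by the steps above):
python server.py --chat --wbits 4 --groupsize 128 --model_type llama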
Step 7. Click Save settings for this model so that you don’t need to put in these values the next time you use this model.
The automatic parameter loading will only be effective after you restart the GUI. You don’t need to restart now.
Step 8. Click Reload the model.
Step 9. Start chatting with the model on the text-generation page.