LLaMA is a state-of-the-art large language model developed by Meta, Facebook’s parent company. Like OpenAI’s GPT models, it is a language model trained to predict the next word of its input text. Unlike the GPT models, however, LLaMA was released to the public: anyone can study the model and develop applications on top of it.
In this post, you will learn how to install and run the LLaMA model on an Apple Silicon (M1 or M2) Mac.
What is LLaMA?
LLaMA (Large Language Model Meta AI) is Meta’s answer to GPT, the family of language models created by OpenAI that powers ChatGPT. The pre-trained model is available in several sizes: 7B, 13B, 33B, and 65B parameters.
Smaller and better
Despite its smaller size, the LLaMA 13B model outperforms GPT-3 (175B parameters) on most benchmarks.
Open source
OpenAI’s GPT models and ChatGPT can only be accessed as a web service: neither the model weights nor the training data are publicly available. This is in stark contrast with Meta’s LLaMA, whose model weights are available to download and whose training data sources are publicly documented.
Use
The small size and open weights make LLaMA an ideal candidate for running locally on consumer-grade hardware.
Many people and companies are interested in fine-tuning LLaMA because fine-tuning is affordable on models of this size.
System Requirements
The table below shows the original model sizes and their 4-bit reduced sizes. The whole 4-bit model must be loaded into memory, so make sure you have enough RAM on your system before proceeding.
| Model | Original size | 4-bit reduced size |
|---|---|---|
| 7B | 13 GB | 3.9 GB |
| 13B | 24 GB | 7.8 GB |
| 30B | 60 GB | 19.5 GB |
| 65B | 120 GB | 38.5 GB |
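If you are not sure how much RAM your Mac has, you can check from Terminal. The following command prints the total physical memory in bytes (divide by 1073741824 to get GB).
sysctl -n hw.memsize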
The rest of the article will focus on installing the 7B model.
Installing on Mac
Step 1: Install Homebrew
Install Homebrew, a package manager for Mac, if you haven’t already.
Open the Terminal app, type the following command, and press return.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
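Once the installer finishes, you can confirm Homebrew is available by checking its version.
brew --version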
Step 2: Install the required packages
In Terminal, run the following command.
brew install cmake [email protected] git wget
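If you want to confirm that Homebrew’s Python was installed, the following should print a 3.10.x version (assuming Homebrew’s bin directory is on your PATH, which the installer normally sets up).
python3.10 --version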
Step 3: Clone llama.cpp
In Terminal, run the following command.
git clone https://github.com/ggerganov/llama.cpp
Go into the llama.cpp folder and build the project.
cd llama.cpp; make
The build should be quick. You should end up in the llama.cpp folder when you are done.
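At the time of writing, the build produces the main and quantize executables in the repository root; you can confirm they exist before moving on.
ls -l main quantize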
Step 4: Download the 7B LLaMA model
Meta has released the model to the public. The official way to download it is to request access through this Google form; you will be provided a download link once Meta approves your request.
Since its release, the model has also been leaked via torrent.
The whole package contains the following folders and files:
- 7B (folder)
- 13B (folder)
- 30B (folder)
- 65B (folder)
- tokenizer_checklist.chk
- tokenizer.model
To use the 7B LLaMA model, you will need the following three:
- 7B (folder)
- tokenizer_checklist.chk
- tokenizer.model
Put them in the models folder inside the llama.cpp folder.
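For reference, the layout should then look roughly like this. The exact contents of the 7B folder come from Meta’s download; consolidated.00.pth, params.json, and checklist.chk are what the original release ships, but they may vary.
llama.cpp/models/
├── 7B/
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer_checklist.chk
└── tokenizer.model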
Step 5: Install Python dependencies
Before you start, make sure you are running Python 3.10.
python3 --version
You are good if you see Python 3.10.x.
Run the following in the llama.cpp folder in Terminal to create a virtual environment.
python3 -m venv venv
A folder called venv should be created.
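If your default python3 is not 3.10, you can point the virtual environment at Homebrew’s Python explicitly (assuming python3.10 from Step 2 is on your PATH).
python3.10 -m venv venv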
Run the following command to install the necessary packages.
./venv/bin/pip install torch numpy sentencepiece
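As a quick sanity check that the packages were installed into the virtual environment, the following should print the torch version and exit without errors.
./venv/bin/python -c "import torch, numpy, sentencepiece; print(torch.__version__)"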
Step 6: Convert and quantize model file
Convert the file to the ggml format using the following command. The trailing 1 tells the conversion script to store the weights as 16-bit floats (0 would keep 32-bit floats).
./venv/bin/python convert-pth-to-ggml.py models/7B/ 1
This is going to take a while. A file called models/7B/ggml-model-f16.bin should be created.
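You can confirm the converted file exists and check its size, which should be roughly 13 GB for the 7B model.
ls -lh models/7B/ggml-model-f16.bin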
Run the following command to quantize the model to 4-bit. The trailing 2 selects the q4_0 quantization type in this version of llama.cpp.
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
The 4-bit model ggml-model-q4_0.bin should be created in the 7B folder.
The new file is 3.9 GB, significantly smaller than the original size of 13 GB.
Step 7: Test run
Run the following command.
./main -m ./models/7B/ggml-model-q4_0.bin -n 128
You should see some text generated if the installation is successful.
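You can also supply your own prompt with the -p flag and control the number of tokens generated with -n. For example:
./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 128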

Remarks on installation
- The above instructions are for installing the 7B model. Apply the same steps to another available folder (13B, 30B, or 65B) to install a bigger model.
- You can delete the intermediate file ggml-model-f16.bin.
Running the model
You can use a convenient script to run the 7B model in a ChatGPT-like interactive mode.
Run the chat script.
./examples/chat.sh

It appears to be less wordy than ChatGPT, but it does the job and runs locally!
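Under the hood, the script launches main in interactive mode. If you want to customize the behavior, you can run main yourself along these lines; the flag values here are illustrative rather than the script’s exact contents. -i enables interactive mode, -r sets the reverse prompt that hands control back to you, and -f loads an initial prompt from a file.
./main -m ./models/7B/ggml-model-q4_0.bin -n 256 --color -i -r "User:" -f prompts/chat-with-bob.txt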
Update: Run Llama 2 model
Llama 2, the successor to Llama 1, was released in July 2023. You can find the model page and download link in this article.
Comments
At the moment the models are downloaded as consolidated.00.pth, consolidated.01.pth, etc., but the script convert-pth-to-ggml.py is no longer in the llama.cpp project.
Where can I find the script, or download the ggml model directly?
Thank you
Could you please delete the previous remark? It is not true. Regards.
And thanks for the work done.
You installed cmake and ran make 🙂
For me, warnings appear when make is performed:
warning: unknown warning option ‘-Wno-format-truncation’ [-Wunknown-warning-option]
common/common.h:24:68: warning: token pasting of ‘,’ and __VA_ARGS__ is a GNU extension [-Wgnu-zero-variadic-macro-arguments]
Could you help me solve these issues?
MacBook Pro M1 Max, Ventura 13.4.1
Hello,
Thanks for this tutorial.
I use a MacBook Pro with Apple Silicon M1, and this does not seem to be runnable for me. Why?
For instance, Step 6 does not appear to be runnable, and there is no explanation.
Could you help me?
This is a bit outdated. Now the ggml model is widely available for download. You should get one and run it instead of converting it yourself.
I will update the guide.
Seems that web links are not allowed… That’s too bad. So, again…
Thanks for the instructions. I also followed the video version from YouTube titled “Tutorial: Install a Chat Large Language Model (LLM) on your M1/M2 Mac” – I also wrote up the results in a blog on DNA dot Today.
Today (Aug 4) I installed the 65B version, but it’s very slow: only 4 words have been generated since I ran the final command (and they are not the same every time). Next time I’ll try on a Mac M1 Pro.
Thanks
Oh dear, what an awful definition of Quantum Chemistry. See https://en.wikipedia.org/wiki/Quantum_chemistry
Yes, it works on an Intel Mac too.
Wow… finally a nice guide that makes it work!
Could it run on an Intel Mac?