LLaMA is a state-of-the-art large language model developed by Meta, Facebook's parent company. Like OpenAI's GPT models, it is a language model trained to predict the next word of the input text. Unlike the GPT models, Meta released the model to the public, so anyone can study it and develop applications on top of it.
In this post, you will learn how to install and run the LLaMA model on an Apple Silicon (M1 or M2) Mac running macOS.
- What is LLaMA?
- System Requirements
- Installing on Mac
- Running the model
- Update: Run Llama 2 model
What is LLaMA?
LLaMA (Large Language Model Meta AI) is Meta's answer to GPT, the family of language models behind OpenAI's ChatGPT. The pre-trained model is available in several sizes: 7B, 13B, 33B, and 65B parameters.
Smaller and better
Despite its smaller size, the LLaMA 13B model outperforms GPT-3 (175B parameters) on most benchmarks.
OpenAI's GPT models and ChatGPT can only be accessed through a web service; neither the model weights nor the training data are publicly available. This is in stark contrast with Meta's LLaMA, whose model weights were released and which was trained on publicly available datasets.
Its small size and open weights make LLaMA an ideal candidate for running locally on consumer-grade hardware. Many people and companies are also interested in fine-tuning it, which is affordable at LLaMA's model sizes.
The table below shows the original model sizes and the 4-bit quantized sizes. The whole 4-bit model must be loaded into memory, so make sure your system has enough RAM before you proceed.
| Model | Original size | 4-bit reduced size |
| --- | --- | --- |
| 7B | 13 GB | 3.9 GB |
| 13B | 24 GB | 7.8 GB |
| 30B | 60 GB | 19.5 GB |
| 65B | 120 GB | 38.5 GB |
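Since the whole 4-bit model must fit in RAM, it helps to check how much memory your Mac has before choosing a model size. A minimal sketch using macOS's sysctl (hw.memsize reports the installed memory in bytes):

```shell
# Print the installed RAM in GB on macOS.
# hw.memsize is the total memory in bytes; divide by 1024^3 to get GB.
bytes=$(sysctl -n hw.memsize)
echo "Installed RAM: $((bytes / 1073741824)) GB"
```

An 8 GB machine can hold the 7B model (3.9 GB); 16 GB comfortably fits the 13B model (7.8 GB).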
The rest of the article will focus on installing the 7B model.
Installing on Mac
Step 1: Install Homebrew
Install Homebrew, a package manager for Mac, if you haven’t already.
Open the Terminal app, type the following command, and press return.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Step 2: Install the required packages
In Terminal, run the following command.
brew install cmake python@3.10 git wget
Step 3: Clone llama.cpp
In Terminal, run the following command.
git clone https://github.com/ggerganov/llama.cpp
Go into the llama.cpp folder and build the project.
cd llama.cpp; make
The build should be quick. You should end up in the llama.cpp folder when it is done.
Step 4: Download the 7B LLaMA model
Meta has released the model to the public. The official way to download the model is to request it through this Google form. You will be provided a download link once Meta approves you.
Since its release, the model has also been leaked on BitTorrent.
The whole package contains the following folders and files
To use the 7B LLaMA model, you will need the following three. Put them in the models folder inside the llama.cpp folder.
Step 5: Install Python dependencies
Before you start, make sure you are running Python 3.10. You are good if you see a version that starts with 3.10.
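A quick way to check which version you have (assuming python3 is on your PATH):

```shell
# Print the Python version; look for 3.10.x in the output.
python3 --version
```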
Run the following in the llama.cpp folder in Terminal to create a virtual environment.
python3 -m venv venv
A folder called venv should be created.
Run the following command to install the necessary packages.
./venv/bin/pip install torch numpy sentencepiece
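To sanity-check that the packages landed in the virtual environment, you can try importing all three (a sketch; it prints ok only if every import succeeds):

```shell
# Import the three packages inside the venv; prints "ok" on success.
./venv/bin/python -c "import torch, numpy, sentencepiece; print('ok')"
```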
Step 6: Convert and quantize model file
Convert the model file to the ggml format using the following command.
./venv/bin/python convert-pth-to-ggml.py models/7B/ 1
This is going to take a while. A file called models/7B/ggml-model-f16.bin should be created.
Run the following command to quantize the model to 4-bit.
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
The 4-bit model ggml-model-q4_0.bin should be created in the models/7B folder. The new file is 3.9 GB, significantly smaller than the original 13 GB.
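You can confirm that the quantized file exists and check its size with ls:

```shell
# List the quantized model; the size column should read about 3.9G.
ls -lh ./models/7B/ggml-model-q4_0.bin
```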
Step 7: Test run
Run the following command.
./main -m ./models/7B/ggml-model-q4_0.bin -n 128
You should see some text generated if the installation is successful.
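Without a prompt, main generates text from scratch. You can also seed it with your own text using the -p flag (a sketch based on llama.cpp's main options at the time; flags may change between versions):

```shell
# Generate up to 128 tokens continuing a custom prompt.
./main -m ./models/7B/ggml-model-q4_0.bin \
  -p "The first man on the moon was" -n 128
```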
Remarks on installation
- The above instructions are for installing the 7B model. Apply the same steps to another model folder (13B, 30B, or 65B) to install a bigger model.
- You can delete the intermediate file ggml-model-f16.bin after quantization to save disk space.
Running the model
You can use a convenient script to run the 7B model in a ChatGPT-like interactive mode.
Run the chat script.
It appears to be less wordy than ChatGPT, but it does the job and runs locally!
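If you prefer to invoke the interactive mode yourself rather than through the script, something like the following should work (a sketch assuming llama.cpp's -i interactive flag and -r reverse-prompt option; check ./main --help for your build):

```shell
# Start an interactive session: --color highlights your input,
# -i enables interactive mode, and -r hands control back to you
# whenever the model emits "User:".
./main -m ./models/7B/ggml-model-q4_0.bin --color -i \
  -r "User:" -p "Transcript of a dialog between a User and an Assistant.
User: Hello!
Assistant:"
```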
Update: Run Llama 2 model
Llama 2, the updated version of Llama 1, was released in July 2023. You can find the model page and download link in this article.