
How to install LLaMA on Mac (llama.cpp)

LLaMA is a state-of-the-art large language model developed by Meta, Facebook’s parent company. Similar to OpenAI’s GPT models, it is a language model trained to predict the next word of the input text. Unlike the GPT models, however, LLaMA has been released to the public, so anyone can study the model and develop applications on top of it.

In this post, you will learn how to install and run the LLaMA model on an Apple Silicon (M1 or M2) Mac running macOS.

What is LLaMA?

LLaMA (Large Language Model Meta AI) is Meta (Facebook)’s answer to GPT, the family of language models behind ChatGPT created by OpenAI. The pre-trained model is available in several sizes: 7B, 13B, 33B, and 65B parameters.

Smaller and better

Despite its smaller size, the LLaMA 13B model outperforms GPT-3 (175B parameters) on most benchmarks.

Open source

OpenAI’s GPT models and ChatGPT can only be accessed through a web service. Neither the model nor the training data is publicly available. This is in stark contrast with Meta’s LLaMA, for which the model weights are available and the training data sources are documented.

Use

The small size and open model make LLaMA an ideal candidate for running the model locally on consumer-grade hardware.

Many people and companies are interested in fine-tuning the model because doing so is affordable at LLaMA’s smaller sizes.

System Requirements

The table below shows the original model sizes and the sizes after 4-bit quantization. You will need to load the whole 4-bit model into memory, so make sure you have enough RAM on your system before proceeding (a quick way to check is shown after the table).

Model   Original size   4-bit reduced size
7B      13 GB           3.9 GB
13B     24 GB           7.8 GB
30B     60 GB           19.5 GB
65B     120 GB          38.5 GB

Model sizes.
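If you are not sure how much RAM your Mac has, you can check from Terminal. This is a standard macOS command; it prints the installed memory in bytes.

# Print installed memory in bytes (e.g. 17179869184 = 16 GB)
sysctl hw.memsize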

The rest of the article will focus on installing the 7B model.

Installing on Mac

Step 1: Install Homebrew

Install Homebrew, a package manager for Mac, if you haven’t already.

Open the Terminal app, type the following command, and press return.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
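To confirm that Homebrew installed correctly, you can print its version; if the command is not found, Homebrew is not on your PATH yet.

# Should print something like "Homebrew 4.x.x"
brew --version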

Step 2: Install the required packages

In Terminal, run the following command.

brew install cmake python@3.10 git wget

Step 3: Clone llama.cpp

In Terminal, run the following command.

git clone https://github.com/ggerganov/llama.cpp

Go into the llama.cpp folder and build the project.

cd llama.cpp; make

It should be quick. You should end up in the llama.cpp folder when you are done.
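As a quick sanity check, you can list the binaries the build produced. This assumes the Makefile of the version used here builds the main and quantize executables in the repository root, as it did at the time of writing.

# The build should have produced these two executables
ls -l main quantize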

Step 4: Download the 7B LLaMA model

Meta has released the model to the public. The official way to download the model is to request it through this Google form. You will be provided a download link once Meta approves you.

Since its release, the model has also been leaked via BitTorrent.

The whole package contains the following folders and files:

  • 7B (folder)
  • 13B (folder)
  • 30B (folder)
  • 65B (folder)
  • tokenizer_checklist.chk
  • tokenizer.model

To use the 7B LLaMA model, you will need the following three.

  • 7B (folder)
  • tokenizer_checklist.chk
  • tokenizer.model

Put them in the models folder inside the llama.cpp folder.
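Assuming the standard layout of the original LLaMA release (a consolidated.00.pth weights file plus params.json inside the 7B folder), the models folder should look roughly like this; you can verify it with ls.

# Expected contents of the models folder
ls models
# 7B  tokenizer_checklist.chk  tokenizer.model
ls models/7B
# checklist.chk  consolidated.00.pth  params.json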

Step 5: Install Python dependencies

Before you start, make sure you are running Python 3.10.

python3 --version

You are good if you see Python 3.10.x.

Run the following in the llama.cpp folder in Terminal to create a virtual environment.

python3 -m venv venv

A folder called venv should be created.

Run the following command to install the necessary packages.

./venv/bin/pip install torch numpy sentencepiece
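To check that the packages were installed into the virtual environment rather than your system Python, you can import them with the venv interpreter; this should print the torch version without an ImportError.

./venv/bin/python -c "import torch, numpy, sentencepiece; print(torch.__version__)"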

Step 6: Convert and quantize the model file

Convert the file to the ggml format using the following command.

./venv/bin/python convert-pth-to-ggml.py models/7B/ 1

This is going to take a while. A file called models/7B/ggml-model-f16.bin should be created.

Run the following command to quantize the model to 4-bit.

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

The 4-bit model ggml-model-q4_0.bin should be created in the 7B folder.

The new file is 3.9 GB, significantly smaller than the original size of 13 GB.
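You can confirm both files and their sizes with ls.

# The f16 file should be around 13 GB and the q4_0 file around 3.9 GB
ls -lh models/7B/ggml-model-f16.bin models/7B/ggml-model-q4_0.bin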

Step 7: Test run

Run the following command.

./main -m ./models/7B/ggml-model-q4_0.bin -n 128

You should see some text generated if the installation is successful.
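The -n 128 flag limits generation to 128 tokens. You can also pass your own prompt with -p; the prompt text below is just an illustration.

./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 128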

Remarks on installation

  • The above instructions are for installing the 7B model. Apply the same steps to another available folder (13B, 30B, or 65B) to install a bigger model.
  • You can delete the intermediate file ggml-model-f16.bin to free up disk space (see the command below).
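For example:

# Free up roughly 13 GB of disk space by removing the intermediate f16 file
rm models/7B/ggml-model-f16.bin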

Running the model

You can use a convenient script to run the 7B model in a ChatGPT-like interactive mode.

Run the chat script.

./examples/chat.sh
The LLaMA model running in interactive mode.

It appears to be less wordy than ChatGPT, but it does the job and runs locally!
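The script is a thin wrapper around ./main. If it is missing from your checkout, you can get a similar chat-style session by calling ./main directly in interactive mode; the flags below are standard llama.cpp options, and prompts/chat-with-bob.txt is one of the example prompt files shipped with the repository.

# Interactive chat: -i keeps the session open, -r returns control at the "User:" reverse prompt
./main -m ./models/7B/ggml-model-q4_0.bin --color -i -r "User:" -f prompts/chat-with-bob.txt -n 256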

Update: Run Llama 2 model

Llama 2, the updated version of Llama 1, was released in July 2023. You can find the model page and download link in this article.

12 thoughts on “How to install LLaMA on Mac (llama.cpp)”

  1. At the moment, the models are downloaded as consolidated.00.pth, consolidated.01.pth, etc., but the script convert-pth-to-ggml.py is no longer in the llama.cpp project.

    Where can I find the script, or can I download the ggml model directly?
    Thank you

  2. For me, warnings appear when make is performed:

    warning: unknown warning option ‘-Wno-format-truncation’ [-Wunknown-warning-option]
    common/common.h:24:68: warning: token pasting of ‘,’ and __VA_ARGS__ is a GNU extension [-Wgnu-zero-variadic-macro-arguments]

    Could you help me solve these issues?
    MacBook Pro M1 Max, Ventura 13.4.1

    1. Hello,
      Thanks for this tutorial.
      I use a MacBook Pro with Apple Silicon M1, and this does not seem to be runnable for me. Why?
      For instance, Step 6 does not appear to run, and there is no explanation.
      Could you help me?

  3. It seems that web links are not allowed… That’s too bad. So, again…
    Thanks for the instructions. I also followed the video version on YouTube titled “Tutorial: Install a Chat Large Language Model (LLM) on your M1/M2 Mac”, and I wrote up the results in a blog post on DNA dot Today.
    Today (Aug 4) I installed the 65B version, but it’s very slow: only 4 words have been generated since the final command (and they are not the same every time). Next time I’ll try on a Mac with an M1 Pro.
