Did you know you can have a conversation with the LLaMA AI chatbot as if talking to a real person? In this post, you will learn how to set up talk-llama, a voice interface that lets you talk to a LLaMA model of your choice.
What is talk-llama?
talk-llama is part of whisper.cpp, a high-performance C/C++ port of OpenAI’s Whisper model.
Whisper is an open-source speech recognition AI model. It performs state-of-the-art speech-to-text transcription. The model supports multiple languages.
Installing talk-llama
Step 1: Install llama.cpp
Follow this article to install llama.cpp. We will need the 4-bit quantized 7B model.
You should see the file llama.cpp > models > 7B > ggml-model-q4_0.bin before you proceed to the next step.
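To confirm the model file is in place, you can list it from the directory that contains llama.cpp (the path below assumes the folder layout used throughout this guide):

```shell
# Check that the 4-bit quantized 7B model exists
# (run from the parent directory of llama.cpp)
ls -lh llama.cpp/models/7B/ggml-model-q4_0.bin
```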
Step 2: Install whisper.cpp
First, clone the whisper.cpp repository.
You should clone whisper.cpp into the same directory as llama.cpp.
Open the Terminal app and run the following command.
git clone https://github.com/ggerganov/whisper.cpp.git
After this command, you should have the following two folders in the same directory:
llama.cpp (folder)
whisper.cpp (folder)
Download the English-only Whisper model by running the following command. The model has been converted to a special ggml format.
cd whisper.cpp; bash ./models/download-ggml-model.sh base.en
Build the main file.
make
Check if everything is working so far. Test the Whisper model by running the following command.
./main -f samples/jfk.wav
You should see part of JFK’s speech transcribed.

Step 3: Install talk-llama
Now, install the SDL2 (Simple DirectMedia Layer) library. It is used for capturing audio from your microphone.
brew install sdl2
Make the talk-llama executable.
make talk-llama
That’s it!
Using the voice chat
Start the talk-llama voice chat by running the following command. It uses the English Whisper model and the 7B LLaMA model.
./talk-llama -mw ./models/ggml-base.en.bin -ml ../llama.cpp/models/7B/ggml-model-q4_0.bin -p "Georgi" -t 8
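The same command, annotated with my reading of the flags (adjust -t to match your CPU core count):

```shell
# Flag reference for the talk-llama command:
#   -mw  path to the Whisper speech-to-text model
#   -ml  path to the LLaMA language model
#   -p   name the assistant responds as in the conversation
#   -t   number of CPU threads to use
./talk-llama -mw ./models/ggml-base.en.bin \
             -ml ../llama.cpp/models/7B/ggml-model-q4_0.bin \
             -p "Georgi" -t 8
```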
Grant permission to use the microphone.
Now you can start your voice conversation with the LLaMA model!

Press Ctrl-C or close the terminal when you are done.
I tested on an Apple M1 Mac with 16 GB of RAM. It is pretty slow, noticeably slower than using text-based chat.
Multilingual support
The Whisper model can understand 99 different languages. You can configure talk-llama to use a multilingual model. You can speak to it in any of those languages, and it will transcribe your speech into English.
Inside the whisper.cpp folder, run
bash ./models/download-ggml-model.sh medium
This will download the medium-size multilingual model.
Run talk-llama with this multilingual model.
./talk-llama -mw ./models/ggml-medium.bin -ml ../llama.cpp/models/7B/ggml-model-q4_0.bin -p "Georgi" -t 8
Now start talking to LLaMA in a foreign language!
Customizing the voice
The program uses macOS’s say command by default. You can switch to any command-line text-to-speech program.
Edit the file examples/talk-llama/speak.sh. Change the line
say "$2"
to call a text-to-speech program of your choice to customize LLaMA’s voice.
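For example, you could swap in espeak. The sketch below is a possible speak.sh replacement, assuming espeak is installed; the voice and speed values are illustrative, not the script shipped with whisper.cpp:

```shell
#!/bin/bash
# Sketch of a custom examples/talk-llama/speak.sh
# $2 holds the text to speak, matching the original say "$2" call
if command -v say >/dev/null 2>&1; then
  # macOS built-in text-to-speech
  say "$2"
elif command -v espeak >/dev/null 2>&1; then
  # Linux alternative; -v voice and -s speed are illustrative values
  espeak -v en -s 170 "$2"
else
  echo "No text-to-speech program found" >&2
fi
```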

Hello. Do we have an update using Llama2?
What about Windows?
Hi, you can run llama.cpp on Windows, but many prefer text-generation-webui because it can use the GPU.
https://agi-sphere.com/text-generation-webui-windows/
talk-llama requires llama.cpp.