
How to run large language models locally on Windows (text-generation-webui)


It’s never been easier to run large language models locally on your PC. Oobabooga text-generation-webui is a free GUI for running language models on Windows, Mac, and Linux. It offers many convenient features, such as managing multiple models and a variety of interaction modes.

In this article, you will learn what text-generation-webui is and how to install it on Windows.

(See this guide for installing on Mac.)

What is text-generation-webui?

Oobabooga text-generation-webui is a GUI (graphical user interface) for running Large Language Models (LLMs) such as LLaMA, GPT-J, Pythia, OPT, and GALACTICA.

Why do you need a GUI for LLMs? The GUI acts as a middleman, in a good sense, that makes using the models a more pleasant experience. Instead of interacting with the language models in a terminal, you can switch models, save and load prompts with mouse clicks, and write prompts in a text box.

To sum up, it offers a better user experience for using local LLMs.

The idea is similar to the AUTOMATIC1111 web UI for the Stable Diffusion AI image generator.

System requirements

Ideally, your Windows PC has an Nvidia GPU video card. You can still run models on the CPU alone, but expect generation to be noticeably slower.

Install text-generation-webui on Windows

We will go through two options:

  1. One-click installer
  2. Installing with the command line

The one-click installer is recommended for regular users. Installing with the command line is for those who want to modify or contribute to the code base.

Installing text-generation-webui with the one-click installer

The easiest way is to use the one-click installer provided by the repository.

Step 1: Install Visual Studio 2019 build tool

Download and install Visual Studio 2019 Build Tools. (You can use the first one in the table).

Step 2: Download the installer

Download the Windows installer. It’s a zip file.

Step 3: Unzip the Installer

In File Explorer, extract the content by right-clicking the zip file and then “Extract All…”.

Click Extract. You should get a new folder called oobabooga_windows.

This is your webui’s folder. You can move this folder to a location you like.

Step 4: Run the installer

Inside the oobabooga_windows folder, you will find several .bat files.

Double-click the file start_windows.bat to start the installation. A warning may show up.

Click More info and then Run anyway.

Step 5: Answer some questions

Next, it will ask you which GPU you have.

For Windows users, the answer will be either A) NVIDIA or D) None. Pick NVIDIA if you have a discrete Nvidia GPU, such as the GTX or RTX series. (Integrated GPUs don’t count.) Select None if you don’t have a discrete GPU.
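If you are unsure which option applies, one quick heuristic is to check whether the Nvidia driver’s `nvidia-smi` tool is installed. Here is a minimal Python sketch using only the standard library (the heuristic is an assumption of ours, not an official check):

```python
import shutil

def has_nvidia_driver() -> bool:
    """Return True if nvidia-smi is on the PATH.

    nvidia-smi ships with the Nvidia driver, so its presence is a
    reasonable (though not bulletproof) sign of a discrete Nvidia GPU.
    """
    return shutil.which("nvidia-smi") is not None

# Prints the installer option this heuristic suggests.
print("A) NVIDIA" if has_nvidia_driver() else "D) None")
```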

The installation will take a while. When it is done, you will see a local URL.

Step 6: Access the web-UI

Go to the local URL in a browser to access the web-UI.

http://127.0.0.1:7860

Ctrl + left click on the URL in the terminal should bring you there.

Step 7: Download a model

NVIDIA GPU users

If you have an NVIDIA GPU, download a model in GPTQ format.

In text-generation-webui, navigate to the Model page. In the Download custom model or LoRA section, enter the following model name.

localmodels/Llama-2-7B-Chat-GPTQ

CPU users

If you don’t have an NVIDIA GPU, you will need to download a model in GGML format instead. You can download the Llama 2 7B Chat model to get started. Put the model file in the text-generation-webui > models folder.
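If you prefer downloading from the command line, files hosted on Hugging Face follow a predictable direct-download URL pattern. A small sketch that builds such a URL (the repository and file name below are illustrative assumptions; check the model card for the exact file):

```python
def hf_file_url(repo: str, filename: str) -> str:
    """Build the direct-download URL for a file in a Hugging Face repository."""
    return f"https://huggingface.co/{repo}/resolve/main/{filename}"

# Hypothetical repository and GGML file name; verify both on the model card.
print(hf_file_url("TheBloke/Llama-2-7B-Chat-GGML",
                  "llama-2-7b-chat.ggmlv3.q4_K_M.bin"))
```

Save the downloaded file into the models folder mentioned above.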

Step 8: Load the model

Click the refresh button next to the Model dropdown menu. Select the model you just downloaded. Click the Load button to load the model.

Step 9: Test the model

Go to the Text generation page, and type a question in the input text box to start chatting.

Starting the web-ui again

To start the webui again next time, double-click the file start_windows.bat.

Installation using the command line

If the one-click installer doesn’t work for you, or you prefer to manage the installation yourself, follow these instructions to install text-generation-webui.

Make sure you are comfortable using the command line before proceeding.

This option requires Python 3.10 and git. Install them if you don’t have them already.
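A quick way to confirm both prerequisites is a short standard-library check (the 3.10 minimum comes from this guide; the script itself is a generic sketch):

```python
import shutil
import sys

def missing_prereqs(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means you are ready."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    return problems

for problem in missing_prereqs():
    print(problem)
```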

The following instructions are for Nvidia GPU cards.

Step 1. Open the PowerShell app and clone the text-generation-webui repository.

git clone https://github.com/oobabooga/text-generation-webui.git

Step 2. Create a virtual environment.

cd text-generation-webui
python -m venv venv

Step 3. Enable running scripts in PowerShell.

set-executionpolicy RemoteSigned -Scope CurrentUser

Step 4. Activate the virtual environment.

venv\Scripts\Activate.ps1

You should see the label (venv) in front of the command prompt.

Step 5. Install PyTorch.

pip install torchaudio torch==2.0.0+cu118 torchvision ninja --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu118

Step 6. Install the Windows version of bitsandbytes.

pip install https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.38.1-py3-none-any.whl

Step 7. Install the required packages.

pip install -r requirements.txt

Step 8. Install GPTQ-for-LLaMa.

Make the repositories directory.

mkdir repositories; cd repositories

Clone and install GPTQ-for-LLaMa.

git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
python -m pip install -r requirements.txt
python setup_cuda.py install

Step 9. Download a model.

First, go back to the text-generation-webui directory.

cd ..\..

Download a model. The following command downloads the Vicuna 7B model from this repository.

python download-model.py eachadea/vicuna-7b-1.1

Step 10. Start text-generation-webui

python server.py --chat

Follow the local URL to start using text-generation-webui.

To make the startup easier next time, use a text editor to create a new text file named start.bat with the following content.

call .\venv\Scripts\activate.bat
call python server.py --chat

Save it in the text-generation-webui folder.

Next time, double-click start.bat to start the webui.
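Beyond the browser UI, text-generation-webui can also be driven programmatically. If you start the server with the api extension enabled (for example, python server.py --chat --api), it exposes an HTTP generation endpoint. The sketch below assumes the 2023-era blocking endpoint http://127.0.0.1:5000/api/v1/generate and its results[0].text response shape; verify both against your webui version before relying on them:

```python
import json
from urllib import request

# Assumption: default address of the api extension's blocking endpoint.
API_URL = "http://127.0.0.1:5000/api/v1/generate"

def build_payload(prompt: str, max_new_tokens: int = 200) -> dict:
    """Assemble a generation request body for the api extension."""
    return {"prompt": prompt, "max_new_tokens": max_new_tokens}

def generate(prompt: str) -> str:
    """POST the prompt to the running webui and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = request.Request(API_URL, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]

# Only builds and prints the request body here; call generate() with the
# server running to get actual completions.
print(json.dumps(build_payload("What is a large language model?")))
```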
