Quickstart
LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that is compatible with the OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images, audio, and more, locally or on-prem with consumer-grade hardware, supporting multiple model families and architectures.
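Because the API mirrors the OpenAI specification, the usual OpenAI-style calls work against it once a model is loaded. A minimal sketch, assuming the phi-2 model from the examples below is running and the API listens on port 8080:

```bash
# Chat completion against a locally loaded model (phi-2 is just the example model name;
# use whichever model you started LocalAI with)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "phi-2",
        "messages": [{"role": "user", "content": "How are you?"}],
        "temperature": 0.7
      }'
```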
Installation Methods

LocalAI is available as a container image and as a binary. It is compatible with container engines such as Docker and Podman, and with Kubernetes. Container images are published on quay.io and Docker Hub. Binaries can be downloaded from GitHub.
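For example, to fetch one of the versioned CPU images used throughout this page ahead of time (pick the variant that matches your setup):

```bash
# Pull the versioned CPU image from Docker Hub
# (the same tag is also published as quay.io/go-skynet/local-ai:v2.7.0-ffmpeg-core)
docker pull localai/localai:v2.7.0-ffmpeg-core
```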
Hardware Requirements: The hardware requirements for LocalAI vary based on the model size and quantization method used. For performance benchmarks with different backends, such as llama.cpp, visit this link. The rwkv backend is noted for its lower resource consumption.
Prerequisites

Before you begin, ensure you have a container engine installed if you are not using the binaries. Suitable options include Docker or Podman. For installation instructions, refer to the official Docker or Podman documentation.
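A quick way to confirm the container engine works before proceeding (Docker shown; substitute `podman` if that is what you installed):

```bash
# Verify the engine is installed and can run containers
docker --version
docker run --rm hello-world
```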
Running Models
Do you already have a model file? Skip to Run models manually.
LocalAI allows one-click runs with popular models. It downloads the model and starts the API with the model loaded.
There are different categories of models: LLMs, Multimodal LLMs, Embeddings, Audio to Text, and Text to Audio, depending on the backend being used and the model architecture.
💡 Don't need GPU acceleration? Use the CPU images, which are lighter and do not have Nvidia dependencies.
CPU-only:

| Model | Category | Docker command |
| --- | --- | --- |
| phi-2 | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core phi-2` |
| 🌋 llava | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core llava` |
| mistral-openorca | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core mistral-openorca` |
| bert-cpp | Embeddings | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core bert-cpp` |
| all-minilm-l6-v2 | Embeddings | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg all-minilm-l6-v2` |
| whisper-base | Audio to Text | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core whisper-base` |
| rhasspy-voice-en-us-amy | Text to Audio | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core rhasspy-voice-en-us-amy` |
| 🐸 coqui | Text to Audio | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg coqui` |
| 🐶 bark | Text to Audio | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg bark` |
| 🔊 vall-e-x | Text to Audio | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg vall-e-x` |
| mixtral-instruct (Mixtral-8x7B-Instruct-v0.1) | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core mixtral-instruct` |
| tinyllama-chat (original model) | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core tinyllama-chat` |
| dolphin-2.5-mixtral-8x7b | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core dolphin-2.5-mixtral-8x7b` |
| 🐍 mamba | LLM | GPU-only |
| animagine-xl | Text to Image | GPU-only |
| transformers-tinyllama | LLM | GPU-only |
| codellama-7b (with transformers) | LLM | GPU-only |
| codellama-7b-gguf (with llama.cpp) | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core codellama-7b-gguf` |
GPU (CUDA 11):

To know which version of CUDA you have available, check with `nvidia-smi` or `nvcc --version`; see also GPU acceleration.
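For example, either command below reports the CUDA version available on the host (output layout varies by driver and toolkit version):

```bash
# Driver-side CUDA version is shown in the header of nvidia-smi
nvidia-smi
# Toolkit version, if the CUDA toolkit is installed
nvcc --version
```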
| Model | Category | Docker command |
| --- | --- | --- |
| phi-2 | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core phi-2` |
| 🌋 llava | Multimodal LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core llava` |
| mistral-openorca | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core mistral-openorca` |
| bert-cpp | Embeddings | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core bert-cpp` |
| all-minilm-l6-v2 | Embeddings | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11 all-minilm-l6-v2` |
| whisper-base | Audio to Text | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core whisper-base` |
| rhasspy-voice-en-us-amy | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core rhasspy-voice-en-us-amy` |
| 🐸 coqui | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11 coqui` |
| 🐶 bark | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11 bark` |
| 🔊 vall-e-x | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11 vall-e-x` |
| mixtral-instruct (Mixtral-8x7B-Instruct-v0.1) | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core mixtral-instruct` |
| tinyllama-chat (original model) | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core tinyllama-chat` |
| dolphin-2.5-mixtral-8x7b | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core dolphin-2.5-mixtral-8x7b` |
| 🐍 mamba | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11 mamba-chat` |
| animagine-xl | Text to Image | `docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:v2.7.0-cublas-cuda11 animagine-xl` |
| transformers-tinyllama | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11 transformers-tinyllama` |
| codellama-7b | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11 codellama-7b` |
| codellama-7b-gguf | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda11-core codellama-7b-gguf` |
GPU (CUDA 12):

To know which version of CUDA you have available, check with `nvidia-smi` or `nvcc --version`; see also GPU acceleration.
| Model | Category | Docker command |
| --- | --- | --- |
| phi-2 | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core phi-2` |
| 🌋 llava | Multimodal LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core llava` |
| mistral-openorca | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core mistral-openorca` |
| bert-cpp | Embeddings | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core bert-cpp` |
| all-minilm-l6-v2 | Embeddings | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12 all-minilm-l6-v2` |
| whisper-base | Audio to Text | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core whisper-base` |
| rhasspy-voice-en-us-amy | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core rhasspy-voice-en-us-amy` |
| 🐸 coqui | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12 coqui` |
| 🐶 bark | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12 bark` |
| 🔊 vall-e-x | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12 vall-e-x` |
| mixtral-instruct (Mixtral-8x7B-Instruct-v0.1) | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core mixtral-instruct` |
| tinyllama-chat (original model) | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core tinyllama-chat` |
| dolphin-2.5-mixtral-8x7b | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core dolphin-2.5-mixtral-8x7b` |
| 🐍 mamba | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12 mamba-chat` |
| animagine-xl | Text to Image | `docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:v2.7.0-cublas-cuda12 animagine-xl` |
| transformers-tinyllama | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12 transformers-tinyllama` |
| codellama-7b | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12 codellama-7b` |
| codellama-7b-gguf | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.7.0-cublas-cuda12-core codellama-7b-gguf` |
Tip: You can specify multiple models when starting an instance, and all of them will be loaded. For example, to have both llava and phi-2 configured:
`docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core llava phi-2`
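Once the instance is up, you can confirm which models are available through the OpenAI-compatible models endpoint; a minimal sketch, assuming the API listens on port 8080:

```bash
# List the models currently known to this LocalAI instance
curl http://localhost:8080/v1/models
```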
Container images

LocalAI provides a variety of images to support different environments. These images are available on quay.io and Docker Hub.
For GPU acceleration on Nvidia graphics cards, use the Nvidia/CUDA images; if you don't have a GPU, use the CPU images. If you have an AMD GPU or Apple Silicon, see the build section.
Available image types:

- Images ending with `-core` are smaller and do not ship pre-downloaded Python dependencies. Use them if you plan to use only the `llama.cpp`, `stablediffusion-ncn`, `tinydream` or `rwkv` backends; if you are not sure which backends you need, do not use these images.
- FFMpeg is not included in the default images due to its licensing. If you need FFMpeg, use the images ending with `-ffmpeg`. Note that `ffmpeg` is required for LocalAI's audio-to-text features.
- If you are running on old CPUs without a GPU, you might need to set the `REBUILD` environment variable to `true`, along with options that disable the CPU flags your processor does not support; note, however, that inference will be slow. See also flagset compatibility and the sketch below.
Vanilla / CPU images:

| Description | Quay | Docker Hub |
| --- | --- | --- |
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master` | `localai/localai:master` |
| Latest tag | `quay.io/go-skynet/local-ai:latest` | `localai/localai:latest` |
| Versioned image | `quay.io/go-skynet/local-ai:v2.7.0` | `localai/localai:v2.7.0` |
| Versioned image including FFMpeg | `quay.io/go-skynet/local-ai:v2.7.0-ffmpeg` | `localai/localai:v2.7.0-ffmpeg` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:v2.7.0-ffmpeg-core` | `localai/localai:v2.7.0-ffmpeg-core` |
GPU (CUDA 11) images:

| Description | Quay | Docker Hub |
| --- | --- | --- |
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-cublas-cuda11` | `localai/localai:master-cublas-cuda11` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-cublas-cuda11` | `localai/localai:latest-cublas-cuda11` |
| Versioned image | `quay.io/go-skynet/local-ai:v2.7.0-cublas-cuda11` | `localai/localai:v2.7.0-cublas-cuda11` |
| Versioned image including FFMpeg | `quay.io/go-skynet/local-ai:v2.7.0-cublas-cuda11-ffmpeg` | `localai/localai:v2.7.0-cublas-cuda11-ffmpeg` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:v2.7.0-cublas-cuda11-ffmpeg-core` | `localai/localai:v2.7.0-cublas-cuda11-ffmpeg-core` |
GPU (CUDA 12) images:

| Description | Quay | Docker Hub |
| --- | --- | --- |
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-cublas-cuda12` | `localai/localai:master-cublas-cuda12` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-cublas-cuda12` | `localai/localai:latest-cublas-cuda12` |
| Versioned image | `quay.io/go-skynet/local-ai:v2.7.0-cublas-cuda12` | `localai/localai:v2.7.0-cublas-cuda12` |
| Versioned image including FFMpeg | `quay.io/go-skynet/local-ai:v2.7.0-cublas-cuda12-ffmpeg` | `localai/localai:v2.7.0-cublas-cuda12-ffmpeg` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:v2.7.0-cublas-cuda12-ffmpeg-core` | `localai/localai:v2.7.0-cublas-cuda12-ffmpeg-core` |
What's next?

Explore further resources and community contributions.