Run other Models
Already have a model file? Skip to Run models manually.
To load models into LocalAI, you can either install models manually or configure LocalAI to pull them from external sources, such as Hugging Face, and configure the model.
To do that, you can point LocalAI to a URL of a YAML configuration file - however - LocalAI also ships with a number of popular model configurations embedded in the binary. Below is a list of the model configurations that LocalAI has pre-built; see Model customization on how to configure models from URLs.
There are different categories of models: LLMs, Multimodal LLMs, Embeddings, Audio to Text, and Text to Audio, depending on the backend being used and the model architecture.
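As a sketch, both approaches use the same command shape (the YAML URL below is a placeholder, not a real configuration file):

```shell
# Start LocalAI with a pre-built model configuration embedded in the binary
docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core phi-2

# Or point LocalAI at a YAML model configuration hosted at a URL
# (replace the URL with your own configuration file)
docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core \
    https://example.com/my-model.yaml
```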
💡 To customize the models, see Model customization. For more model configurations, visit the Examples Section; the configurations for the models below are available here.
CPU images
💡 Don’t need GPU acceleration? Use the CPU images, which are lighter and do not have Nvidia dependencies.
Model | Category | Docker command |
---|---|---|
phi-2 | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core phi-2 |
🌋 bakllava | Multimodal LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core bakllava |
🌋 llava-1.5 | Multimodal LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core llava-1.5 |
🌋 llava-1.6-mistral | Multimodal LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core llava-1.6-mistral |
🌋 llava-1.6-vicuna | Multimodal LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core llava-1.6-vicuna |
mistral-openorca | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core mistral-openorca |
bert-cpp | Embeddings | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core bert-cpp |
all-minilm-l6-v2 | Embeddings | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg all-minilm-l6-v2 |
whisper-base | Audio to Text | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core whisper-base |
rhasspy-voice-en-us-amy | Text to Audio | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core rhasspy-voice-en-us-amy |
🐸 coqui | Text to Audio | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg coqui |
🐶 bark | Text to Audio | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg bark |
🔊 vall-e-x | Text to Audio | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg vall-e-x |
mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core mixtral-instruct |
tinyllama-chat original model | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core tinyllama-chat |
dolphin-2.5-mixtral-8x7b | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core dolphin-2.5-mixtral-8x7b |
🐍 mamba | LLM | GPU-only |
animagine-xl | Text to Image | GPU-only |
transformers-tinyllama | LLM | GPU-only |
codellama-7b (with transformers) | LLM | GPU-only |
codellama-7b-gguf (with llama.cpp) | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core codellama-7b-gguf |
hermes-2-pro-mistral | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core hermes-2-pro-mistral |
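Once a model from the table above is running, you can exercise it through LocalAI's OpenAI-compatible API. For example, with phi-2 loaded and the server listening on the default port:

```shell
# Send a chat completion request to the running LocalAI instance
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-2",
    "messages": [{"role": "user", "content": "How are you?"}],
    "temperature": 0.7
  }'
```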
GPU images (CUDA 11)
To check which CUDA version is available on your system, run nvidia-smi or nvcc --version. See also GPU acceleration.
Model | Category | Docker command |
---|---|---|
phi-2 | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core phi-2 |
🌋 bakllava | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core bakllava |
🌋 llava-1.5 | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core llava-1.5 |
🌋 llava-1.6-mistral | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core llava-1.6-mistral |
🌋 llava-1.6-vicuna | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core llava-1.6-vicuna |
mistral-openorca | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core mistral-openorca |
bert-cpp | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core bert-cpp |
all-minilm-l6-v2 | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 all-minilm-l6-v2 |
whisper-base | Audio to Text | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core whisper-base |
rhasspy-voice-en-us-amy | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core rhasspy-voice-en-us-amy |
🐸 coqui | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 coqui |
🐶 bark | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 bark |
🔊 vall-e-x | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 vall-e-x |
mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core mixtral-instruct |
tinyllama-chat original model | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core tinyllama-chat |
dolphin-2.5-mixtral-8x7b | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core dolphin-2.5-mixtral-8x7b |
🐍 mamba | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 mamba-chat |
animagine-xl | Text to Image | docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:v2.23.0-cublas-cuda11 animagine-xl |
transformers-tinyllama | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 transformers-tinyllama |
codellama-7b | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 codellama-7b |
codellama-7b-gguf | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core codellama-7b-gguf |
hermes-2-pro-mistral | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core hermes-2-pro-mistral |
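For Audio to Text models such as whisper-base, requests go through the OpenAI-compatible transcription endpoint. A minimal sketch, assuming the server is running and audio.wav is a placeholder for your own audio file:

```shell
# Transcribe a local audio file with the whisper-base model
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@audio.wav" \
  -F model="whisper-base"
```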
GPU images (CUDA 12)
To check which CUDA version is available on your system, run nvidia-smi or nvcc --version. See also GPU acceleration.
Model | Category | Docker command |
---|---|---|
phi-2 | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core phi-2 |
🌋 bakllava | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core bakllava |
🌋 llava-1.5 | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core llava-1.5 |
🌋 llava-1.6-mistral | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core llava-1.6-mistral |
🌋 llava-1.6-vicuna | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core llava-1.6-vicuna |
mistral-openorca | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core mistral-openorca |
bert-cpp | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core bert-cpp |
all-minilm-l6-v2 | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 all-minilm-l6-v2 |
whisper-base | Audio to Text | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core whisper-base |
rhasspy-voice-en-us-amy | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core rhasspy-voice-en-us-amy |
🐸 coqui | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 coqui |
🐶 bark | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 bark |
🔊 vall-e-x | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 vall-e-x |
mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core mixtral-instruct |
tinyllama-chat original model | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core tinyllama-chat |
dolphin-2.5-mixtral-8x7b | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core dolphin-2.5-mixtral-8x7b |
🐍 mamba | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 mamba-chat |
animagine-xl | Text to Image | docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:v2.23.0-cublas-cuda12 animagine-xl |
transformers-tinyllama | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 transformers-tinyllama |
codellama-7b | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 codellama-7b |
codellama-7b-gguf | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core codellama-7b-gguf |
hermes-2-pro-mistral | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core hermes-2-pro-mistral |
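Text to Audio models are driven through LocalAI's TTS endpoint. A hedged sketch, assuming the rhasspy-voice-en-us-amy model is loaded under that name (the model name and payload shape are assumptions based on the table above):

```shell
# Ask the running instance to synthesize speech from text
curl http://localhost:8080/tts \
  -H "Content-Type: application/json" \
  -d '{"model": "rhasspy-voice-en-us-amy", "input": "Hello world"}' \
  --output hello.wav
```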
💡
Tip: You can specify multiple models to start an instance with all of them loaded. For example, to have both llava and phi-2 configured:
docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core llava phi-2
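With multiple models loaded, the model field of each request selects which one serves it. For example, to target phi-2 from the pair above:

```shell
# The "model" field routes the request to one of the loaded models
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi-2", "messages": [{"role": "user", "content": "Hello"}]}'
```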
Last updated 21 Nov 2024, 01:01 +0100.