🆕 GPT Vision
LocalAI supports understanding images by using LLaVA, and implements the GPT Vision API from OpenAI.
Usage
OpenAI docs: https://platform.openai.com/docs/guides/vision
To have LocalAI describe or answer questions about an image, send a request to the /v1/chat/completions
endpoint, for example with curl:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llava",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}],
"temperature": 0.9}'
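The same request can also be sent from a script. The following is a minimal Python sketch using only the standard library. The helper names build_vision_payload and ask_about_image are hypothetical, and the code assumes that LocalAI is listening on localhost:8080, that a model named llava is configured, and that the reply follows the OpenAI-style shape (choices[0].message.content).

```python
import json
import urllib.request


def build_vision_payload(prompt, image_url, model="llava", temperature=0.9):
    """Build the JSON body for a /v1/chat/completions vision request.

    Hypothetical helper: it mirrors the structure of the curl example,
    with "temperature" as a top-level field of the request.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "temperature": temperature,
    }


def ask_about_image(prompt, image_url, base_url="http://localhost:8080"):
    """POST the payload to the server and return the reply text.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    body = json.dumps(build_vision_payload(prompt, image_url)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With a LocalAI instance running, calling ask_about_image("What is in the image?", url) with the image URL from the curl example should return the model's description as a string. Keeping the payload construction in its own function makes it possible to inspect or log the request body without contacting the server.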
Setup
To set up the LLaVA models, follow the full example in the configuration examples.
Last updated 21 Jan 2024, 10:07 +0100.