Vision models to read images
Do you want to do any of the following?
- visual recognition
- image reasoning and captioning
- answering general questions about an image
There is no special setup needed to run vision LLMs. The key question is which models you want to run. See the list of vision models in Ollama here
For big models that you are unable to run locally, such as llama3.2-vision:90b, the solution is to use Groq here
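As a rough sketch of what the Groq route can look like, the snippet below sends an image URL to a hosted vision model through the groq Python client. The model id and image URL are placeholders (Groq's vision model names change, so check their current model list), and it assumes you have a GROQ_API_KEY set.

```python
import os
from groq import Groq

# Assumes GROQ_API_KEY is set in your environment.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

response = client.chat.completions.create(
    model="llama-3.2-90b-vision-preview",  # placeholder: pick a vision model id from Groq's model list
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder URL
            ],
        }
    ],
)
print(response.choices[0].message.content)
```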
For smaller models, you might be able to download them locally and run them, albeit at a slower speed. Use this method here
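For illustration, a minimal sketch of the local route using the ollama Python package, assuming you have already pulled a small vision model (for example moondream, listed below) and have an image on disk; the file name is a placeholder.

```python
import ollama

# Assumes the model has already been pulled, e.g. with `ollama pull moondream`.
response = ollama.chat(
    model="moondream",
    messages=[
        {
            "role": "user",
            "content": "What is in this image?",
            "images": ["photo.jpg"],  # placeholder path to a local image file
        }
    ],
)
print(response["message"]["content"])
```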
Large vision models such as llama3.2-vision 11b or 90b
Smaller vision models
- Note that smaller models are still a few billion parameters, so whether they run well depends on your computer.
- Access these small models using Method 1 here
- Smaller models:
  - moondream (1.8b)
  - llava-phi3 (3.8b)