Vision models to read images
Do you want to do any of the following?
- visual recognition
- image reasoning and captioning
- answering general questions about an image
There is no special setup needed to run vision LLMs. The key question is which models you want to run. See the list of vision models in Ollama here
For big models that you are unable to run locally, such as llama3.2-vision:90b, the solution is to use Groq here
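As a rough sketch of what the Groq route can look like, the snippet below sends an image URL to a hosted vision model through the groq Python client. The model id and image URL are placeholders (Groq's vision model names change, so check their current model list), and it assumes you have a GROQ_API_KEY set.

```python
import os
from groq import Groq

# Assumes GROQ_API_KEY is set in your environment.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

response = client.chat.completions.create(
    model="llama-3.2-90b-vision-preview",  # placeholder: pick a vision model id from Groq's model list
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder URL
            ],
        }
    ],
)
print(response.choices[0].message.content)
```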
For smaller models, you might be able to download them locally and run them, albeit at a slower speed. Use this method here
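For illustration, a minimal sketch of the local route using the ollama Python package, assuming you have already pulled a small vision model (for example moondream, listed below) and have an image on disk; the file name is a placeholder.

```python
import ollama

# Assumes the model has already been pulled, e.g. with `ollama pull moondream`.
response = ollama.chat(
    model="moondream",
    messages=[
        {
            "role": "user",
            "content": "What is in this image?",
            "images": ["photo.jpg"],  # placeholder path to a local image file
        }
    ],
)
print(response["message"]["content"])
```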
Large vision models such as llama3.2-vision 11b or 90b
Smaller vision models
- Note that smaller models are still a few billion parameters, so whether they run well depends on your computer.
- Access these small models using Method 1 here
- Smaller models:
  - moondream (1.8b)
  - llava-phi3 (3.8b)