Accessing Models Method 3: Using Groq Fast AI Inference
This page covers the third way of accessing LLMs in Open WebUI: using Groq (note that Groq is different from Grok). Method 1 is here. Method 2 is here.
Method 1 is best suited for smaller LLMs, where your computer has adequate computing resources (e.g., GPU, RAM) to run the models locally.
Method 2 is for calling the APIs of large hosted models, such as OpenAI's models, directly.
Method 3 is like Method 2, except we call the Groq API, which hosts the large models.
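To make this concrete, here is a minimal sketch of what a direct Groq API call looks like (this is what Open WebUI does for you behind the scenes). It uses the official openai Python client pointed at Groq's OpenAI-compatible endpoint; the model ID llama-3.3-70b-versatile is an assumption, since Groq's model list changes, so substitute a current one from Groq's docs.

```python
# Minimal sketch of a direct call to Groq's OpenAI-compatible API.
# Assumes: `pip install openai`, a GROQ_API_KEY environment variable,
# and that "llama-3.3-70b-versatile" is still a valid model ID
# (model names rotate; check Groq's docs for the current list).
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    # Groq exposes an OpenAI-compatible endpoint, which is why
    # Open WebUI can talk to it the same way it talks to OpenAI.
    base_url="https://api.groq.com/openai/v1",
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumed model ID
    messages=[{"role": "user", "content": "Explain rate limits in one sentence."}],
)
print(response.choices[0].message.content)
```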
So WHY introduce Method 3 here if it is the same as Method 2?
- You get access to many more models than just the OpenAI models of Method 2.
- Groq hosts Llama models from Meta (Facebook), Mixtral from Mistral, and the latest DeepSeek 70B model.
- Groq has a free plan. Since it is free, it comes with lower rate limits; that is, it may cap you at a certain number of calls per minute or per day.
- People have reported that the Groq API is really fast (Groq runs models on its custom LPU inference hardware).
- THE BEST REASON you might want to use the Groq API: you can access the llama3.2-vision:11b VISION model. See here (a sketch of a vision call follows this list).
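The following is a hedged sketch of calling a vision model through Groq's OpenAI-compatible chat endpoint. Note that "llama-3.2-11b-vision-preview" is an assumed model ID: Groq's name for the 11B Llama 3.2 vision model may differ from the llama3.2-vision:11b tag shown above, so check Groq's model list. The local file photo.jpg is hypothetical.

```python
# Hedged sketch: sending an image to a vision model via Groq's
# OpenAI-compatible chat endpoint. The model ID below is an
# assumption; verify it against Groq's current model list.
import base64
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

# Encode a local image as a base64 data URL (a plain https:// image
# URL also works in the "image_url" field).
with open("photo.jpg", "rb") as f:  # hypothetical local file
    image_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="llama-3.2-11b-vision-preview",  # assumed model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```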
Setup
The setup is VERY EASY. See Video 1 below.
Remember: even though it is free, do NOT share your API key with anyone, and note that the free plan comes with rate limits.
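Because the free plan is rate limited, a script that hammers the API will eventually get HTTP 429 responses. Below is a minimal sketch of retrying with exponential backoff; the backoff schedule is arbitrary (not a Groq recommendation), and the model ID is the same assumption as above. (In Open WebUI itself, the endpoint https://api.groq.com/openai/v1 and your API key typically go into the OpenAI-compatible connection settings, as shown in the video.)

```python
# Minimal sketch of retrying on free-tier rate limits (HTTP 429).
# Assumes: `pip install openai` and a GROQ_API_KEY environment variable.
# The backoff schedule here is arbitrary, not a Groq recommendation.
import os
import time

import openai
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

def ask_with_retry(prompt: str, max_attempts: int = 5) -> str:
    """Call the chat endpoint, backing off exponentially on 429s."""
    for attempt in range(max_attempts):
        try:
            response = client.chat.completions.create(
                model="llama-3.3-70b-versatile",  # assumed model ID
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except openai.RateLimitError:
            # Free-tier limit hit: wait 1s, 2s, 4s, ... before retrying.
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate-limited after retries")

print(ask_with_retry("Say hello in five words."))
```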