Accessing Models Method 3: Using Groq Fast AI Inference
This page covers the third way of accessing LLMs in Open WebUI: using Groq (note that Groq is different from Grok). Method 1 is here. Method 2 is here.
Method 1 is best suited for smaller LLMs, where your computer has adequate computing resources (e.g., GPU, RAM) to run the models locally.
Method 2 is for calling the APIs of large hosted models, such as OpenAI's models, directly.
Method 3 is like Method 2, except we call the Groq API, which hosts the large models.
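To make this concrete, here is a minimal sketch of what a direct Groq API call looks like (this is what Open WebUI does for you behind the scenes). It uses the official openai Python client pointed at Groq's OpenAI-compatible endpoint; the model ID llama-3.3-70b-versatile is an assumption, since Groq's model list changes, so substitute a current one from Groq's docs.

```python
# Minimal sketch of a direct call to Groq's OpenAI-compatible API.
# Assumes: `pip install openai`, a GROQ_API_KEY environment variable,
# and that "llama-3.3-70b-versatile" is still a valid model ID
# (model names rotate; check Groq's docs for the current list).
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    # Groq exposes an OpenAI-compatible endpoint, which is why
    # Open WebUI can talk to it the same way it talks to OpenAI.
    base_url="https://api.groq.com/openai/v1",
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumed model ID
    messages=[{"role": "user", "content": "Explain rate limits in one sentence."}],
)
print(response.choices[0].message.content)
```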
So WHY introduce Method 3 here if it is the same as Method 2?
- You get access to many more models than just the OpenAI models of Method 2.
- Groq hosts Llama models from Meta (Facebook), Mixtral from Mistral, and the latest DeepSeek 70B model.
- Groq has a free plan. Since it is free, it comes with lower rate limits; that is, it may cap you at a certain number of calls per minute or per day.
- People have reported that the Groq API is really fast (Groq runs models on its custom LPU inference hardware).
- THE BEST REASON you might want to use the Groq API: you can access the llama3.2-vision:11b VISION model. See here (a sketch of a vision call follows this list).
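The following is a hedged sketch of calling a vision model through Groq's OpenAI-compatible chat endpoint. Note that "llama-3.2-11b-vision-preview" is an assumed model ID: Groq's name for the 11B Llama 3.2 vision model may differ from the llama3.2-vision:11b tag shown above, so check Groq's model list. The local file photo.jpg is hypothetical.

```python
# Hedged sketch: sending an image to a vision model via Groq's
# OpenAI-compatible chat endpoint. The model ID below is an
# assumption; verify it against Groq's current model list.
import base64
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

# Encode a local image as a base64 data URL (a plain https:// image
# URL also works in the "image_url" field).
with open("photo.jpg", "rb") as f:  # hypothetical local file
    image_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="llama-3.2-11b-vision-preview",  # assumed model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```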
Setup
The setup is VERY EASY. See Video 1 below.
Remember: even though it is free, do NOT share your API key with anyone, and note that the free plan comes with rate limits.
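Because the free plan is rate limited, a script that hammers the API will eventually get HTTP 429 responses. Below is a minimal sketch of retrying with exponential backoff; the backoff schedule is arbitrary (not a Groq recommendation), and the model ID is the same assumption as above. (In Open WebUI itself, the endpoint https://api.groq.com/openai/v1 and your API key typically go into the OpenAI-compatible connection settings, as shown in the video.)

```python
# Minimal sketch of retrying on free-tier rate limits (HTTP 429).
# Assumes: `pip install openai` and a GROQ_API_KEY environment variable.
# The backoff schedule here is arbitrary, not a Groq recommendation.
import os
import time

import openai
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

def ask_with_retry(prompt: str, max_attempts: int = 5) -> str:
    """Call the chat endpoint, backing off exponentially on 429s."""
    for attempt in range(max_attempts):
        try:
            response = client.chat.completions.create(
                model="llama-3.3-70b-versatile",  # assumed model ID
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except openai.RateLimitError:
            # Free-tier limit hit: wait 1s, 2s, 4s, ... before retrying.
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate-limited after retries")

print(ask_with_retry("Say hello in five words."))
```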