Assessing Text-To-Speech Using Kokoro-FastAPI

This page explains how to configure Open WebUI to use text-to-speech (TTS) through Kokoro-FastAPI. Follow the instructions from Video 1 below.

The installation first sets up a separate Docker container alongside your Open WebUI container. The next step is to connect the TTS engine in Open WebUI to this new container's port.
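The container setup described above can be sketched as a Docker Compose service. This is a minimal sketch, not the project's official configuration: the image name, service name, and port 8880 are assumptions based on the upstream Kokoro-FastAPI project, so check them against its README before use.

```yaml
# Hypothetical docker-compose.yaml sketch for the TTS container (GPU variant).
services:
  kokoro-tts:
    image: ghcr.io/remsky/kokoro-fastapi-gpu:latest  # assumed image name
    ports:
      - "8880:8880"   # assumed default API port
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

For a CPU-only setup, the GPU variant of the image and the `deploy` block would be dropped.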

What is Kokoro-FastAPI?

Kokoro-FastAPI is a Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model that implements the OpenAI API endpoint specification. It offers high-performance text-to-speech with impressive generation speeds:

  • 100x+ real-time speed via HF A100
  • 35-50x+ real-time speed via 4060Ti
  • 5x+ real-time speed via M3 Pro CPU

Key Features

  • OpenAI-compatible Speech endpoint with inline voice combination
  • NVIDIA GPU-accelerated or CPU ONNX inference
  • Streaming support with variable chunking
  • Multiple audio format support (.mp3, .wav, .opus, .flac, .aac, .pcm)
  • Gradio Web UI interface for easy testing
  • Phoneme endpoints for conversion and generation
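Because the server implements the OpenAI speech endpoint, a request body can be built exactly as for OpenAI's `/v1/audio/speech`. A minimal sketch; the model name `"kokoro"` and the port 8880 mentioned below are assumptions based on the upstream project, not values confirmed by this guide:

```python
import json

def speech_request(text: str, voice: str = "af_bella", fmt: str = "mp3") -> dict:
    """Build a JSON body for the OpenAI-compatible /v1/audio/speech endpoint."""
    return {
        "model": "kokoro",        # model name: an assumption, check your install
        "input": text,
        "voice": voice,           # inline voice combination, e.g. "af_bella+af_sky"
        "response_format": fmt,   # mp3, wav, opus, flac, aac, or pcm
    }

body = json.dumps(speech_request("Hello from Kokoro!"))
```

You could then POST `body` to `http://localhost:8880/v1/audio/speech` with any HTTP client and save the binary response as an audio file.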

Voices

The first letter of a voice name indicates accent (a = American, b = British) and the second indicates gender (f = female, m = male).

  • af
  • af_bella
  • af_nicole
  • af_sarah
  • af_sky
  • am_adam
  • am_michael
  • bf_emma
  • bf_isabella
  • bm_george
  • bm_lewis
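The naming convention above can be decoded programmatically. A small sketch; the helper function is hypothetical and not part of the Kokoro-FastAPI API:

```python
# Decode the accent/gender prefix used by Kokoro voice IDs.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}

def describe_voice(voice: str) -> str:
    """Turn a voice ID like 'af_bella' into 'American female (bella)'."""
    prefix, _, name = voice.partition("_")
    accent = ACCENTS[prefix[0]]
    gender = GENDERS[prefix[1]]
    return f"{accent} {gender}" + (f" ({name})" if name else "")
```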

Languages

  • en_us
  • en_uk

Requirements

  • Docker installed on your system
  • Open WebUI running
  • For GPU support: NVIDIA GPU with CUDA 12.1
  • For CPU-only: No special requirements
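The requirements above can be checked from a terminal before starting. This is a hypothetical helper script, not part of the project:

```shell
# Report whether a command is available on this system.
check() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check docker      # required for both CPU-only and GPU setups
check nvidia-smi  # only needed for GPU (CUDA) support
```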

Setting up


Video 1