Assessing Text-To-Speech Using Kokoro-FastAPI
This page explains how to configure Open WebUI to use text-to-speech (TTS) via Kokoro-FastAPI. Follow the instructions in this video: Video 1
The installation first sets up a second Docker container alongside your Open WebUI container. The next step is to connect the TTS engine in your Open WebUI instance to the port this new container exposes.
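Once the Kokoro-FastAPI container is up, it is worth sanity-checking the endpoint before wiring it into Open WebUI. Below is a minimal sketch using the openai Python client; the port 8880, the model name kokoro, and the af_bella voice are assumptions based on the project's defaults, so adjust them to match your container.

```python
# Minimal smoke test against a local Kokoro-FastAPI container.
# Assumptions: the server listens on port 8880 and accepts the
# OpenAI-style model name "kokoro" -- adjust to match your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8880/v1",  # the local container, not api.openai.com
    api_key="not-needed",                 # the local server ignores the key
)

with client.audio.speech.with_streaming_response.create(
    model="kokoro",
    voice="af_bella",
    input="Hello from Kokoro-FastAPI!",
) as response:
    response.stream_to_file("hello.mp3")
```

If this produces a playable hello.mp3, the same base URL and voice name can then be entered in Open WebUI's audio settings.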
Sources:
- Kokoro-FastAPI integration documentation on Open WebUI here
- Read this to better understand what Kokoro is
- BUT do NOT follow the set-up here.
- Try the BIGGER model on Hugging Face here
What is Kokoro-FastAPI?
Kokoro-FastAPI is a dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model that implements the OpenAI API endpoint specification. It offers high-performance text-to-speech with impressive generation speeds:
- 100x+ real-time speed via HF A100
- 35-50x+ real-time speed via 4060Ti
- 5x+ real-time speed via M3 Pro CPU
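For context, "Nx+ real-time" means the server synthesizes at least N seconds of audio per second of wall-clock compute. A back-of-the-envelope sketch (illustrative numbers, not benchmarks):

```python
# Real-time factor (RTF): seconds of audio produced per second of compute.
# The numbers below are illustrative, not measured benchmarks.
audio_seconds = 60.0      # one minute of synthesized speech
compute_seconds = 0.6     # hypothetical wall-clock time on an A100
rtf = audio_seconds / compute_seconds
print(f"{rtf:.0f}x real-time")  # -> "100x real-time"
```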
Key Features
- OpenAI-compatible Speech endpoint with inline voice combination (see the sketch after this list)
- NVIDIA GPU-accelerated or CPU ONNX inference
- Streaming support with variable chunking
- Multiple audio format support (.mp3, .wav, .opus, .flac, .aac, .pcm)
- Gradio web UI for easy testing
- Phoneme endpoints for conversion and generation
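The first three features above can be exercised together with a plain HTTP request. In the sketch below, the "+"-joined voice string for inline combination, the port 8880, and the payload field names are assumptions about the project's conventions; verify them against your container's API docs:

```python
# Sketch: combined-voice, streamed MP3 from the OpenAI-style speech endpoint.
# Assumptions: port 8880, "voice1+voice2" syntax for inline combination.
import requests

resp = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "model": "kokoro",
        "voice": "af_bella+af_sky",   # inline combination of two voices
        "input": "Two voices blended into one.",
        "response_format": "mp3",     # .wav, .opus, .flac, etc. also listed above
    },
    stream=True,                      # consume audio chunks as they arrive
    timeout=120,
)
resp.raise_for_status()

with open("blended.mp3", "wb") as f:
    for chunk in resp.iter_content(chunk_size=4096):
        f.write(chunk)
```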
Voices
a = American and b = British; f = female and m = male. To check which voices your container actually ships, see the sketch after this list.
- af
- af_bella
- af_nicole
- af_sarah
- af_sky
- am_adam
- am_michael
- bf_emma
- bf_isabella
- bm_george
- bm_lewis
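A running container can be queried for its installed voices. The GET /v1/audio/voices path below is an assumption; if it differs in your version, the container's auto-generated FastAPI docs page lists the real routes:

```python
# Sketch: ask the running container which voices it has installed.
# Assumption: a GET /v1/audio/voices endpoint on port 8880 returning JSON.
import requests

resp = requests.get("http://localhost:8880/v1/audio/voices", timeout=10)
resp.raise_for_status()
print(resp.json())  # expected: names like "af_bella", "bm_george", ...
```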
Languages
- en_us
- en_uk
Requirements
- Docker installed on your system
- Open WebUI running
- For GPU support: NVIDIA GPU with CUDA 12.1
- For CPU-only: No special requirements
Setting up