Assessing Text-To-Speech Using Kokoro-FastAPI

This page explains how to configure Open WebUI to use text-to-speech (TTS) through Kokoro-FastAPI. Follow the instructions from Video 1 below.

The installation first sets up a separate Docker container alongside your Open WebUI container. The next step is to connect the TTS engine in Open WebUI to this new container's port.
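The container setup described above can be sketched as a Docker Compose service. This is a minimal sketch, not the project's official configuration: the image name, service name, and port 8880 are assumptions based on the upstream Kokoro-FastAPI project, so check them against its README before use.

```yaml
# Hypothetical docker-compose.yaml sketch for the TTS container (GPU variant).
services:
  kokoro-tts:
    image: ghcr.io/remsky/kokoro-fastapi-gpu:latest  # assumed image name
    ports:
      - "8880:8880"   # assumed default API port
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

For a CPU-only setup, the GPU variant of the image and the `deploy` block would be dropped.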

What is Kokoro-FastAPI?

Kokoro-FastAPI is a Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model that implements the OpenAI API endpoint specification. It offers high-performance text-to-speech with impressive generation speeds:

  • 100x+ real-time speed via HF A100
  • 35-50x+ real-time speed via 4060Ti
  • 5x+ real-time speed via M3 Pro CPU

Key Features

  • OpenAI-compatible Speech endpoint with inline voice combination
  • NVIDIA GPU-accelerated or CPU ONNX inference
  • Streaming support with variable chunking
  • Multiple audio format support (.mp3, .wav, .opus, .flac, .aac, .pcm)
  • Gradio Web UI interface for easy testing
  • Phoneme endpoints for conversion and generation
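Because the server implements the OpenAI speech endpoint, a request body can be built exactly as for OpenAI's `/v1/audio/speech`. A minimal sketch; the model name `"kokoro"` and the port 8880 mentioned below are assumptions based on the upstream project, not values confirmed by this guide:

```python
import json

def speech_request(text: str, voice: str = "af_bella", fmt: str = "mp3") -> dict:
    """Build a JSON body for the OpenAI-compatible /v1/audio/speech endpoint."""
    return {
        "model": "kokoro",        # model name: an assumption, check your install
        "input": text,
        "voice": voice,           # inline voice combination, e.g. "af_bella+af_sky"
        "response_format": fmt,   # mp3, wav, opus, flac, aac, or pcm
    }

body = json.dumps(speech_request("Hello from Kokoro!"))
```

You could then POST `body` to `http://localhost:8880/v1/audio/speech` with any HTTP client and save the binary response as an audio file.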

Voices

The first letter of a voice name indicates accent (a = American, b = British) and the second indicates gender (f = female, m = male).

  • af
  • af_bella
  • af_nicole
  • af_sarah
  • af_sky
  • am_adam
  • am_michael
  • bf_emma
  • bf_isabella
  • bm_george
  • bm_lewis
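The naming convention above can be decoded programmatically. A small sketch; the helper function is hypothetical and not part of the Kokoro-FastAPI API:

```python
# Decode the accent/gender prefix used by Kokoro voice IDs.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}

def describe_voice(voice: str) -> str:
    """Turn a voice ID like 'af_bella' into 'American female (bella)'."""
    prefix, _, name = voice.partition("_")
    accent = ACCENTS[prefix[0]]
    gender = GENDERS[prefix[1]]
    return f"{accent} {gender}" + (f" ({name})" if name else "")
```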

Languages

  • en_us
  • en_uk

Requirements

  • Docker installed on your system
  • Open WebUI running
  • For GPU support: NVIDIA GPU with CUDA 12.1
  • For CPU-only: No special requirements
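The requirements above can be checked from a terminal before starting. This is a hypothetical helper script, not part of the project:

```shell
# Report whether a command is available on this system.
check() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check docker      # required for both CPU-only and GPU setups
check nvidia-smi  # only needed for GPU (CUDA) support
```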

Setting up


Video 1