Local Voice Assistant (Docker Compose)
This repository contains a minimal multi-container voice assistant composed of:
- whisper - FastAPI service exposing POST /transcribe for speech-to-text using Whisper.
- coquitts - FastAPI service exposing POST /speak for text-to-speech using Coqui TTS.
- ollama - Placeholder container running an Ollama-compatible LLM (exposed on port 11434).
- middleware - FastAPI service exposing POST /chat that orchestrates the above services.
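The orchestration done by the middleware can be pictured as three calls in sequence. The sketch below is not the repository's actual code: `chat_pipeline` and its three callable parameters are hypothetical names, injected so the flow can be read (and tested) without any running container.

```python
from typing import Callable


def chat_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    speak: Callable[[str], bytes],
) -> bytes:
    """Run speech-to-text, LLM generation, and text-to-speech in order.

    The three callables are hypothetical stand-ins for HTTP calls to the
    whisper, ollama, and coquitts services respectively.
    """
    text = transcribe(audio)  # whisper: POST /transcribe
    if not text.strip():
        # Nothing was recognized, so there is nothing to send to the LLM.
        raise ValueError("transcription was empty")
    reply = generate(text)    # ollama: LLM completion
    return speak(reply)       # coquitts: POST /speak
```

Passing the service calls in as functions keeps the ordering logic separate from networking details, which may make it easier to swap one backend without touching the others.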
Quick notes & assumptions
- These services are a starting point. Models are downloaded on first run and may require substantial disk space and memory.
- ollama uses a placeholder public image; you must replace it with your own Ollama setup or run an Ollama server with the desired model.
- The Whisper service uses the whisper Python package. For better performance, consider faster-whisper or running Whisper in GPU-enabled base images.
- The Coqui TTS service uses the TTS package and downloads German models on first run.
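For the faster-whisper option mentioned in the notes, a transcription helper might look roughly like the following. This is a sketch under assumptions: the model size "small", the CPU/int8 settings, and the helper names `join_segments` and `transcribe_file` are illustrative choices, not taken from this repository.

```python
from typing import Iterable


def join_segments(texts: Iterable[str]) -> str:
    """Join per-segment texts into one transcript, skipping blank segments."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe_file(path: str, model_size: str = "small", language: str = "de") -> str:
    """Transcribe an audio file with faster-whisper (assumed to be installed)."""
    # Imported lazily so this module still loads without faster-whisper.
    from faster_whisper import WhisperModel

    # Assumption: CPU with int8 quantization as a conservative default.
    # On a GPU-enabled image, device="cuda" and compute_type="float16"
    # would be the usual alternative.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, language=language)
    # segments is a generator; transcription runs as it is consumed.
    return join_segments(segment.text for segment in segments)
```

In a real service the model would typically be loaded once at startup rather than on every call, since loading dominates the cost of short requests.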
Run locally with Docker Compose
- Build and start:
docker-compose up --build
- Example request to the middleware:
curl -X POST "http://localhost:8000/chat" -F "file=@./sample.wav;type=audio/wav" --output response.wav
The resulting response.wav contains the German TTS reply.
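The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl command above (form field "file", content type audio/wav, URL http://localhost:8000/chat); the function names `build_multipart` and `chat` are hypothetical.

```python
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes, content_type: str):
    """Encode one file as multipart/form-data.

    Returns (body, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def chat(wav_path: str, out_path: str, url: str = "http://localhost:8000/chat") -> None:
    """Upload a WAV file to the middleware and save the audio reply."""
    with open(wav_path, "rb") as f:
        body, ctype = build_multipart("file", "sample.wav", f.read(), "audio/wav")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    # Generous timeout (an assumption): the first request may trigger model downloads.
    with urllib.request.urlopen(req, timeout=300) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

Calling `chat("sample.wav", "response.wav")` would then correspond to the curl invocation, assuming the stack is up and the middleware listens on the port shown.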
Next steps / improvements
- Add authentication between services.
- Add healthchecks and readiness probes.
- Add model selection, caching, and GPU support where available.
- Replace Ollama placeholder with a validated model name and response parsing.
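As one possible shape for the Ollama response parsing mentioned in the last item, the sketch below targets Ollama's /api/generate endpoint with streaming disabled, where the generated text arrives in the "response" field of a single JSON object. The helper names and the timeout are hypothetical, and the model name is deliberately left as a required argument because this README does not specify one.

```python
import json
import urllib.request


def build_generate_payload(model: str, prompt: str) -> bytes:
    """Build the JSON body for a non-streaming /api/generate request."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def parse_generate_response(raw: bytes) -> str:
    """Extract the generated text, failing loudly on errors or empty replies."""
    data = json.loads(raw)
    if "error" in data:
        raise RuntimeError(f"Ollama error: {data['error']}")
    text = data.get("response")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Ollama reply contained no text")
    return text.strip()


def generate(prompt: str, model: str, base_url: str = "http://localhost:11434") -> str:
    """Send a prompt to an Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # HTTP error statuses surface as urllib.error.HTTPError from urlopen.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_generate_response(resp.read())
```

Rejecting empty replies before they reach the TTS step avoids synthesizing silence when the model name is wrong or the model has not been pulled yet.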