# Local Voice Assistant (Docker Compose)
This repository contains a minimal multi-container voice assistant composed of:
- `whisper` - FastAPI service exposing `POST /transcribe` for speech-to-text using Whisper.
- `coquitts` - FastAPI service exposing `POST /speak` for text-to-speech using Coqui TTS.
- `ollama` - placeholder container running an Ollama-compatible LLM (exposed on port 11434).
- `middleware` - FastAPI service exposing `POST /chat` that orchestrates the services above.
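A `docker-compose.yml` wiring these services together might look roughly like the sketch below. The build contexts and the `whisper`/`coquitts` ports are assumptions; only the middleware's port 8000 and Ollama's port 11434 are stated in this README.

```yaml
services:
  whisper:
    build: ./whisper        # assumed build context
    ports:
      - "9000:9000"         # assumed port
  coquitts:
    build: ./coquitts       # assumed build context
    ports:
      - "9001:9001"         # assumed port
  ollama:
    image: ollama/ollama    # placeholder image; replace as noted below
    ports:
      - "11434:11434"
  middleware:
    build: ./middleware     # assumed build context
    ports:
      - "8000:8000"         # matches the curl example in this README
    depends_on:
      - whisper
      - coquitts
      - ollama
```
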
## Quick notes & assumptions

- These services are a starting point: models are downloaded on first run and may require substantial disk space and memory.
- `ollama` uses a placeholder public image; replace it with your own Ollama setup or run an Ollama server with the desired model.
- The Whisper service uses the `whisper` Python package. For better performance, consider `faster-whisper` or a GPU-enabled base image.
- The Coqui TTS service uses the `TTS` package and downloads German models on first run.
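If the placeholder happens to be the official `ollama/ollama` image (an assumption, not something this repo prescribes), one way to provide a model is to pull it into the running container:

```bash
# Pull a model into the running ollama container.
# "llama3" is an example model name, not one this repo prescribes.
docker-compose exec ollama ollama pull llama3
```
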
## Run locally with Docker Compose

1. Build and start:
```bash
docker-compose up --build
```
2. Example request to the middleware:
```bash
curl -X POST "http://localhost:8000/chat" -F "file=@./sample.wav;type=audio/wav" --output response.wav
```
The `response.wav` will contain the German TTS response.
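The individual services can also be exercised directly. The ports and request fields below are assumptions (only the middleware's port 8000 is stated in this README), so adjust them to your compose file and service code:

```bash
# Speech-to-text only (port 9000 is an assumption; check docker-compose.yml)
curl -X POST "http://localhost:9000/transcribe" \
  -F "file=@./sample.wav;type=audio/wav"

# Text-to-speech only (port 9001 and the "text" field name are assumptions)
curl -X POST "http://localhost:9001/speak" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hallo, wie geht es dir?"}' \
  --output speech.wav
```
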
## Next steps / improvements

- Add authentication between services.
- Add healthchecks and readiness probes.
- Add model selection, caching, and GPU support where available.
- Replace Ollama placeholder with a validated model name and response parsing.
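
For the healthcheck item, a minimal Compose-level sketch could look like the fragment below. It assumes each FastAPI service exposes a `GET /health` route, which is not yet implemented in this repo:

```yaml
services:
  middleware:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]  # assumed route
      interval: 30s
      timeout: 5s
      retries: 3
```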