kratos-jarvis
GitHub: stevenvo780/kratos-jarvis
What it is
kratos-jarvis is a self-hosted, voice-driven autonomous AI assistant built around the OpenClaw agent gateway. It runs entirely on local hardware, with no cloud APIs required, and integrates voice input, an LLM relevance gate, screen vision, biometric/location awareness, and self-healing reliability mechanisms into a single cohesive system.
It is published as a reference implementation of one person’s real home setup, not a turnkey installer. Paths, containers, and environment variables are parameterised; no tokens or secrets are committed.
| Subsystem | Location |
|---|---|
| Hands-free voice stack (STT, TTS, VAD, screen vision, dashboard) | voice/ |
| Consigliere advisor + executor hooks + learning loop | autonomy/ |
| Location + health telemetry ingestion (geofencing, persistence) | telemetry/ |
| Watchdogs, heartbeats, context-collapse guard, stress tests | reliability/ |
| Per-subsystem deep writeups | docs/ |
Voice subsystem
The continuous-listen engine (voice/claw-listen-daemon.py) runs as a background process and handles the full voice pipeline:
- Energy VAD with adaptive noise floor: computes RMS energy per audio frame; applies onset debounce to avoid false triggers from breath or ambient noise; arms the silence watchdog only after a configurable minimum speech duration.
- Local Whisper STT: transcribes with `faster-whisper-large-v3-turbo` via a local OpenAI-compatible audio server running on GPU.
- Relevance gate: a small local LLM (via Ollama) decides whether the transcription is addressed to Jarvis or is ambient household speech; queries that fail the gate are discarded without further processing.
- Kokoro TTS: synthesises replies with Kokoro-82M, also served by the local audio sidecar; playback mutes the microphone capture to prevent feedback loops.
- TTS cleaning: strips markdown, emojis, and URLs before sending text to the TTS engine so spoken output sounds natural.
- On-demand screen vision (`voice/claw-ver`): triggered by a configurable phrase (e.g. "look at monitor 2"), captures the screen with `grim`, converts it to a format the agent accepts, and returns a spoken answer. Vision is never persistent; each capture is one-shot.
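The energy VAD bullet above can be illustrated with a minimal sketch. It is not the code in `voice/claw-listen-daemon.py`: the class name `EnergyVad`, every threshold, and the frame format (a list of PCM samples) are assumptions chosen for illustration. It shows the three ideas named in the list: per-frame RMS energy, an adaptive noise floor, and an onset debounce that must be satisfied before the silence watchdog is armed.

```python
import math
from dataclasses import dataclass


def rms(frame):
    """Root-mean-square energy of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


@dataclass
class EnergyVad:
    # All values here are illustrative assumptions, not the daemon's settings.
    threshold_ratio: float = 3.0   # speech must exceed the noise floor by this factor
    floor_alpha: float = 0.05      # smoothing factor for the noise-floor average
    onset_frames: int = 3          # consecutive loud frames needed to declare speech
    min_speech_frames: int = 10    # loud speech frames needed to arm the watchdog
    noise_floor: float = 100.0
    in_speech: bool = False
    _loud_run: int = 0
    _speech_frames: int = 0

    def process(self, frame):
        """Feed one frame; return True while an utterance is considered active."""
        energy = rms(frame)
        if energy > self.noise_floor * self.threshold_ratio:
            self._loud_run += 1
            # Onset debounce: a single loud frame (a cough, a click) is ignored.
            if self._loud_run >= self.onset_frames:
                self.in_speech = True
            if self.in_speech:
                self._speech_frames += 1
        else:
            self._loud_run = 0
            if not self.in_speech:
                # Track ambient noise only while nobody is speaking.
                self.noise_floor += self.floor_alpha * (energy - self.noise_floor)
        return self.in_speech

    @property
    def watchdog_armed(self):
        """The silence watchdog arms only after enough speech has been heard."""
        return self.in_speech and self._speech_frames >= self.min_speech_frames

    def reset(self):
        """Call when the utterance ends (for example, when the watchdog fires)."""
        self.in_speech = False
        self._loud_run = 0
        self._speech_frames = 0
```

In this sketch the caller owns the end-of-utterance decision: it watches for silence once `watchdog_armed` is true and calls `reset()` afterwards.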
Additional voice tools:
| Script | Role |
|---|---|
| `voice/claw-listen` | Control CLI: start/stop/status the daemon |
| `voice/claw-menu` | wofi panel for quick actions |
| `voice/claw-talk` + `voice/clawbar-vad.py` | Push-to-talk bridge + VAD (shared with clawbar) |
| `voice/jarvis` | Single-screen status dashboard: tower health, insight of the day, one action |
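The TTS cleaning step in the voice pipeline (stripping markdown, emojis, and URLs before synthesis) might look roughly like the following. This is a hedged sketch, not the project's implementation: the function name `clean_for_tts` and the exact patterns are assumptions, and the emoji ranges cover only the most common Unicode blocks.

```python
import re

# Patterns are illustrative assumptions; a real cleaner may need more cases.
_MD_LINK = re.compile(r"\[([^\]]+)\]\([^)]+\)")   # [label](target)
_URL = re.compile(r"https?://\S+")
_MD_MARKS = re.compile(r"[*_`#~]+")               # emphasis, code, heading marks
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
_SPACES = re.compile(r"\s+")


def clean_for_tts(text):
    """Return text with markdown, URLs, and emojis removed for speech output."""
    text = _MD_LINK.sub(r"\1", text)   # keep the link label, drop the target
    text = _URL.sub(" ", text)         # bare URLs are not worth reading aloud
    text = _MD_MARKS.sub("", text)
    text = _EMOJI.sub("", text)
    return _SPACES.sub(" ", text).strip()
```

Links are rewritten before bare URLs are removed so that the human-readable label survives and only the target is dropped.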
Consigliere — autonomy layer
The autonomy/ subsystem provides a structured decision-support and executor framework:
- Commitment graph: cross-references incoming decisions against a knowledge graph built from git commits and project documentation, surfacing CONNECTIONS, TENSIONS, and STEELMAN arguments.
- Learning loop: captures interaction outcomes and feeds them back into the advisor to improve future recommendations.
- Executor hooks: fire shell or Python actions in response to advisor outputs (e.g. run a script, update a file, send a notification).
- Focus gate: blocks or defers low-priority interruptions based on current activity state.
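As an illustration of the focus gate described above, the sketch below maps an activity state to the minimum priority allowed to interrupt. Everything in it is hypothetical: the `Priority` levels, the `THRESHOLDS` table, and the rule that "one level below the threshold is deferred, anything lower is blocked" are assumptions made for the example, not behaviour confirmed by the repository.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0
    NORMAL = 1
    URGENT = 2


# Hypothetical policy: minimum priority that may interrupt each activity.
THRESHOLDS = {
    "idle": Priority.LOW,
    "coding": Priority.NORMAL,
    "commuting": Priority.NORMAL,
    "working_out": Priority.URGENT,
    "sleeping": Priority.URGENT,
}


def focus_gate(priority, activity):
    """Return 'deliver', 'defer', or 'block' for an interruption."""
    # Unknown activities fall back to the most permissive threshold.
    threshold = THRESHOLDS.get(activity, Priority.LOW)
    if priority >= threshold:
        return "deliver"
    # Just below the threshold: queue for later. Further below: drop.
    return "defer" if threshold - priority == 1 else "block"
```

For example, under this policy a low-priority notification is deferred while coding but blocked outright while sleeping.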
Telemetry and geofencing
The telemetry/ subsystem ingests real-world context to make the assistant activity-aware:
- Ingests GPS coordinates and health vitals (heart rate, steps, SpO2) from Android via Health Connect.
- Maintains geofences (home, office, gym, etc.) with configurable dwell times.
- Infers current activity (coding, commuting, working out, sleeping) approximately every 20 minutes.
- Persists telemetry to Postgres for historical analysis and advisor context.
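A geofence with a dwell time, as listed above, can be sketched as a circular region plus a timer. The `Geofence` class, its field names, and the choice of a circular fence measured with the haversine formula are assumptions for illustration; the repository may model fences differently.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Geofence:
    name: str
    lat: float
    lon: float
    radius_m: float
    dwell_s: float                       # time inside required before "arrived"
    _entered_at: Optional[float] = None  # timestamp of the first fix inside

    def update(self, lat, lon, ts):
        """Feed one GPS fix; return True once inside for at least dwell_s seconds."""
        if haversine_m(self.lat, self.lon, lat, lon) > self.radius_m:
            self._entered_at = None      # leaving the fence resets the dwell timer
            return False
        if self._entered_at is None:
            self._entered_at = ts
        return ts - self._entered_at >= self.dwell_s
```

The dwell timer keeps a brief pass through a fence (driving past the gym, say) from being treated as an arrival.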
Reliability
The reliability/ subsystem ensures the system stays running and recovers automatically:
- Watchdogs: monitor key processes (daemon, Docker containers, audio sidecar) and restart them on failure.
- Dead-man heartbeat: sends periodic signals to a NAS; missed heartbeats trigger an alert.
- Context-collapse guard: detects when the LLM context has degraded (e.g. runaway memory) and resets the session.
- Resilience stress-test suite: hammers the voice pipeline with synthetic inputs to validate failure modes before they occur in production.
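The dead-man heartbeat idea can be sketched from the receiving side: record when the last heartbeat arrived and raise an alert once too many intervals have elapsed without one. The `DeadManSwitch` class, the `max_missed` parameter, and the injectable clock are assumptions for illustration; how the NAS actually receives and evaluates heartbeats is not described in this document.

```python
import time


class DeadManSwitch:
    """Tracks heartbeats and reports when too many have been missed."""

    def __init__(self, interval_s, max_missed=3, clock=time.monotonic):
        self.interval_s = interval_s   # expected time between heartbeats
        self.max_missed = max_missed   # missed beats tolerated before alerting
        self._clock = clock            # injectable so the logic can be tested
        self._last_beat = clock()

    def beat(self):
        """Record that a heartbeat has just arrived."""
        self._last_beat = self._clock()

    def missed(self):
        """Number of whole heartbeat intervals elapsed since the last beat."""
        return int((self._clock() - self._last_beat) // self.interval_s)

    def should_alert(self):
        return self.missed() >= self.max_missed
```

A monotonic clock is used by default so that wall-clock adjustments (NTP corrections, for instance) cannot cause or hide a missed heartbeat.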
Installation
```bash
git clone https://github.com/stevenvo780/kratos-jarvis.git
cd kratos-jarvis
```

Then follow the subsystem-specific setup guides in docs/voice.md, docs/autonomy.md, and docs/telemetry.md.

```bash
# Install Python deps for the listen daemon
pip install faster-whisper tqdm

# Start the continuous-listen engine
python3 voice/claw-listen-daemon.py

# Control the daemon
voice/claw-listen start
voice/claw-listen stop
voice/claw-listen status
```

```bash
# Requires grim (Wayland screenshot tool)
# Trigger via voice ("look at monitor 2") or directly:
voice/claw-ver
```

```bash
# Requires a Postgres database; set connection env vars first
export TELEMETRY_DB_URL=postgres://user:pass@host/db

# Start the telemetry ingestion endpoint
python3 telemetry/ingest.py
```

```bash
# Start watchdog suite
bash reliability/watchdog.sh

# Run the stress-test harness
bash reliability/stress-test.sh
```

Requirements
- Linux with PipeWire or PulseAudio
- Hyprland compositor (Wayland)
- Docker, with:
- Ollama with a small local model for the relevance gate
- Postgres for telemetry persistence
- grim (optional, for screen vision)
- wofi (optional, for the claw-menu panel)
| Layer | Technology |
|---|---|
| Agent gateway | OpenClaw (Docker container) |
| STT | faster-whisper-large-v3-turbo (local GPU) |
| TTS | Kokoro-82M (local GPU, via speaches sidecar) |
| Relevance / triage LLM | Ollama (small local model) |
| Voice daemon | Python 3 |
| Control scripts | Bash |
| Compositor | Hyprland / Wayland |
| Panel | wofi |
| Screen capture | grim + ImageMagick |
| Telemetry DB | Postgres |
| MCP tooling | MCP tool servers (e.g. agora-mcp) |
| License | MIT |