clawbar

Bash · Python 3 · Wayland · Hyprland · waybar · MIT · Daily driver

GitHub: stevenvo780/clawbar


clawbar is a voice + status integration for Wayland desktops running Hyprland + waybar. It wraps an OpenClaw AI agent running in Docker into a hands-free desktop assistant:

  • Push-to-talk or fully hands-free (VAD): press SUPER+SHIFT+V or left-click the waybar module; clawbar records your mic, auto-stops on silence (~1.5 s of trailing silence), transcribes with Whisper, optionally grabs a screenshot, sends the query to the agent, and speaks the reply back with Kokoro TTS.
  • Live waybar module: shows the current assistant phase with icon, colour, and animation — updates instantly on every phase change via SIGRTMIN+8.
  • One-click actions: left-click to talk, right-click to open the dashboard, middle-click to drop into the agent’s TUI terminal.
  • Intentionally thin: never modifies the agent container. Talks to the agent with docker exec and to the audio sidecar over plain HTTP. The agent stays fully isolated.

Used daily on CachyOS (Arch-based) with Hyprland.


bin/clawbar-vad.py reads raw 16 kHz mono PCM from ffmpeg -f pulse and computes RMS energy in dBFS per 30 ms frame:

  1. Waits for speech to begin — never cuts on initial silence (no false trigger on breath/ambient noise).
  2. Once VAD_MIN_SPEECH_MS (default 300 ms) of speech has accumulated, arms the silence watchdog.
  3. After VAD_SILENCE_MS (default 1500 ms) of continuous silence, exits 0; claw-talk then ends the turn and sends it.
  4. Exits 2 if speech never starts within VAD_START_GRACE_MS; exits 1 on EOF.

A MAX_REC_SECS hard cap and the manual toggle remain as fallbacks. Run the self-test without a mic:

python3 bin/clawbar-vad.py --selftest # synthetic speech + silence sequence
python3 bin/clawbar-vad.py --wav some.wav # test against a 16 kHz mono PCM WAV
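
The detection loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in bin/clawbar-vad.py: the function names `frame_dbfs` and `run_vad` are hypothetical, the input is assumed to be signed 16-bit little-endian PCM, and the grace timeout is assumed to fire only when no frame at all has crossed the threshold.

```python
import math
import struct

SAMPLE_RATE = 16_000                              # 16 kHz mono, as in the text
FRAME_MS = 30                                     # frame length for the RMS measurement
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000    # 480 samples per frame


def frame_dbfs(frame: bytes) -> float:
    """RMS level of one frame of signed 16-bit little-endian PCM, in dBFS."""
    count = len(frame) // 2
    if count == 0:
        return -math.inf
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    return 20 * math.log10(rms / 32768) if rms > 0 else -math.inf


def run_vad(levels, thresh_db=-38.0, min_speech_ms=300,
            silence_ms=1500, start_grace_ms=5000):
    """Consume per-frame dBFS levels; return 0, 1 or 2 like the exit codes above."""
    speech_ms = 0     # speech accumulated so far
    quiet_ms = 0      # length of the current run of silence
    elapsed_ms = 0
    for level in levels:
        elapsed_ms += FRAME_MS
        if level >= thresh_db:
            speech_ms += FRAME_MS
            quiet_ms = 0
        else:
            quiet_ms += FRAME_MS
        armed = speech_ms >= min_speech_ms          # silence watchdog is live
        if armed and quiet_ms >= silence_ms:
            return 0                                # turn ended on trailing silence
        if speech_ms == 0 and elapsed_ms >= start_grace_ms:
            return 2                                # speech never started
    return 1                                        # input ended (EOF)
```

Feeding `run_vad` ten loud frames followed by fifty quiet ones corresponds to 300 ms of speech and 1500 ms of silence, which is the shortest sequence that ends a turn with the default settings.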

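| State | Icon | waybar class | CSS animation |
|---|---|---|---|
| idle | 🐾 | idle | none |
| listening | 🎙 | listening | pulse |
| transcribing | 📝 | transcribing | none |
| looking | 👁 | looking | none |
| thinking | 🤔 | thinking | blink |
| speaking | 🗣 | speaking | pulse |
| error | | error | blink |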

Style each phase via #custom-claw.<class> in your waybar CSS. Default styles ship in waybar/style.css using the Catppuccin Mocha palette.
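
To show how a phase could reach the bar, here is a small sketch of the JSON that a waybar custom module with "return-type": "json" accepts; waybar applies the "class" field as the CSS class on the module. The helper names, the tooltip text, and the "!" placeholder for the error icon are assumptions made for this example, not what bin/clawbar-status actually prints.

```python
import json
import signal

# Phase -> icon, copied from the state table. The icon for "error" is not
# shown there, so "!" is a placeholder chosen for this sketch.
ICONS = {
    "idle": "🐾",
    "listening": "🎙",
    "transcribing": "📝",
    "looking": "👁",
    "thinking": "🤔",
    "speaking": "🗣",
    "error": "!",
}


def status_line(phase: str) -> str:
    """One line of JSON for a waybar custom module with "return-type": "json"."""
    payload = {
        "text": ICONS.get(phase, "?"),
        "class": phase,                    # matched by #custom-claw.<class> in CSS
        "tooltip": f"claw: {phase}",       # hypothetical tooltip wording
    }
    return json.dumps(payload, ensure_ascii=False)


def refresh_signal(offset: int = 8) -> int:
    """Signal number waybar listens on for a module configured with "signal": offset."""
    return signal.SIGRTMIN + offset        # Linux-only: SIGRTMIN is not defined everywhere


if __name__ == "__main__":
    print(status_line("listening"))
```

Sending the signal returned by `refresh_signal()` to the waybar process makes it re-run the module command at once, which is why the bar does not need to poll.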


  • Linux with PipeWire (pw-record + pw-play / paplay)
  • waybar + Hyprland (the keybind and bar module; claw-talk itself works on any session)
  • ffmpeg, python3, jq, curl, notify-send
  • Docker, with:
    • an OpenClaw agent container (default name claw)
    • an audio sidecar (default name claw-audio) serving an OpenAI-compatible /v1/audio/transcriptions and /v1/audio/speech — e.g. speaches with faster-whisper + Kokoro
  • Optional: spectacle or grim + ImageMagick (magick / convert) for the screen-vision feature
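
For orientation, a request to the sidecar's /v1/audio/speech endpoint could be built like this. It is a sketch under assumptions: the base URL, port, and model id below are placeholders that depend on how your sidecar is deployed, and `speech_request` is a hypothetical helper rather than part of clawbar.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"   # assumption: address where the sidecar listens
TTS_MODEL = "kokoro"                 # assumption: the real model id depends on the sidecar


def speech_request(text: str, voice: str = "ef_dora",
                   base_url: str = BASE_URL, model: str = TTS_MODEL):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    body = json.dumps({"model": model, "input": text, "voice": voice})
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` and reading the response would return the synthesized audio bytes. The transcription endpoint takes a multipart file upload instead of JSON, so it needs a different request body.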

git clone https://github.com/stevenvo780/clawbar.git
cd clawbar
./install.sh

The installer:

  1. Copies bin/claw-talk, bin/clawbar-vad.py, bin/clawbar-status to ~/.local/bin/ (backs up any existing version with a timestamp).
  2. Creates ~/.config/clawbar/clawbar.env from the example (only if absent — re-running is safe).
  3. Merges the custom/claw waybar module into your config + modules file, then validates that waybar still parses — reverts if not.
  4. Appends styles to your waybar style.css.
  5. Adds the Hyprland keybind (SUPER+SHIFT+V) only if no claw-talk bind exists yet.
  6. Reloads waybar (SIGUSR2) and runs hyprctl reload.

Everything is backed up as *.bak-clawbar-<timestamp>. Re-running is safe.
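
The back-up, edit, validate, revert pattern from step 3 can be illustrated as follows. This is only a sketch: install.sh is a shell script and its validation step is not shown here, the timestamp format is an assumption, and `json.loads` is a stand-in check that would reject waybar configs containing comments.

```python
import json
import shutil
import time
from pathlib import Path


def backup(path: Path) -> Path:
    """Copy path to <name>.bak-clawbar-<timestamp> next to it."""
    stamp = time.strftime("%Y%m%d-%H%M%S")   # assumed timestamp format
    copy = path.with_name(f"{path.name}.bak-clawbar-{stamp}")
    shutil.copy2(path, copy)
    return copy


def edit_with_rollback(path: Path, edit, validate) -> bool:
    """Apply edit(text) to the file; restore the backup if validate() raises."""
    saved = backup(path)
    path.write_text(edit(path.read_text()))
    try:
        validate(path.read_text())
    except ValueError:                        # json.JSONDecodeError is a ValueError
        shutil.copy2(saved, path)             # revert to the pre-edit content
        return False
    return True
```

A call such as `edit_with_rollback(cfg, add_module, json.loads)` then either leaves a config that still parses or puts the original file back, where `add_module` is whatever text transformation you want to apply.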

./uninstall.sh

Restores the newest *.bak-clawbar-* backup for each file and removes the installed scripts. Your clawbar.env configuration is kept.
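
Picking the newest backup can be as simple as sorting the file names, assuming the timestamp suffix sorts chronologically (for example YYYYmmdd-HHMMSS). The helpers below are hypothetical and only illustrate that idea; uninstall.sh may select the file differently, for instance by modification time.

```python
import shutil
from pathlib import Path


def newest_backup(path: Path):
    """Return the lexicographically last <name>.bak-clawbar-* file, or None."""
    backups = sorted(path.parent.glob(f"{path.name}.bak-clawbar-*"))
    return backups[-1] if backups else None


def restore(path: Path) -> bool:
    """Overwrite path with its newest backup; False if there is none."""
    latest = newest_backup(path)
    if latest is None:
        return False
    shutil.copy2(latest, path)
    return True
```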


All knobs live in clawbar.env (copy from clawbar.env.example). No secrets — the agent’s auth lives in the container or your secret store.

CLAW_CTR=claw # name of the OpenClaw agent container
AUDIO_CTR=claw-audio # name of the STT/TTS audio sidecar
VOICE=ef_dora # Kokoro voice ID
LANG_STT=es # Whisper language hint
CLAWBAR_VAD=on # enable hands-free auto-stop
VAD_SILENCE_MS=1500 # trailing silence that ends the turn (ms)
VAD_THRESH_DB=-38 # speech detection threshold (dBFS)
VAD_MIN_SPEECH_MS=300 # minimum speech before cutoff arms (ms)
VAD_START_GRACE_MS=5000 # timeout if speech never starts (ms)
MAX_REC_SECS=120 # hard cap on recording length (seconds)
CLAWBAR_WAYBAR_SIGNAL=8 # must match "signal" in the waybar module JSON

claw-talk searches for clawbar.env at: $CLAWBAR_ENV → next to the script → ~/.config/clawbar/clawbar.env → legacy locations.
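
The lookup order can be pictured with the sketch below. It is an illustration, not the logic inside claw-talk: the legacy locations are left out because they are not listed here, and `parse_env` is a deliberately naive reader that treats everything after `#` as a comment, which a shell sourcing the file would not do for quoted values.

```python
import os
from pathlib import Path


def candidate_paths(script_dir: Path, environ=os.environ):
    """Search order from the text, minus the unspecified legacy locations."""
    paths = []
    if environ.get("CLAWBAR_ENV"):
        paths.append(Path(environ["CLAWBAR_ENV"]))
    paths.append(script_dir / "clawbar.env")
    paths.append(Path.home() / ".config" / "clawbar" / "clawbar.env")
    return paths


def find_env(script_dir: Path, environ=os.environ):
    """First candidate that exists on disk, or None."""
    for path in candidate_paths(script_dir, environ):
        if path.is_file():
            return path
    return None


def parse_env(text: str) -> dict:
    """Read KEY=value lines, dropping trailing '# comment' parts."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values
```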


| Action | How |
|---|---|
| Talk (toggle / hands-free) | SUPER+SHIFT+V, left-click the bar icon, or claw-talk toggle |
| Force-send while recording | Press the keybind again (overrides VAD, sends immediately) |
| Open dashboard | Right-click the bar icon |
| Agent terminal (TUI) | Middle-click the bar icon |
| Speak arbitrary text | claw-talk say "mensaje aquí" |
| Self-check | claw-talk test |
| Read current phase | claw-talk state |

Say “mirá la pantalla y decime qué ves” (“look at the screen and tell me what you see”) to trigger the screen-vision path: clawbar grabs a screenshot, downscales it, and hands it to the agent. The trigger is a configurable regex that works in Spanish and English.
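
As an illustration of how such a trigger might be matched against the transcript, consider the sketch below. The pattern is hypothetical: the regex clawbar ships is configurable and may look quite different.

```python
import re

# Hypothetical bilingual trigger: a "look" verb followed later by "screen".
VISION_RE = re.compile(
    r"\b(?:mir[aá]|look at)\b.*\b(?:pantalla|screen)\b",
    re.IGNORECASE,
)


def wants_screenshot(transcript: str) -> bool:
    """True when the transcript asks the assistant to look at the screen."""
    return VISION_RE.search(transcript) is not None
```

Keeping the check as a single regex means it can be changed from configuration without touching the rest of the script.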


| Layer | Technology |
|---|---|
| Core scripts | Bash |
| VAD module | Python 3 (clawbar-vad.py, RMS dBFS per 30 ms frame) |
| Audio capture | PipeWire + ffmpeg |
| STT | Whisper (via speaches sidecar, OpenAI-compatible API) |
| TTS | Kokoro (via speaches sidecar) |
| Agent bridge | docker exec |
| Bar integration | waybar JSON protocol + SIGRTMIN+8 |
| Compositor | Hyprland (keybind + hyprctl reload) |
| Vision (optional) | spectacle / grim + ImageMagick |
| Color scheme | Catppuccin Mocha |
| License | MIT |