clawbar


What it is

clawbar is a voice + status integration for Wayland desktops running Hyprland + waybar. It wraps an OpenClaw AI agent running in Docker into a hands-free desktop assistant:

Push-to-talk or fully hands-free (VAD): press SUPER+SHIFT+V or left-click the waybar module. clawbar records your mic, auto-stops after about 1.5 s of trailing silence, transcribes with Whisper, optionally grabs a screenshot, sends the query to the agent, and speaks the reply with Kokoro TTS.
Live waybar module: shows the current assistant phase with icon, colour, and animation — updates instantly on every phase change via SIGRTMIN+8.
One-click actions: left-click to talk, right-click to open the dashboard, middle-click to drop into the agent’s TUI terminal.
Intentionally thin: never modifies the agent container. Talks to the agent with docker exec and to the audio sidecar over plain HTTP. The agent stays fully isolated.
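The instant update mentioned above relies on waybar re-running a custom module when it receives the real-time signal SIGRTMIN+N, where N is the module's "signal" value. A minimal Python sketch of that notification follows; the helper names are hypothetical, and discovering waybar through pgrep is an assumption about the host rather than a description of what clawbar's scripts do.

```python
import os
import signal
import subprocess


def waybar_signal(offset: int = 8) -> int:
    """Real-time signal number for a waybar module declared with "signal": offset."""
    return signal.SIGRTMIN + offset


def waybar_pids() -> list[int]:
    """PIDs of running waybar processes (assumes the pgrep utility is installed)."""
    out = subprocess.run(["pgrep", "-x", "waybar"], capture_output=True, text=True)
    return [int(pid) for pid in out.stdout.split()]


def refresh_module(pids: list[int], offset: int = 8) -> int:
    """Signal each PID so the module is re-executed; returns how many were signalled."""
    sent = 0
    for pid in pids:
        try:
            os.kill(pid, waybar_signal(offset))
            sent += 1
        except ProcessLookupError:
            # The process exited between discovery and signalling.
            pass
    return sent
```

Calling `refresh_module(waybar_pids())` after each phase change would prompt the bar to re-read the status script right away instead of waiting for a polling interval.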

Used daily on CachyOS (Arch-based) with Hyprland.

How the VAD works

bin/clawbar-vad.py reads raw 16 kHz mono PCM from ffmpeg -f pulse and computes RMS energy in dBFS per 30 ms frame:

Waits for speech to begin — never cuts on initial silence (no false trigger on breath/ambient noise).
Once VAD_MIN_SPEECH_MS (default 300 ms) of speech has accumulated, arms the silence watchdog.
After VAD_SILENCE_MS (default 1500 ms) of continuous silence, exits 0 → claw-talk ends the turn and sends it.
Exits 2 if speech never starts within VAD_START_GRACE_MS (default 5000 ms); exits 1 if the audio stream ends (EOF) first.
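The steps above can be sketched as a small state machine. This is an illustration written from the description, not the contents of bin/clawbar-vad.py: the function names are hypothetical, the input is assumed to be signed 16-bit little-endian PCM, and the handling of speech shorter than VAD_MIN_SPEECH_MS is a guess.

```python
import math
import struct

FRAME_MS = 30
SAMPLE_RATE = 16_000
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 480 samples per frame


def frame_dbfs(frame: bytes) -> float:
    """RMS level of one frame of signed 16-bit little-endian PCM, in dBFS."""
    n = len(frame) // 2
    if n == 0:
        return -math.inf
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return 20 * math.log10(rms / 32768) if rms > 0 else -math.inf


def run_vad(frames, thresh_db=-38.0, min_speech_ms=300,
            silence_ms=1500, start_grace_ms=5000) -> int:
    """Return 0 at end of turn, 2 if speech never starts, 1 if the input ends."""
    elapsed = speech = silence = 0
    armed = False
    for frame in frames:
        elapsed += FRAME_MS
        if frame_dbfs(frame) >= thresh_db:
            speech += FRAME_MS
            silence = 0  # any speech frame resets the silence counter
            armed = armed or speech >= min_speech_ms
        elif armed:
            silence += FRAME_MS
            if silence >= silence_ms:
                return 0
        elif speech == 0 and elapsed >= start_grace_ms:
            return 2
    return 1
```

With the defaults, ten speech frames (300 ms) arm the watchdog and fifty consecutive quiet frames (1500 ms) end the turn.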

The MAX_REC_SECS hard cap (default 120 s) and the manual toggle remain as fallbacks. The detector can be tested without a microphone:

python3 bin/clawbar-vad.py --selftest      # synthetic speech + silence sequence
python3 bin/clawbar-vad.py --wav some.wav  # test against a 16 kHz mono PCM WAV
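If no suitable recording is at hand, a 16 kHz mono 16-bit WAV can be generated with Python's standard wave module. The sketch below writes a sine tone followed by silence; treating a plain tone as "speech" is an assumption that only holds because the detector is described as energy-based, and the function name and defaults are hypothetical.

```python
import math
import struct
import wave


def write_test_wav(path, speech_s=1.0, silence_s=2.0,
                   rate=16_000, amp=8000, freq=220.0):
    """Write a 16-bit mono WAV: a sine tone (stand-in for speech), then silence."""
    n_speech = int(speech_s * rate)
    n_silence = int(silence_s * rate)
    tone = (int(amp * math.sin(2 * math.pi * freq * i / rate))
            for i in range(n_speech))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"".join(struct.pack("<h", s) for s in tone))
        w.writeframes(b"\x00\x00" * n_silence)
    return n_speech + n_silence  # total samples written
```

A sine of amplitude 8000 sits near -15 dBFS RMS, well above the default -38 dBFS threshold, so a file from `write_test_wav("some.wav")` should exercise both the speech and the trailing-silence paths.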

Phase states

State	Icon	waybar class	CSS animation
idle	🐾	`idle`	none
listening	🎙	`listening`	pulse
transcribing	📝	`transcribing`	none
looking	👁	`looking`	none
thinking	🤔	`thinking`	blink
speaking	🗣	`speaking`	pulse
error	⚠	`error`	blink

Style each phase via #custom-claw.<class> in your waybar CSS. Default styles ship in waybar/style.css using the Catppuccin Mocha palette.
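For orientation, a waybar custom module with "return-type": "json" reads one JSON object per update, and its "class" field becomes the CSS class used in the selectors above. The sketch below maps each phase from the table to such an object. It is not the shipped clawbar-status script; the tooltip text and the fallback to `error` for unknown phases are assumptions.

```python
import json

# Phase name -> icon, copied from the table above.
PHASES = {
    "idle": "🐾",
    "listening": "🎙",
    "transcribing": "📝",
    "looking": "👁",
    "thinking": "🤔",
    "speaking": "🗣",
    "error": "⚠",
}


def waybar_payload(phase: str) -> str:
    """One line of JSON for a waybar custom module using "return-type": "json"."""
    if phase not in PHASES:
        phase = "error"  # assumption: unknown phases are shown as errors
    payload = {
        "text": PHASES[phase],
        "class": phase,
        "tooltip": f"claw: {phase}",  # hypothetical tooltip wording
    }
    return json.dumps(payload, ensure_ascii=False)
```

Printing `waybar_payload("thinking")` yields an object whose class is `thinking`, which a rule such as #custom-claw.thinking can then animate.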

Requirements

Linux with PipeWire (pw-record + pw-play / paplay)
waybar + Hyprland (required only for the keybind and bar module; claw-talk itself works on any session)
ffmpeg, python3, jq, curl, notify-send
Docker, with:
- an OpenClaw agent container (default name claw)
- an audio sidecar (default name claw-audio) serving an OpenAI-compatible /v1/audio/transcriptions and /v1/audio/speech — e.g. speaches with faster-whisper + Kokoro
Optional: spectacle or grim + ImageMagick (magick / convert) for the screen-vision feature
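To make the sidecar requirement concrete, here is a rough sketch of a text-to-speech call in the OpenAI request shape (a JSON body with `model`, `input`, and `voice`). The base URL, port, and model name are placeholders that depend on how the sidecar is deployed; only the endpoint path and the `ef_dora` voice come from this README.

```python
import json
import urllib.request


def speech_request(text, voice="ef_dora",
                   base_url="http://127.0.0.1:8000",  # placeholder host and port
                   model="kokoro"):                    # placeholder model name
    """Build (but do not send) a POST to the OpenAI-style /v1/audio/speech endpoint."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would return the synthesized audio bytes from a compatible server. The transcription endpoint takes a multipart file upload instead of JSON and is not shown here.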

Installation

git clone https://github.com/stevenvo780/clawbar.git
cd clawbar
./install.sh

The installer:

Copies bin/claw-talk, bin/clawbar-vad.py, bin/clawbar-status to ~/.local/bin/ (backs up any existing version with a timestamp).
Creates ~/.config/clawbar/clawbar.env from the example (only if absent — re-running is safe).
Merges the custom/claw waybar module into your config + modules file, then validates that waybar still parses — reverts if not.
Appends styles to your waybar style.css.
Adds the Hyprland keybind (SUPER+SHIFT+V) only if no claw-talk bind exists yet.
Reloads waybar (SIGUSR2) and runs hyprctl reload.
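The merge step can be pictured as an idempotent dictionary update. The sketch below is a guess at the idea, not the logic of install.sh: it assumes the waybar config has already been parsed into a single dict (real configs may contain comments or several bars), and the default target array and the module fields shown are illustrative.

```python
import copy


def merge_claw_module(config: dict, module: dict,
                      bar_array: str = "modules-right") -> dict:
    """Return a copy of config with custom/claw added once; safe to re-run."""
    out = copy.deepcopy(config)
    names = out.setdefault(bar_array, [])
    if "custom/claw" not in names:
        names.append("custom/claw")
    # Keep an existing definition so local edits are not overwritten.
    out.setdefault("custom/claw", copy.deepcopy(module))
    return out


# Hypothetical module definition using standard waybar custom-module keys.
EXAMPLE_MODULE = {
    "exec": "~/.local/bin/clawbar-status",
    "return-type": "json",
    "interval": "once",
    "signal": 8,
    "on-click": "~/.local/bin/claw-talk toggle",
}
```

Because the function returns a new dict and never mutates its input, the original can be kept around for the "revert if waybar no longer parses" step.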

Every file the installer modifies is backed up as *.bak-clawbar-<timestamp>. Re-running is safe.

Path and target overrides are passed via environment variables:

CLAWBAR_BIN_DIR=~/bin \
CLAWBAR_WAYBAR_BAR_ARRAY=modules-center \
CLAWBAR_CLICK_RIGHT='~/.local/bin/chat-claw.sh' \
./install.sh

Uninstall

./uninstall.sh

Restores the newest *.bak-clawbar-* backup for each file and removes the installed scripts. Your clawbar.env configuration is kept.
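A minimal sketch of the backup-and-restore convention, assuming a zero-padded timestamp so that lexicographic order equals chronological order. The exact timestamp format the scripts use is not stated here, so the format and both function names are hypothetical.

```python
import glob
import os
import shutil
import time


def backup(path: str) -> str:
    """Copy path to path.bak-clawbar-<timestamp> and return the backup's name."""
    stamp = time.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    dest = f"{path}.bak-clawbar-{stamp}"
    shutil.copy2(path, dest)
    return dest


def restore_newest(path: str):
    """Overwrite path with its newest backup; return that backup, or None."""
    backups = sorted(glob.glob(glob.escape(path) + ".bak-clawbar-*"))
    if not backups:
        return None
    shutil.copy2(backups[-1], path)
    return backups[-1]
```

Sorting by name only works because every field in the timestamp has a fixed width; a format such as Unix epoch seconds would sort correctly as well.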

Configuration

All settings live in clawbar.env (copy from clawbar.env.example). The file holds no secrets; the agent's auth lives in the container or your secret store.

CLAW_CTR=claw            # name of the OpenClaw agent container
AUDIO_CTR=claw-audio     # name of the STT/TTS audio sidecar

VOICE=ef_dora            # Kokoro voice ID
LANG_STT=es              # Whisper language hint

CLAWBAR_VAD=on           # enable hands-free auto-stop
VAD_SILENCE_MS=1500      # trailing silence that ends the turn (ms)
VAD_THRESH_DB=-38        # speech detection threshold (dBFS)
VAD_MIN_SPEECH_MS=300    # minimum speech before cutoff arms (ms)
VAD_START_GRACE_MS=5000  # timeout if speech never starts (ms)

MAX_REC_SECS=120         # hard cap on recording length (seconds)

CLAWBAR_WAYBAR_SIGNAL=8  # must match "signal" in the waybar module JSON

claw-talk searches for clawbar.env at: $CLAWBAR_ENV → next to the script → ~/.config/clawbar/clawbar.env → legacy locations.
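The search order can be sketched as "first existing file wins". The version below is written from the sentence above rather than from claw-talk itself: it omits the legacy locations because they are not named here, and it assumes that a $CLAWBAR_ENV pointing at a missing file falls through to the next candidate. The parser is deliberately naive and would mangle values that contain `#`.

```python
import os


def find_env(script_dir, environ=None, home=None):
    """Return the first existing clawbar.env in the documented order, else None."""
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home
    candidates = [
        environ.get("CLAWBAR_ENV"),
        os.path.join(script_dir, "clawbar.env"),
        os.path.join(home, ".config", "clawbar", "clawbar.env"),
    ]
    for path in candidates:
        if path and os.path.isfile(path):
            return path
    return None


def parse_env(text):
    """Parse simple KEY=value lines, dropping trailing '# comments'."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip().strip("'\"")
    return values
```

For example, `parse_env("VAD_THRESH_DB=-38  # threshold")` gives `{"VAD_THRESH_DB": "-38"}`; values stay strings and would need converting before use.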

Usage

Action	How
Talk (toggle / hands-free)	`SUPER+SHIFT+V`, left-click the bar icon, or `claw-talk toggle`
Force-send while recording	Press the keybind again (overrides VAD, sends immediately)
Open dashboard	Right-click the bar icon
Agent terminal (TUI)	Middle-click the bar icon
Speak arbitrary text	`claw-talk say "mensaje aquí"`
Self-check	`claw-talk test`
Read current phase	`claw-talk state`

Say “mirá la pantalla y decime qué ves” (“look at the screen and tell me what you see”) to trigger the screen-vision path: clawbar grabs a screenshot, downscales it, and hands it to the agent. The trigger is a configurable regex that works in Spanish and English.
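As an illustration of how a bilingual trigger might be matched against a transcript, the sketch below uses a hypothetical pattern. The regex clawbar actually ships is not reproduced in this README, so both the pattern and the function name are assumptions.

```python
import re

# Hypothetical trigger: a "look" verb followed later by "pantalla" or "screen".
VISION_RE = re.compile(
    r"\b(mir[aá]|look at)\b.*\b(pantalla|screen)\b",
    re.IGNORECASE,
)


def wants_screenshot(transcript: str) -> bool:
    """True if the transcript should take the screen-vision path."""
    return VISION_RE.search(transcript) is not None
```

Matching on the transcript keeps the decision cheap and local: a screenshot is only taken when the spoken request asks for one.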

Stack

Layer	Technology
Core scripts	Bash
VAD module	Python 3 (`clawbar-vad.py`, RMS dBFS per 30 ms frame)
Audio capture	PipeWire + ffmpeg
STT	Whisper (via speaches sidecar, OpenAI-compatible API)
TTS	Kokoro (via speaches sidecar)
Agent bridge	`docker exec`
Bar integration	waybar JSON protocol + `SIGRTMIN+8`
Compositor	Hyprland (keybind + `hyprctl reload`)
Vision (optional)	`spectacle` / `grim` + ImageMagick
Color scheme	Catppuccin Mocha
License	MIT