Module 11: Voice Interaction & Live Canvas
Learning Objectives
By the end of this module, you will be able to:
- Understand OpenClaw's voice interaction architecture and Vapi integration
- Configure voice input and output capabilities
- Use Live Canvas for visual feedback
- Install and use the Companion App Beta (macOS menubar application)
- Build a complete voice-controlled Agent
- Combine voice, visual, and text for multimodal interaction
Core Concepts
Voice Interaction Architecture
OpenClaw's voice capabilities are integrated through the Vapi (Voice API) platform, delivering low-latency voice conversations:
User's Voice
     │
     ▼
┌──────────┐    WebSocket     ┌──────────┐
│  Micro-  │ ───────────────→ │   Vapi   │
│  phone   │                  │ Platform │
│  (STT)   │                  └────┬─────┘
└──────────┘                       │ Text conversion
                                   ▼
                            ┌──────────────┐
                            │   OpenClaw   │
                            │    Agent     │
                            │    (LLM)     │
                            └──────┬───────┘
                                   │ Response text
                                   ▼
┌──────────┐   Audio Stream   ┌───────────┐
│ Speaker  │ ←─────────────── │   Vapi    │
│  (TTS)   │                  │  (Speech  │
└──────────┘                  │ synthesis)│
                              └───────────┘
Key Components
| Component | Function | Technology |
|---|---|---|
| Vapi | Voice conversation platform | WebSocket, WebRTC |
| STT (Speech-to-Text) | Converts speech to text | Deepgram / Whisper |
| TTS (Text-to-Speech) | Converts text to speech | ElevenLabs / Azure |
| Live Canvas | Real-time visual feedback | HTML5 Canvas / WebSocket |
| Companion App | macOS menubar application | Electron / Swift |
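On the client side, the browser (or the Companion App) talks to Vapi directly over WebSocket/WebRTC. A minimal browser-side sketch, assuming Vapi's @vapi-ai/web SDK; the public key and Assistant ID are placeholders:
// Browser-side sketch: connect the microphone/speaker loop to Vapi
// (assumes the @vapi-ai/web package; key and ID below are placeholders)
import Vapi from '@vapi-ai/web';
const vapi = new Vapi('your-vapi-public-key');
// Start a voice session with the Assistant configured in Step 1
vapi.start('your-assistant-id');
// Live transcripts and Agent replies arrive as message events
vapi.on('message', (msg) => {
  if (msg.type === 'transcript') {
    console.log(msg.role, msg.transcript);
  }
});
// Call vapi.stop() to end the session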
Live Canvas Concept
Live Canvas is OpenClaw's real-time visual feedback system, allowing the Agent to "draw" responses rather than relying on text alone. Typical use cases:
- Real-time chart and data visualization rendering
- Search result preview cards
- Live code execution output
- Interactive UI components (buttons, forms)
- Map markers and navigation routes
Agent Response
      │
      ├─→ Text channel (Discord/Matrix): Text response
      │
      ├─→ Voice channel (Vapi): Voice response
      │
      └─→ Live Canvas: Visual feedback
            ├── Charts
            ├── Code
            ├── Images
            └── Interactive widgets
Companion App Beta
The Companion App is OpenClaw's macOS menubar application, currently in Beta. It provides:
- Quick Agent access from the system tray
- Global hotkey to invoke the conversation window
- Desktop-side Live Canvas rendering
- Voice interaction integration
- Notification center integration
Implementation Guide
Step 1: Vapi Account Setup
# 1. Go to https://vapi.ai and create an account
# 2. Obtain your API Key
# 3. Create an Assistant (corresponding to your OpenClaw Agent)
Create the Assistant configuration in the Vapi Dashboard:
{
"name": "OpenClaw Voice Assistant",
"transcriber": {
"provider": "deepgram",
"model": "nova-2",
"language": "en"
},
"voice": {
"provider": "elevenlabs",
"voiceId": "your-voice-id",
"stability": 0.5,
"similarityBoost": 0.75
},
"model": {
"provider": "custom-llm",
"url": "https://your-server/api/vapi/webhook",
"model": "openclaw-agent"
},
"silenceTimeoutSeconds": 30,
"maxDurationSeconds": 600,
"firstMessage": "Hello! I'm your OpenClaw voice assistant. How can I help you?"
}
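If you prefer the API to the Dashboard, Vapi accepts the same configuration over REST; a sketch, assuming the JSON above is saved as assistant.json:
# Create the Assistant via Vapi's REST API (same payload as the Dashboard form)
curl -X POST https://api.vapi.ai/assistant \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @assistant.json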
Step 2: OpenClaw Voice Configuration
Add Vapi integration to settings.json:
{
"voice": {
"enabled": true,
"provider": "vapi",
"vapi": {
"api_key": "${VAPI_API_KEY}",
"assistant_id": "${VAPI_ASSISTANT_ID}",
"webhook_path": "/api/vapi/webhook",
"language": "en",
"voice_settings": {
"speed": 1.0,
"pitch": 1.0
}
},
"wake_word": {
"enabled": true,
"phrase": "Hey OpenClaw",
"sensitivity": 0.7
},
"auto_listen": {
"enabled": false,
"timeout_seconds": 30
}
}
}
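The ${...} placeholders are typically resolved from environment variables, so export the credentials before starting OpenClaw (values shown are placeholders):
# Keep secrets out of settings.json; export them in the shell instead
export VAPI_API_KEY="your-vapi-api-key"
export VAPI_ASSISTANT_ID="your-assistant-id"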
Step 3: Build the Vapi Webhook Handler
OpenClaw needs a webhook endpoint to handle Vapi's server events (assistant requests, voice-triggered Skill calls, and end-of-call reports):
// skills/vapi-handler/index.js
module.exports = {
  name: "vapi-handler",
  description: "Handle Vapi voice webhooks",
  // Vapi sends POST requests to this endpoint
  async handleWebhook(request, context) {
    const { type, message } = request.body;
    switch (type) {
      case 'assistant-request':
        // Vapi is requesting the Agent's response
        return {
          assistant: {
            firstMessage: "Hello! How can I help?",
            model: {
              provider: "custom-llm",
              messages: context.agent.getConversationHistory()
            }
          }
        };
      case 'function-call': {
        // A Skill call was triggered via voice
        const { functionName, parameters } = message;
        const result = await context.agent.callSkill(
          functionName,
          parameters
        );
        return { result: JSON.stringify(result) };
      }
      case 'end-of-call-report': {
        // The call has ended; persist a report to memory
        const { duration, transcript } = message;
        await context.agent.saveToMemory({
          type: 'voice_conversation',
          duration,
          transcript,
          timestamp: new Date().toISOString()
        });
        return { received: true };
      }
      default:
        return { received: true };
    }
  }
};
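For reference, a function-call event body matching the destructuring above would look roughly like this (illustrative values; the Skill name and parameters are examples):
{
  "type": "function-call",
  "message": {
    "functionName": "weather-visual",
    "parameters": { "city": "Berlin" }
  }
}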
Step 4: Configure Live Canvas
Enable Live Canvas visual feedback:
{
"canvas": {
"enabled": true,
"port": 3001,
"host": "127.0.0.1",
"features": {
"charts": true,
"code_preview": true,
"image_display": true,
"interactive_widgets": true,
"markdown_render": true
},
"theme": {
"mode": "auto",
"primary_color": "#6366f1"
}
}
}
Using Live Canvas from within a Skill:
// Push visual content to the Canvas from within a Skill
module.exports = {
name: "weather-visual",
description: "Display weather visualization",
async execute(context) {
const { canvas, params } = context;
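    // fetchWeather is a placeholder; wire it to your own weather data source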
const weatherData = await fetchWeather(params.city);
// Push a chart to Canvas
await canvas.render({
type: 'chart',
chart: {
type: 'line',
data: {
labels: weatherData.hourly.map(h => h.time),
datasets: [{
label: 'Temperature (°C)',
data: weatherData.hourly.map(h => h.temp),
borderColor: '#ef4444',
tension: 0.4
}, {
label: 'Rain Probability (%)',
data: weatherData.hourly.map(h => h.rain_prob),
borderColor: '#3b82f6',
tension: 0.4
}]
},
options: {
responsive: true,
plugins: {
title: {
display: true,
text: `${params.city} — Today's Weather Forecast`
}
}
}
}
});
// Also push a summary card
await canvas.render({
type: 'card',
card: {
title: `${params.city} Weather`,
subtitle: weatherData.summary,
icon: weatherData.icon,
fields: [
{ label: 'Current Temp', value: `${weatherData.current.temp}°C` },
{ label: 'Feels Like', value: `${weatherData.current.feels_like}°C` },
{ label: 'Humidity', value: `${weatherData.current.humidity}%` },
{ label: 'Wind Speed', value: `${weatherData.current.wind_speed} m/s` }
]
}
});
return {
text: `Current temperature in ${params.city} is ${weatherData.current.temp}°C. ${weatherData.summary}`,
canvas_rendered: true
};
}
};
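With this Skill installed, saying "check the weather in Tokyo" produces a short spoken summary through Vapi while the hourly chart and the summary card render on the Canvas.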
Step 5: Install the Companion App Beta
# Download the Companion App (macOS)
curl -L -o OpenClaw-Companion.dmg \
https://github.com/openclaw/companion-app/releases/latest/download/OpenClaw-Companion-macOS.dmg
# Install
hdiutil attach OpenClaw-Companion.dmg
cp -R "/Volumes/OpenClaw Companion/OpenClaw Companion.app" /Applications/
hdiutil detach "/Volumes/OpenClaw Companion"
# Launch
open "/Applications/OpenClaw Companion.app"
Companion App configuration:
{
"openclaw_url": "http://127.0.0.1:18789",
"api_key": "${OPENCLAW_API_KEY}",
"hotkey": "Cmd+Shift+Space",
"voice": {
"enabled": true,
"push_to_talk_key": "Cmd+Shift+V"
},
"canvas": {
"enabled": true,
"position": "right",
"width": 400
},
"notifications": {
"enabled": true,
"heartbeat_messages": true,
"alert_messages": true,
"sound": true
},
"appearance": {
"theme": "auto",
"menubar_icon": "default",
"show_in_dock": false
}
}
The Companion App is currently in Beta with the following known limitations:
- Only supports macOS 12.0 (Monterey) and above
- Voice features require microphone permission
- Complex charts on Live Canvas may have rendering delays
- Multi-Agent switching is not yet supported (planned)
- Occasional memory leaks (a daily restart of the app is recommended)
Step 6: Build a Voice-Controlled Agent
Integrate voice, Canvas, and Skills into a complete example.
Set voice interaction rules in soul.md:
# Voice Interaction Assistant
You are an AI assistant that supports voice conversations.
## Voice Response Rules
- Keep responses concise; a spoken reply should not exceed 30 seconds
- Use conversational English
- Present numbers and lists on Live Canvas; voice only provides the summary
- If the user says "show me," use Canvas to present detailed content
- If the user says "repeat that," replay the previous response via voice
## Voice Commands
- "Check the weather" → Display weather + Canvas chart
- "Read the news" → Voice-read today's headlines
- "Make a note" → Save subsequent content to memory
- "Start a timer" → Start a timer
- "Mute" → Pause voice output; use only text and Canvas
Step 7: Multimodal Interaction Example
// skills/multimodal-assistant/index.js
module.exports = {
  name: "multimodal-assistant",
  description: "Multimodal interaction assistant",
  async execute(context) {
    const { agent, canvas, voice, channel, params } = context;
    const query = params.query;
    // Determine the best response modality based on query type
    const queryType = await agent.classify(query, [
      'data_visualization', // Use Canvas
      'short_answer',       // Use voice
      'long_content',       // Use text
      'interactive'         // Use Canvas interactive widgets
    ]);
    switch (queryType) {
      case 'data_visualization': {
        // Voice provides the summary, Canvas displays the chart
        const data = await agent.callSkill('data-fetcher', params);
        await voice.speak(`Data retrieved. Generating your chart now.`);
        await canvas.render({
          type: 'chart',
          chart: data.visualization
        });
        break;
      }
      case 'short_answer': {
        // Voice-only response
        const answer = await agent.think(query);
        await voice.speak(answer);
        break;
      }
      case 'long_content': {
        // Voice provides the summary, Canvas displays the full content
        const content = await agent.think(query);
        const summary = await agent.summarize(content, { maxWords: 50 });
        await voice.speak(summary);
        await canvas.render({
          type: 'markdown',
          content: content
        });
        break;
      }
      case 'interactive': {
        // Canvas displays interactive widgets
        await voice.speak(`Got it. I've prepared the interactive form for you.`);
        await canvas.render({
          type: 'form',
          fields: params.form_fields,
          onSubmit: 'handle_form_submission'
        });
        break;
      }
    }
    return { mode: queryType };
  }
};
Common Errors
| Issue | Cause | Solution |
|---|---|---|
| Voice latency exceeds 3 seconds | Vapi STT/TTS processing time | Choose a faster STT model (e.g., Deepgram nova-2) |
| High speech recognition error rate | Model doesn't support the language or wrong language setting | Confirm language is set correctly (e.g., en) |
| Canvas content not displaying | WebSocket connection failed | Verify the Canvas port is correct and not blocked by a firewall |
| Companion App can't connect | Wrong API URL or Key | Check openclaw_url and API Key |
| Wake word false triggers | Sensitivity set too high | Lower sensitivity to 0.5 |
Voice interaction experience depends on end-to-end latency. Optimization tips:
- Use Deepgram for STT (lowest latency, ~300ms)
- Enable streaming mode for LLM responses
- Use ElevenLabs Turbo for TTS (~500ms latency)
- Target overall latency: < 2 seconds
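Those numbers add up: ~300ms for STT and ~500ms for TTS leave roughly 1.2 seconds for the LLM to begin streaming its reply while staying under the 2-second target.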
Troubleshooting
# Check Vapi connection status
curl -s http://127.0.0.1:18789/api/voice/status
# Check Canvas WebSocket
curl -s http://127.0.0.1:3001/health
# Test the voice webhook
curl -X POST http://127.0.0.1:18789/api/vapi/webhook \
-H "Content-Type: application/json" \
-d '{"type": "assistant-request"}'
# Companion App logs
tail -f ~/Library/Logs/OpenClaw\ Companion/main.log
Exercises
Exercise 1: Voice Weather Assistant
Set up a voice-controlled weather assistant. When you say "check the weather," the Agent provides a voice summary and displays a temperature trend chart on Canvas.
Exercise 2: Voice Memo
Build a voice memo Agent:
- Say "make a note: [content]" to save to memory
- Say "what did I note today" to list all memos
- Use Canvas to display the memo list, voice to report the count
Exercise 3: Meeting Assistant
Build a meeting voice assistant:
- Real-time speech-to-text (STT)
- Canvas displays a live transcript
- Auto-generate a summary when the meeting ends
- Identify and tag action items
Quiz
1. What role does Vapi play in OpenClaw's voice architecture?
   - A) Directly runs the LLM
   - B) Provides STT/TTS services and bridges voice with the Agent
   - C) Stores voice recordings
   - D) Manages Agent scheduling

   Answer: B) Vapi handles converting speech to text (STT), converting the Agent's text responses to speech (TTS), and managing real-time voice streaming via WebSocket.

2. What type of content is Live Canvas best suited for?
   - A) Short text answers
   - B) Charts, data visualizations, and interactive widgets
   - C) Plain text conversations
   - D) System settings

   Answer: B) Live Canvas excels at real-time rendering of visual content, making it ideal for charts, preview cards, interactive forms, and other information that plain text or voice cannot effectively convey.

3. Which platform does the Companion App Beta currently support?
   - A) Windows and macOS
   - B) macOS only
   - C) Linux only
   - D) All platforms

   Answer: B) The Companion App Beta currently supports only macOS 12.0 (Monterey) and above; Windows and Linux versions are still in development.

4. How do you reduce end-to-end voice interaction latency?
   - A) Use a larger LLM model
   - B) Use Deepgram (STT) + streaming responses (LLM) + ElevenLabs Turbo (TTS)
   - C) Increase Vapi's timeout setting
   - D) Disable Canvas

   Answer: B) Choosing low-latency STT (Deepgram nova-2, ~300ms), enabling LLM streaming, and using fast TTS (ElevenLabs Turbo, ~500ms) keeps overall latency under 2 seconds.
Next Steps
- Module 5: Memory System -- Persist voice conversation memories
- Module 6: Cron Jobs / Heartbeat -- Set up proactive voice reminders for your Agent
- Module 12: Enterprise Applications -- Deploy voice assistants in enterprise environments