Jason's Blog

Setting Up Voice Message Transcription in OpenClaw (Azure OpenAI Whisper)

This guide configures OpenClaw to automatically transcribe incoming voice messages using an Azure OpenAI gpt-4o-transcribe deployment. After setup, users can send voice messages via Discord, Telegram, etc., and the agent receives plain text — the transcription step is fully transparent to the agent.

Prerequisites

  • An Azure OpenAI resource with access to the gpt-4o-transcribe model
  • A running OpenClaw installation with at least one chat channel (Discord, Telegram, etc.) connected

Step 1: Create the Azure OpenAI Transcription Deployment

If you don’t have one yet:

  1. Go to Azure OpenAI Studio
  2. Select your Azure OpenAI resource
  3. Go to Deployments → Create new deployment
  4. Model: gpt-4o-transcribe
  5. Give it a deployment name (e.g. gpt-4o-transcribe)
  6. Note down:
    • Resource name: the subdomain in your endpoint URL (e.g. my-resource from https://my-resource.openai.azure.com)
    • Deployment name: what you named it (e.g. gpt-4o-transcribe)
    • API key: found in Azure Portal → your resource → Keys and Endpoint
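
With those three values noted, the full endpoint URL can be assembled. A quick sketch — the resource and deployment names below are placeholders, so substitute your own:

```shell
# Hypothetical values from Step 1 — replace with your own
AZURE_RESOURCE="my-resource"
AZURE_DEPLOYMENT="gpt-4o-transcribe"

# The transcription endpoint used throughout this guide
ENDPOINT="https://${AZURE_RESOURCE}.openai.azure.com/openai/deployments/${AZURE_DEPLOYMENT}/audio/transcriptions?api-version=2025-03-01-preview"
echo "$ENDPOINT"
```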

Step 2: Test the Endpoint

Before configuring OpenClaw, verify the endpoint works:

curl -s "https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment>/audio/transcriptions?api-version=2025-03-01-preview" \
  -H "api-key: <your-api-key>" \
  -F "file=@test-audio.mp3"

You should get a JSON response with the transcribed text. If you get an error, check your resource name, deployment name, and API key.
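
If you want just the transcript rather than the raw JSON, the text field can be pulled out with python3 (jq works too, if installed). The response body below is a made-up example, assuming the standard transcription response shape:

```shell
# Simulated response — a real call returns JSON shaped like this
RESPONSE='{"text": "Hello from the voice note"}'

# Extract the "text" field
TEXT=$(printf '%s' "$RESPONSE" | python3 -c 'import json, sys; print(json.load(sys.stdin)["text"])')
echo "$TEXT"
```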

Step 3: Configure OpenClaw

Edit openclaw.json and add or update the tools.media.audio section:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "curl",
            "args": [
              "-s",
              "https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment>/audio/transcriptions?api-version=2025-03-01-preview",
              "-H",
              "api-key: <your-api-key>",
              "-F",
              "file=@{{MediaPath}}"
            ]
          }
        ]
      }
    }
  }
}

Placeholders to replace

Placeholder          Example             Where to find
<your-resource>      my-aoai-eastus      Azure Portal → your OpenAI resource → Overview → Endpoint URL subdomain
<your-deployment>    gpt-4o-transcribe   Azure OpenAI Studio → Deployments → deployment name
<your-api-key>       abc123...           Azure Portal → your OpenAI resource → Keys and Endpoint → Key 1 or Key 2

Important: Do NOT change {{MediaPath}}

{{MediaPath}} is an OpenClaw template variable. At runtime, OpenClaw automatically replaces it with the actual path to the received audio file. Leave it exactly as {{MediaPath}}.
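
The substitution amounts to a simple string replacement. An illustrative sketch — not OpenClaw's actual implementation, and the media path here is hypothetical:

```shell
# The argument exactly as written in openclaw.json
TEMPLATE="file=@{{MediaPath}}"
# Hypothetical path to a received voice note
MEDIA_PATH="/tmp/openclaw/media/voice-1234.ogg"

# What the gateway effectively passes to curl at runtime
ARG=$(printf '%s' "$TEMPLATE" | sed "s|{{MediaPath}}|$MEDIA_PATH|")
echo "$ARG"
```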

Step 4: Restart OpenClaw

openclaw gateway restart

Step 5: Verify

  1. Send a voice message to your OpenClaw bot (via Discord, Telegram, etc.)
  2. The agent should respond to the spoken content as text
  3. Check status — the media summary should show:
    📎 Media: audio ok

If the agent doesn’t understand the voice message or responds with something unrelated, check that the resource name, deployment name, and API key in openclaw.json match the values you verified in Step 2, and re-run the curl test against a sample audio file.

How It Works

The transcription pipeline runs before the message reaches the agent:

  1. User sends voice message
  2. OpenClaw gateway receives audio file
  3. Gateway runs the configured curl command with the audio file
  4. Azure OpenAI returns transcribed text (JSON)
  5. Gateway extracts text and delivers it to the agent as a normal message
  6. Agent sees plain text, responds normally

The agent never sees the audio file — it only receives the transcribed text. This is a gateway-level feature, not a skill.
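
The pipeline above can be sketched end to end with a stub in place of the Azure call (the function names here are illustrative, not OpenClaw source):

```shell
# Stub standing in for the real curl call to Azure OpenAI
fake_azure_transcribe() {
  printf '{"text": "turn on the lights"}'
}

# Gateway hands the transcript to the agent as a normal message
deliver_to_agent() {
  echo "agent received: $1"
}

# Gateway pipeline: transcribe, extract the text, deliver it
TEXT=$(fake_azure_transcribe | python3 -c 'import json, sys; print(json.load(sys.stdin)["text"])')
deliver_to_agent "$TEXT"   # prints: agent received: turn on the lights
```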

Full openclaw.json Context

The tools.media.audio config sits inside the top-level tools object. Here’s where it fits in the overall structure:

{
  "agents": { ... },
  "channels": { ... },
  "gateway": { ... },
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "curl",
            "args": [
              "-s",
              "https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment>/audio/transcriptions?api-version=2025-03-01-preview",
              "-H",
              "api-key: <your-api-key>",
              "-F",
              "file=@{{MediaPath}}"
            ]
          }
        ]
      }
    },
    "exec": { ... }
  }
}
