This is useful when implementing a voice-based chat interface with an agent. It can be used to record the user voice and send it as input to the agent with built-in speech recognition.

The voice playground you see in the platform was built using this.

Initialize

This class takes one optional options object with the following properties:

onAudioChunk
(chunk: Blob) => any

A function that’s called everytime a new audio chunk is read.

onStateChange
(state: State) => any

A function that’s called everytime the recorder state change, available states:

  • recording: The recorder is currently recording.
  • stopped: The recorder is stopped or hasn’t started recording yet.
  • paused: The recorder has started recording but it’s paused.
onText
(text: string) => any

If in-browser speech recognition is support, this function will be called each time the recognized text is updated.

Controls

You have methods to control the recorder:

start

Start recording.

pause

Pause the recorder.

resume

Resume the recorder.

Speech recognition

If in-browser speech recognition is supported, the recorder will use it, and if it’s not, the audio will be sent to the cloud to be recognized.

You can check if the recognition property is null to know if in-browser speech recognition is supported or not:

const recognitionSupport = recorder.recognition !== null;

You can use the asRunInput method to get the inputs to be passed to the agent, this input will be built based on whether in-browser recognition is supported or not, example:

const inputs = await recorder.asRunInput();

If in-browser speech recognition, the inputs will be like this:

{
    "message": "RECOGNIZED_TEXT"
}

Otherwise It will be like this:

{
    "audio": [{
        "type": "base64",
        "value": "AUDIO_AS_BASE64"
    }]
}

You can pass the results of the asRunInput method directly to the agent like this:

const inputs = await recorder.asRunInput();
const response = await agent.run({
    inputs
});

Was this page helpful?