Whisper C++ Inference Action Server for ROS 2

NathanCorral/ros2_whisper

 
 

ROS 2 Whisper

ROS 2 inference for whisper.cpp.

Build

mkdir -p ros-ai/src && cd ros-ai/src && \
git clone https://github.com/ros-ai/ros2_whisper.git && cd .. && \
colcon build --symlink-install --cmake-args -DWHISPER_CUDA=On --no-warn-unused-cli

Demos

Configure whisper parameters in whisper.yaml.

Whisper On Key

Run the inference action server (this will download models to $HOME/.cache/whisper.cpp):

ros2 launch whisper_bringup bringup.launch.py

Run a client node (activated on space bar press):

ros2 run whisper_demos whisper_on_key

Stream

Bring up whisper:

ros2 launch whisper_bringup bringup.launch.py

Launch the live transcription stream:

ros2 run whisper_demos stream

Parameters

To enable/disable inference, you can set the active parameter from the command line with:

ros2 param set /whisper/inference active false # false/true
  • While inference is disabled, audio is still saved in the buffer, but whisper is not run.
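
For scripted use, the same toggle can be wrapped in a few lines of Python. This is a minimal sketch that shells out to the `ros2 param set` command shown above. The helper names `build_set_active_command` and `set_active` are hypothetical and not part of this package, and the sketch assumes the `ros2` CLI is on the PATH of a sourced ROS 2 environment.

```python
import subprocess

# Node and parameter names are taken from the command shown above.
NODE = "/whisper/inference"
PARAM = "active"


def build_set_active_command(active: bool) -> list[str]:
    """Return the argv for `ros2 param set` that enables or disables inference."""
    # The value is passed as lowercase true/false, as in the command above.
    return ["ros2", "param", "set", NODE, PARAM, "true" if active else "false"]


def set_active(active: bool) -> bool:
    """Run the command and report whether the CLI exited successfully.

    Hypothetical helper: requires the ros2 CLI and a running inference node.
    """
    completed = subprocess.run(build_set_active_command(active), check=False)
    return completed.returncode == 0
```

Calling `set_active(False)` pauses inference and `set_active(True)` resumes it. Audio buffering is unaffected either way.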

Available Actions

An action server of type Inference.action is available under the name inference.

  • Feedback messages regularly report the actively changing portion of the transcript.

  • The final result contains stale and active portions from the start of the inference.
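
The relationship between feedback and result can be pictured with a toy model. The sketch below is purely conceptual: `TranscriptModel` and its fields are hypothetical, and it does not reflect the actual fields of Inference.action or the node's implementation. It only encodes the two statements above, that feedback carries the active portion and the result carries stale plus active.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptModel:
    """Toy model (hypothetical) of a transcript split into two portions."""

    stale: list[str] = field(default_factory=list)   # text no longer being revised
    active: list[str] = field(default_factory=list)  # text still subject to change

    def feedback_text(self) -> str:
        """What a feedback message would carry: only the active portion."""
        return " ".join(self.active)

    def result_text(self) -> str:
        """What the final result would carry: stale followed by active."""
        return " ".join(self.stale + self.active)


model = TranscriptModel(stale=["the", "quick"], active=["brown", "fox"])
print(model.feedback_text())
print(model.result_text())
```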

Published Topics

Messages of type AudioTranscript.msg are published on /whisper/transcript_stream whenever the transcript is updated. Each message contains the entire transcript (stale and active).
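
A small rclpy node can print these messages as they arrive. The sketch below avoids assuming which package defines AudioTranscript.msg: it waits for the topic to appear in the ROS graph, reads the advertised type, and resolves it at runtime with `rosidl_runtime_py`. The node name `transcript_echo` and the helper `find_topic_type` are hypothetical, and the queue depth of 10 is an assumption that may need adjusting to match the publisher's QoS settings.

```python
TOPIC = "/whisper/transcript_stream"  # topic name taken from the text above


def find_topic_type(names_and_types, topic):
    """Return the first advertised type string for `topic`, or None if absent."""
    for name, types in names_and_types:
        if name == topic and types:
            return types[0]
    return None


def main():
    # Imported here so the helper above can be used without a ROS 2 environment.
    import rclpy
    from rosidl_runtime_py.utilities import get_message

    rclpy.init()
    node = rclpy.create_node("transcript_echo")  # hypothetical node name
    try:
        type_name = None
        while rclpy.ok() and type_name is None:
            # Spin briefly so graph discovery can progress, then look again.
            rclpy.spin_once(node, timeout_sec=0.5)
            type_name = find_topic_type(node.get_topic_names_and_types(), TOPIC)
        if type_name is None:
            return
        # Resolve the message class from its "pkg/msg/Type" string.
        msg_type = get_message(type_name)
        # Print each message whole; no field names are assumed.
        node.create_subscription(msg_type, TOPIC, lambda msg: print(msg), 10)
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    finally:
        node.destroy_node()
        if rclpy.ok():
            rclpy.shutdown()
```

Run `main()` from a terminal where both ROS 2 and this workspace are sourced, with the bringup launch file already running.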

Internally, the topic /whisper/tokens of type WhisperTokens.msg is used to transfer the model output between nodes.

Example

This example shows live transcription of the first minute of the sixth chapter of Harry Potter and the Philosopher's Stone from Audible:

harry_potter_sample

Troubleshoot
