ROS 2 inference for whisper.cpp.
- Install pyaudio, see the install instructions (a sketch of a typical install follows this list).
- Build this repository:

```bash
mkdir -p ros-ai/src && cd ros-ai/src && \
git clone https://github.com/ros-ai/ros2_whisper.git && cd .. && \
colcon build --symlink-install --cmake-args -DWHISPER_CUDA=On --no-warn-unused-cli
```
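For reference, a typical PyAudio setup on Ubuntu/Debian looks like the sketch below; the portaudio19-dev package name is platform-specific and an assumption here, so defer to the PyAudio install instructions on other systems.

```bash
# Assumption: Ubuntu/Debian. PortAudio headers are needed to build PyAudio.
sudo apt install portaudio19-dev
pip install pyaudio
```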
Configure whisper parameters in whisper.yaml.

Run the inference action server (this will download models to $HOME/.cache/whisper.cpp):

```bash
ros2 launch whisper_bringup bringup.launch.py
```

Run a client node (activated on space bar press):

```bash
ros2 run whisper_demos whisper_on_key
```
Bring up whisper:

```bash
ros2 launch whisper_bringup bringup.launch.py
```

Launch the live transcription stream:

```bash
ros2 run whisper_demos stream
```
To enable or disable inference, set the active parameter from the command line:

```bash
ros2 param set /whisper/inference active false  # false/true
```

While inference is inactive, audio is still saved to the buffer, but whisper is not run.
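The same toggle can be done programmatically through the node's standard parameter service. A minimal rclpy sketch, assuming whisper is running and the node resolves to /whisper/inference as in the command above:

```python
import rclpy
from rclpy.node import Node
from rcl_interfaces.msg import Parameter, ParameterType, ParameterValue
from rcl_interfaces.srv import SetParameters


def set_inference_active(active: bool) -> None:
    """Set the 'active' parameter on /whisper/inference via its parameter service."""
    rclpy.init()
    node = Node("inference_toggle")
    client = node.create_client(SetParameters, "/whisper/inference/set_parameters")
    if not client.wait_for_service(timeout_sec=5.0):
        raise RuntimeError("Parameter service unavailable; is whisper running?")
    request = SetParameters.Request()
    request.parameters = [
        Parameter(
            name="active",
            value=ParameterValue(type=ParameterType.PARAMETER_BOOL, bool_value=active),
        )
    ]
    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    set_inference_active(False)  # pause inference; audio keeps buffering
```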
An action server runs under the topic inference, of type Inference.action:

- The feedback message regularly publishes the actively changing portion of the transcript.
- The final result contains the stale and active portions from the start of the inference.
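A minimal rclpy action client sketch. The import path and resolved action name are assumptions (whisper_idl as the interface package and /whisper/inference as the action name); check the repository's interface package for the exact names.

```python
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node

# Assumption: Inference.action is exported by a package named whisper_idl;
# adjust the import to the package that actually defines it.
from whisper_idl.action import Inference


def main() -> None:
    rclpy.init()
    node = Node("inference_client")
    client = ActionClient(node, Inference, "/whisper/inference")
    client.wait_for_server()

    # Send a default goal; see Inference.action for the available goal fields.
    send_future = client.send_goal_async(
        Inference.Goal(),
        feedback_callback=lambda msg: node.get_logger().info(str(msg.feedback)),
    )
    rclpy.spin_until_future_complete(node, send_future)
    goal_handle = send_future.result()

    # The final result contains the stale and active transcript portions.
    result_future = goal_handle.get_result_async()
    rclpy.spin_until_future_complete(node, result_future)
    node.get_logger().info(str(result_future.result().result))
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```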
Messages of type AudioTranscript.msg, containing the entire transcript (stale and active portions), are published on /whisper/transcript_stream whenever the transcript updates.
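A minimal subscriber sketch for the transcript stream; as above, the whisper_idl import path is an assumption, so use whichever package defines AudioTranscript.msg.

```python
import rclpy
from rclpy.node import Node

# Assumption: AudioTranscript.msg is exported by a package named whisper_idl.
from whisper_idl.msg import AudioTranscript


def main() -> None:
    rclpy.init()
    node = Node("transcript_listener")
    # Each message carries the entire transcript (stale and active portions).
    node.create_subscription(
        AudioTranscript,
        "/whisper/transcript_stream",
        lambda msg: node.get_logger().info(str(msg)),
        10,
    )
    rclpy.spin(node)


if __name__ == "__main__":
    main()
```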
Internally, the topic /whisper/tokens of type WhisperTokens.msg is used to transfer the model output between nodes.
This example shows live transcription of the first minute of the sixth chapter of Harry Potter and the Philosopher's Stone from Audible.
- Encoder inference time: ggerganov/whisper.cpp#10 (comment)