VAD segment length cap at around 20s #1136

chiiyeh · 2024-07-16T00:49:37Z

Hi, was playing around with the VAD model and realized that the maximum speech duration is capped at around 20s regardless of the buffer size. Took a look at the code and saw that the limit is hardcoded in this line:

https://github.com/k2-fsa/sherpa-onnx/blob/de04b3b9bfc6d48a8ac340e00083d9fd5411b81e/sherpa-onnx/csrc/voice-activity-detector.cc#L156C7-L156C29

Would be nice if this can be a parameter that can be modified. My instinct was that the buffer size sort of controls the maximum duration, but that turns out to be wrong. Not sure if this is the default behaviour for the original silero vad as well.

csukuangfj · 2024-07-16T02:47:50Z

> Not sure if this is the default behaviour for the original silero vad as well.

It is not the default behavior of silero vad.

We added this constraint because many users complained that the VAD gave them very long segments.

Typically, you won't get a segment longer than 20 seconds if there are long enough pauses in your audio.
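
To illustrate the effect described above, here is a minimal Python sketch of how a maximum-duration cap can force a long stretch of continuous speech to be split. It is not the sherpa-onnx implementation: the function `segment_speech`, its parameters `frame_sec` and `max_speech_sec`, and the per-frame boolean input are all hypothetical, and a real VAD works from model probabilities rather than ready-made flags.

```python
from typing import List, Tuple


def segment_speech(
    flags: List[bool],
    frame_sec: float,
    max_speech_sec: float = 20.0,  # hypothetical cap, mirroring the ~20s limit discussed here
) -> List[Tuple[float, float]]:
    """Group consecutive speech frames into (start_sec, end_sec) segments.

    A segment ends at the first non-speech frame, or is cut early once it
    reaches max_speech_sec, whichever comes first.
    """
    max_frames = max(1, int(round(max_speech_sec / frame_sec)))
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current segment, if any

    for i, is_speech in enumerate(flags):
        if is_speech:
            if start is None:
                start = i  # speech begins
            elif i - start >= max_frames:
                # Cap reached: close the segment and start a new one here.
                segments.append((start * frame_sec, i * frame_sec))
                start = i
        elif start is not None:
            # A pause ends the current segment.
            segments.append((start * frame_sec, i * frame_sec))
            start = None

    if start is not None:
        # Audio ended while still in speech.
        segments.append((start * frame_sec, len(flags) * frame_sec))
    return segments
```

With 50 one-second frames of uninterrupted speech and a 20s cap, this sketch returns three segments (0 to 20, 20 to 40, 40 to 50). If the audio contains pauses, the segments end at the pauses and the cap is never hit, which matches the point that the limit only matters for long stretches without a pause.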

> Would be nice if this can be a parameter that can be modified.

We accept PRs to change that. Would you like to contribute?

csukuangfj mentioned this issue Sep 14, 2024

Support specifying max speech duration for VAD. #1348

Merged

csukuangfj closed this as completed in #1348 Sep 14, 2024
