
VAD segment length cap at around 20s #1136

Closed
chiiyeh opened this issue Jul 16, 2024 · 1 comment · Fixed by #1348

@chiiyeh
Contributor

chiiyeh commented Jul 16, 2024

Hi, I was playing around with the VAD model and realized that the maximum speech duration is capped at around 20 s regardless of the buffer size. Looking at the code, I saw that it is hardcoded in this line:

https://github.com/k2-fsa/sherpa-onnx/blob/de04b3b9bfc6d48a8ac340e00083d9fd5411b81e/sherpa-onnx/csrc/voice-activity-detector.cc#L156C7-L156C29

It would be nice if this could be a parameter that users can modify. My instinct was that the buffer size controls the maximum duration, but that turns out to be wrong. I'm not sure whether this is also the default behaviour of the original Silero VAD.
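To illustrate what a configurable cap could look like, here is a minimal Python sketch. It is not the sherpa-onnx implementation (which is C++); `VadConfig`, `segment`, `frame_shift`, `min_silence_duration` and `max_speech_duration` are hypothetical names chosen for illustration. It assumes the model has already produced one speech/non-speech decision per frame.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VadConfig:
    # All field names are hypothetical, for illustration only.
    frame_shift: float = 0.032           # seconds per frame (e.g. 512 samples at 16 kHz)
    min_silence_duration: float = 0.5    # pause length that closes a segment
    max_speech_duration: float = 20.0    # cap that would replace the hardcoded ~20 s


def segment(is_speech: List[bool], cfg: VadConfig) -> List[Tuple[float, float]]:
    """Group per-frame speech decisions into (start, end) segments in seconds."""
    min_sil = max(1, round(cfg.min_silence_duration / cfg.frame_shift))
    max_len = max(1, round(cfg.max_speech_duration / cfg.frame_shift))
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # frame index where the open segment began
    silence = 0                  # consecutive non-speech frames in the open segment
    for i, speech in enumerate(is_speech):
        if start is None:
            if speech:
                start, silence = i, 0
            continue
        silence = 0 if speech else silence + 1
        if silence >= min_sil:
            # A long enough pause closes the segment; trailing silence is dropped.
            segments.append((start, i - silence + 1))
            start = None
        elif i + 1 - start >= max_len:
            # Continuous speech reached the cap: force a cut at this frame.
            segments.append((start, i + 1))
            start = None
    if start is not None:
        # Flush the segment still open at the end of the input.
        segments.append((start, len(is_speech) - silence))
    return [(s * cfg.frame_shift, e * cfg.frame_shift) for s, e in segments]
```

With a large `max_speech_duration`, only pauses of at least `min_silence_duration` split the audio; with a small one, continuous speech is chopped into fixed-length chunks. Exposing the cap as a config field lets each user pick the trade-off instead of inheriting a fixed value.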

@csukuangfj
Collaborator

> Not sure if this is the default behaviour for the original silero vad as well.

It is not the default behavior of Silero VAD.

We added this constraint because many users complained that the VAD gave them very long segments.

Typically, you won't get a segment longer than 20 seconds anyway if your audio contains longer pauses; the cap only takes effect on long stretches of continuous speech.


> Would be nice if this can be a parameter that can be modified.

We accept PRs to change that. Would you like to contribute?
