Request for 'Bad List' of Noisy German Samples #193

Open
abnera1 opened this issue May 25, 2024 · 0 comments

abnera1 commented May 25, 2024

Hello,
I am currently working with the German speech recognition data provided by this project and came across the following line in the README:
"In addition, we also filter out samples that are considered 'noisy', that is, samples having very high WER (word error rate) or CER (character error rate) w.r.t. a previously trained German model."
Unfortunately, I do not have access to a pre-trained German model to calculate the WER or CER for my dataset, so I cannot reproduce this filtering step myself.
Could you please provide a list of these 'noisy' samples or the criteria used to identify them?
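
To make the request concrete, this is roughly the filtering I understand the README to describe, as a minimal sketch. The function names, the `max_wer` and `max_cer` thresholds, and the example transcripts are my own assumptions, not values from this project; the hypothesis text would come from whatever German model is used for scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, hypothesis) / len(reference)


def is_noisy(reference, hypothesis, max_wer=0.5, max_cer=0.3):
    """Flag a sample as noisy. Both thresholds are hypothetical placeholders."""
    return wer(reference, hypothesis) > max_wer or cer(reference, hypothesis) > max_cer


# Hypothetical example: dataset transcript vs. model output for the same clip.
print(wer("das ist ein test", "das ist kein test"))
print(is_noisy("guten morgen", "gute nacht"))
```

Knowing the actual thresholds, and whether WER, CER, or both were applied, would already let me approximate the filter with a different model.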
