Request for 'Bad List' of Noisy German Samples #193

abnera1 · 2024-05-25T10:25:10Z

Hello,
I am currently working with the German speech recognition data provided by this project and came across the following line in the README:
"In addition, we also filter out samples that are considered 'noisy', that is, samples having very high WER (word error rate) or CER (character error rate) w.r.t. a previously trained German model."
Unfortunately, I do not have access to a pre-trained German model, so I cannot compute the WER or CER for my dataset and cannot filter out the noisy samples myself.
Could you please provide a list of these 'noisy' samples or the criteria used to identify them?
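
To make the request concrete, here is a minimal sketch of the kind of filter the README sentence describes. It is not taken from this project: the function names and the thresholds `max_wer` and `max_cer` are placeholder assumptions, and the `hypothesis` string would have to come from a trained German model, which is exactly the missing piece.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def is_noisy(reference, hypothesis, max_wer=0.5, max_cer=0.3):
    """Flag a sample whose WER or CER exceeds a threshold.

    The default thresholds are illustrative assumptions only; the values
    actually used by the project are what this issue is asking about.
    """
    return wer(reference, hypothesis) > max_wer or cer(reference, hypothesis) > max_cer
```

With the actual thresholds, or a published list of rejected sample IDs, this step could be reproduced without retraining a model.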
