Add files via upload · oshinrathor/datSci@8e1130b

Commit

Add files via upload

This commit aims to address label bias in a dataset using statistical analysis and natural language processing (NLP). The code first imports the dataset and plots the distribution of bias labels, which reveals a strong skew toward one label (82.2% versus 17.8%). To mitigate it, the team computes mean ratings and positive feedback counts for each department category, then reassigns labels according to how far each record deviates from its department-specific means. This relabeling yields a more balanced distribution (62% versus 38%). Further preprocessing applies text normalization, stemming, and lemmatization to shrink the feature space, and TF-IDF (term frequency-inverse document frequency) vectorization then weights the remaining terms. Together, these statistical and NLP steps reduce the label imbalance and prepare the dataset for subsequent analysis and modeling.
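The relabeling and TF-IDF steps described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from this commit: the field names (`department`, `rating`, `text`, `label`), the sample rows, and the rule "label 1 when the rating is at or above its department mean" are all assumptions, since the message does not state the exact columns or threshold. Stemming and lemmatization are omitted, and only the rating (not the positive feedback count) is used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical rows; field names and values are illustrative only.
reviews = [
    {"department": "Dresses", "rating": 5, "text": "Love this dress, fits well", "label": 1},
    {"department": "Dresses", "rating": 2, "text": "Fabric felt cheap", "label": 1},
    {"department": "Tops", "rating": 4, "text": "Nice top, soft fabric", "label": 1},
    {"department": "Tops", "rating": 1, "text": "Top shrank after washing", "label": 1},
]


def label_shares(rows):
    """Return the fraction of rows carrying each label."""
    counts = Counter(row["label"] for row in rows)
    return {label: n / len(rows) for label, n in counts.items()}


def relabel_by_department_mean(rows):
    """Assumed rule: label 1 if a rating is at or above its department mean, else 0."""
    ratings = defaultdict(list)
    for row in rows:
        ratings[row["department"]].append(row["rating"])
    means = {dept: sum(vals) / len(vals) for dept, vals in ratings.items()}
    # Build new dicts so the original rows are left untouched.
    return [{**row, "label": int(row["rating"] >= means[row["department"]])} for row in rows]


def tfidf(docs):
    """Compute unsmoothed tf * log(N / df) for each term of each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


balanced = relabel_by_department_mean(reviews)
# Very light normalization: lowercase, drop commas, split on whitespace.
tokens = [row["text"].lower().replace(",", "").split() for row in balanced]
weights = tfidf(tokens)
print(label_shares(reviews), label_shares(balanced))
```

Comparing `label_shares` before and after relabeling mirrors the before/after distribution check described in the message. Library implementations such as scikit-learn's `TfidfVectorizer` apply idf smoothing and vector normalization by default, so their weights will differ from this unsmoothed version.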

oshinrathor authored May 12, 2024

1 parent 25c71c6 commit 8e1130b

0 comments on commit `8e1130b`
