Add files via upload
This commit aims to address bias in a dataset using statistical analysis and natural language processing techniques. The team first imports the dataset and plots the distribution of the bias labels, which shows a strong skew towards one label (82.2% to 17.8%). To mitigate it, they compute the mean rating and the mean positive feedback count for each department category, then update the dataset by reassigning labels according to each row's deviation from its department-specific means. This statistical step brings the label distribution to a more balanced 62% to 38%. Further preprocessing applies text normalization, stemming, and lemmatization to reduce the feature space. TF-IDF vectorization then computes term frequency-inverse document frequency weights, giving each document a weighted term representation. Overall, the code reduces the label bias through a combined statistical and NLP-based approach and prepares the dataset for subsequent analysis and modeling tasks.
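The relabeling and TF-IDF steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code added in this commit: the column names (`department`, `rating`, `positive_feedback`, `label`), the sample rows, and the exact relabeling rule (label 1 only when a row is at or above both department means) are assumptions, since the description does not state them. The TF-IDF function uses the textbook unsmoothed form; libraries such as scikit-learn apply smoothing and normalization by default, so their weights will differ.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sample rows; the real dataset's columns and values are not shown in the commit.
SAMPLE_ROWS = [
    {"department": "Tops", "rating": 5, "positive_feedback": 4, "label": 1},
    {"department": "Tops", "rating": 3, "positive_feedback": 0, "label": 1},
    {"department": "Dresses", "rating": 4, "positive_feedback": 6, "label": 1},
    {"department": "Dresses", "rating": 2, "positive_feedback": 2, "label": 0},
]


def label_distribution(rows):
    """Return the share of each label, e.g. to check how skewed the labels are."""
    counts = Counter(row["label"] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def department_means(rows):
    """Return {department: (mean rating, mean positive feedback count)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for row in rows:
        acc = sums[row["department"]]
        acc[0] += row["rating"]
        acc[1] += row["positive_feedback"]
        acc[2] += 1
    return {dept: (s[0] / s[2], s[1] / s[2]) for dept, s in sums.items()}


def relabel(rows):
    """Reassign labels from each row's deviation from its department means.

    Assumed rule: label 1 if the row is at or above both department means, else 0.
    """
    means = department_means(rows)
    relabeled = []
    for row in rows:
        mean_rating, mean_feedback = means[row["department"]]
        above_both = (row["rating"] >= mean_rating
                      and row["positive_feedback"] >= mean_feedback)
        relabeled.append({**row, "label": 1 if above_both else 0})
    return relabeled


def tf_idf(docs):
    """Return one {term: weight} dict per document, with tf * log(N / df) weights."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


if __name__ == "__main__":
    print("before:", label_distribution(SAMPLE_ROWS))
    print("after: ", label_distribution(relabel(SAMPLE_ROWS)))
    print(tf_idf(["great fit", "poor fit"]))
```

Comparing `label_distribution` before and after `relabel` mirrors the check described above (82.2% to 17.8% before, 62% to 38% after on the real data). In `tf_idf`, a term that appears in every document gets weight zero, which is why very common words contribute little to the representation.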
oshinrathor authored May 12, 2024
1 parent 25c71c6 commit 8e1130b
Showing 2 changed files with 30,447 additions and 0 deletions.