Various Python tests to identify and describe multimodal data distributions.

ciortanmadalina/modality_tests

Modality tests and Kernel density estimations

When processing a large number of datasets that may follow different distributions, we face the following questions:

  • Is the data distribution unimodal and, if so, which model best approximates it (uniform distribution, t-distribution, chi-square distribution, Cauchy distribution, etc.)?
  • If the data distribution is multimodal, can we automatically identify the number of modes and provide more granular descriptive statistics?
  • How can we estimate the probability density function of a new dataset?
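The first question can be illustrated with a minimal, dependency-free sketch. It is not the notebook's code: the helper names (`ks_statistic`, `uniform_cdf`, `best_candidate`) are hypothetical, only two candidate families (normal and uniform) are tried, and the Kolmogorov-Smirnov distance is used purely as a ranking score. Because the parameters are estimated from the same sample, the score should not be read as a formal goodness-of-fit test.

```python
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and a model `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        model = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs(model - i / n), abs((i + 1) / n - model))
    return gap


def uniform_cdf(lo, hi):
    """CDF of a uniform distribution on [lo, hi]."""
    def cdf(x):
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return cdf


def best_candidate(sample):
    """Fit each candidate family and return (best name, all scores)."""
    candidates = {
        # Parameters are simple plug-in estimates (an assumption of this sketch).
        "normal": NormalDist(fmean(sample), stdev(sample)).cdf,
        "uniform": uniform_cdf(min(sample), max(sample)),
    }
    scores = {name: ks_statistic(sample, cdf) for name, cdf in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data with a fixed seed so the example is reproducible.
rng = random.Random(0)
gaussian_sample = [rng.gauss(10.0, 2.0) for _ in range(5000)]
uniform_sample = [rng.uniform(0.0, 1.0) for _ in range(5000)]

print(best_candidate(gaussian_sample)[0])
print(best_candidate(uniform_sample)[0])
```

The same loop extends to other families (t, chi-square, Cauchy) by adding their fitted CDFs to the `candidates` dictionary, for example from a statistics library.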

This notebook tackles the following subjects:

  • Histograms vs probability density function approximation
  • Kernel density estimations
  • Choice of optimal bandwidth: Silverman's rule, Scott's rule, or grid-search cross-validation
  • Statistical tests for unimodal distributions
  • Hartigan's dip test for unimodality
  • Identification of the number of modes of a data distribution based on the kernel density estimation
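As a rough illustration of how several of these topics fit together, the sketch below builds a Gaussian kernel density estimate, picks a bandwidth with a rule of thumb, and counts modes as local maxima of the estimate on a grid. It uses only the standard library and is not taken from the notebook. The function names, the grid size, and the `min_height` cutoff (which ignores tiny bumps in the tails) are assumptions made for this example, and libraries differ in the exact constants they use for the Silverman and Scott rules.

```python
import math
import random
from statistics import quantiles, stdev


def silverman_bandwidth(sample):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    q1, _, q3 = quantiles(sample, n=4)
    spread = min(stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * spread * len(sample) ** (-1 / 5)


def scott_bandwidth(sample):
    """Normal-reference form often attributed to Scott: 1.06 * sd * n ** (-1/5)."""
    return 1.06 * stdev(sample) * len(sample) ** (-1 / 5)


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates the Gaussian KDE at a point x."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def count_modes(sample, bandwidth, grid_size=200, min_height=0.05):
    """Count local maxima of the KDE on an evenly spaced grid.

    Peaks lower than `min_height` times the highest peak are ignored;
    this cutoff is a heuristic chosen for the sketch, not a standard value.
    """
    density = gaussian_kde(sample, bandwidth)
    lo = min(sample) - 3 * bandwidth
    hi = max(sample) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    values = [density(lo + i * step) for i in range(grid_size)]
    threshold = min_height * max(values)
    modes = 0
    for i in range(1, grid_size - 1):
        is_peak = values[i - 1] < values[i] >= values[i + 1]
        if is_peak and values[i] > threshold:
            modes += 1
    return modes


# Synthetic, well-separated two-component mixture with a fixed seed.
rng = random.Random(42)
bimodal_sample = (
    [rng.gauss(0.0, 1.0) for _ in range(300)]
    + [rng.gauss(6.0, 1.0) for _ in range(300)]
)

h = silverman_bandwidth(bimodal_sample)
print("bandwidth:", round(h, 3), "modes:", count_modes(bimodal_sample, h))
```

The mode count depends strongly on the bandwidth: a smaller value can split a single peak into several spurious ones, while a larger value can merge genuinely distinct modes, which is why the bandwidth choice gets its own topic in the list above.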

Find the complete Medium post here
