Implementation of Anomaly Detection for High-Dimensional Data Using Large Deviations Principle: Sreelekha Guggilam, Varun Chandola, and Abani Patra. Anomaly Detection for High-Dimensional Data Using Large Deviations Principle (in preparation, 2021).
Top 5 anomalous counties identified by the proposed LAD algorithm based on the daily multivariate time series, consisting of cumulative COVID-19 per-capita infections and deaths. At each time instance, the algorithm analyzes the bivariate time series for all the counties to identify anomalies. The time series for the non-anomalous counties are plotted in light gray in the background for reference. For the counties in North Dakota (Burleigh and Grand Forks), the number of confirmed cases (top), and its sharp rise in November 2020, is the primary cause of the anomaly. On the other hand, Wayne County in Michigan was identified as anomalous primarily because of its abnormally high death rate, especially when compared to its relatively moderate confirmed infection rate.
- Run import_libraries.ipynb, import_functions.ipynb, import_global_params.ipynb (optional) to import required libraries and functions
- Run LDP_paper_results_8-Evaluation Small, large.ipynb to run the LAD model on datasets
- Run LDP_paper_results_8-COVID TS plots only 50k population lower limit.ipynb to generate plots for COVID-19 data for US Counties
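The steps above can also be scripted. The following is a minimal sketch, not part of this repository, that executes the notebooks in the listed order with the `jupyter nbconvert` command line. It assumes Jupyter and nbconvert are installed and that each notebook can run as an independent process; the helper names `build_command` and `run_all` are hypothetical. If the import notebooks are meant to populate a shared kernel namespace, run them interactively instead.

```python
import subprocess
from pathlib import Path

# Notebook names copied from the steps above; the order is an assumption.
NOTEBOOKS = [
    "import_libraries.ipynb",
    "import_functions.ipynb",
    "import_global_params.ipynb",  # listed as optional in the steps above
    "LDP_paper_results_8-Evaluation Small, large.ipynb",
    "LDP_paper_results_8-COVID TS plots only 50k population lower limit.ipynb",
]


def build_command(notebook):
    """Return the argv that executes one notebook in place via nbconvert."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(repo_dir="."):
    """Execute every notebook found in repo_dir, skipping missing ones."""
    for name in NOTEBOOKS:
        if not (Path(repo_dir) / name).exists():
            print(f"skipping missing notebook: {name}")
            continue
        # check=True stops the sequence at the first notebook that fails.
        subprocess.run(build_command(name), cwd=repo_dir, check=True)
```

Calling `run_all()` from the repository root runs the sequence; passing the command as a list rather than a shell string keeps the notebook names with spaces and commas intact.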
- COVID-19 US County Level Data: Ensheng Dong, Hongru Du, and Lauren Gardner. 2020. An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases 20, 5 (2020), 533–534.
- COVID-19 Country Level Data: Hasell, J., Mathieu, E., Beltekian, D. et al. A cross-country database of COVID-19 testing. Sci Data 7, 345 (2020). https://doi.org/10.1038/s41597-020-00688-8
- ODDS Data: Shebuti Rayana. 2016. ODDS Library.