ayushikaushik/Data-Science-Portfolio

Data Science Portfolio

I used the Fashion MNIST dataset from Kaggle. A deep neural network reached an accuracy of 87.10%. After adding convolution and pooling layers, accuracy increased to 92.8%.
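A minimal sketch of what adding convolution and pooling layers can look like in Keras. The layer counts, filter sizes and the `build_cnn` helper are assumptions chosen for illustration, not the architecture used in this project; only the 28x28 grayscale input and the 10 classes come from the Fashion MNIST dataset itself.

```python
import tensorflow as tf


def build_cnn(num_classes: int = 10) -> tf.keras.Model:
    """Small CNN for 28x28 grayscale images (hypothetical layer sizes)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        # Convolution learns local patterns; pooling halves the spatial size.
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        # Dense head that maps the extracted features to class probabilities.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_cnn()
```

Training would then call `model.fit` on images reshaped to `(n, 28, 28, 1)` and scaled to the range 0 to 1.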

I used the Avocado Prices dataset from Kaggle in this project. To predict avocado prices, I applied linear regression, k-nearest neighbors, decision tree, random forest, support vector machine and XGBoost algorithms using the scikit-learn library. Using TensorFlow 2.0, I also created a deep neural network of depth 3. For this dataset, XGBoost gave the lowest mean absolute error and the highest R-squared (0.89).
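The two metrics used to compare the models can be written out directly. This is a plain-Python sketch of the standard definitions; the helper names and the sample values are made up for illustration and are not taken from the avocado data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MAE = {mae:.2f}, R-squared = {r2:.2f}")
```

A lower mean absolute error and a higher R-squared both indicate a better fit, which is why the best model is the one that minimizes the first and maximizes the second.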

I used the Cardiovascular Disease dataset from Kaggle. Based on an individual's health conditions, my model predicts whether they have cardiovascular disease. After data cleaning, I applied logistic regression, k-nearest neighbors, decision tree, random forest and support vector machine algorithms using the scikit-learn library to predict the presence of the disease. Using TensorFlow 2.0, I created a deep neural network of depth 3. Tuning the random forest model with GridSearchCV improved its accuracy. The best accuracy I achieved was 74%.
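A minimal sketch of tuning a random forest with `GridSearchCV`. It runs on synthetic data from `make_classification` rather than the cardiovascular dataset, and the parameter grid is an assumption chosen to keep the example fast, not the grid used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cleaned dataset (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid: every combination is scored with 3-fold cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# By default the best model is refit on the full training set,
# so the search object can score held-out data directly.
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```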

In this project, I used the Credit Card Fraud Detection dataset from Kaggle. The dataset is highly imbalanced, so I downsampled the majority class to balance it. Then, I applied machine learning algorithms to separate fraud and non-fraud cases.
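Downsampling the majority class can be sketched in plain Python as follows. The `downsample_majority` helper, the `Class` and `Amount` keys and the toy rows are assumptions for illustration; the project itself may have used a different method or library.

```python
import random


def downsample_majority(rows, label_key="Class", seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)

    n_min = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    return balanced


# Toy data: 95 non-fraud rows (Class 0) and 5 fraud rows (Class 1).
rows = [{"Amount": float(i), "Class": 0} for i in range(95)]
rows += [{"Amount": float(i), "Class": 1} for i in range(5)]

balanced = downsample_majority(rows)
print(len(balanced), sum(row["Class"] for row in balanced))
```

Downsampling discards most of the majority-class rows, so the balanced set is small. That is the trade-off for giving both classes equal weight during training.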

I used the Campus Recruitment dataset from Kaggle. I did a detailed exploratory analysis of each feature. Then, I created a classification task and a regression task from the dataset and applied ML algorithms to solve them.

For this project, I used the Medical Cost Personal Datasets available on Kaggle. I fit a linear regression model with both the scikit-learn and statsmodels libraries. The scikit-learn model gave an R-squared of 0.79 and the statsmodels model gave an R-squared of 0.98.
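The gap between the two scores is large for what is nominally the same model. One possible explanation, which this README does not confirm, is the intercept: statsmodels' `OLS` does not add a constant term unless one is supplied (for example with `sm.add_constant`), and for a model without a constant it reports the uncentered R-squared. The plain-Python sketch below, with made-up numbers and hypothetical helper names, shows how far the two definitions can diverge for the same predictions.

```python
def r2_centered(y, y_hat):
    """Usual R-squared: total sum of squares is taken around the mean of y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def r2_uncentered(y, y_hat):
    """Uncentered R-squared: total sum of squares is taken around zero."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    ss_tot = sum(t ** 2 for t in y)
    return 1 - ss_res / ss_tot


# Made-up targets far from zero, with identical predictions for both metrics.
y = [10.0, 11.0, 12.0, 13.0]
y_hat = [11.0, 11.0, 12.0, 12.0]

centered = r2_centered(y, y_hat)
uncentered = r2_uncentered(y, y_hat)
print(f"centered = {centered:.3f}, uncentered = {uncentered:.3f}")
```

When the target values lie far from zero, the uncentered version is inflated, so the two numbers are not directly comparable.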

For this project, I used the Heart Disease dataset available on Kaggle to predict heart disease with a logistic regression model. The model's accuracy is 86.5%.

Data Visualization Projects:

  • Data Visualization with Matplotlib: In this project, I discuss Matplotlib (the basic plotting library in Python) and the charts and customization techniques it offers. I cover several plot types: line plot, scatter plot, histogram, bar chart, pie chart, box plot and area chart. Finally, I discuss customization: styling the graphics, adding a grid, handling axes and ticks, adding labels, a title and a legend, and changing colours and line styles.

  • Data Visualization with Seaborn: For this project, I used the Penguin dataset available on Kaggle. I plot univariate distributions with the distplot() function and discuss histogram and kernel density estimation plots. I plot bivariate distributions with the jointplot() function and discuss the Seaborn scatter plot. Then, I plot categorical data with strip plots and swarm plots, visualize the distribution of observations with box plots and violin plots, and show statistical estimates with bar plots and point plots. I visualize linear relationships between variables with reg plots and lm plots. I also discuss the Seaborn heat map, cluster map and facet grid, and pairwise relationships with the pairplot() function.

  • Pandas In-built Data Visualization: In this project, I discuss the in-built visualization techniques of the pandas library. I draw a scatter plot, histogram, box plot, kde plot, density plot and area plot using pandas.

  • Interactive visualization with Plotly and Cufflinks: Plotly is a library for creating interactive plots that can be used in dashboards or websites. I plot some basic interactive Plotly charts in this project.
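As an illustration of the Matplotlib customization techniques listed above (colours, line styles, ticks, grid, labels, title and legend), here is a minimal sketch. The data and label text are made up, and the `Agg` backend is an assumption so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
squares = [v ** 2 for v in x]
doubles = [2 * v for v in x]

fig, ax = plt.subplots(figsize=(6, 4))

# Colour, line style and marker are set per line.
ax.plot(x, squares, color="tab:blue", linestyle="--", marker="o", label="x squared")
ax.plot(x, doubles, color="tab:orange", linestyle="-", label="2x")

# Axes, ticks, grid, title and legend.
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Line plot with basic customization")
ax.set_xticks(x)
ax.grid(True, linestyle=":")
ax.legend()
```

Calling `fig.savefig("line_plot.png")` afterwards writes the figure to disk, and `plt.show()` displays it when an interactive backend is active.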

About

Here are data science projects done by me.
