Data Science Portfolio

Neural Network Project:

I used the Fashion MNIST dataset from Kaggle. A deep neural network reached an accuracy of 87.10%. Adding convolution and pooling layers increased the accuracy to 92.8%.

Regression Project:

I used the Avocado Prices dataset from Kaggle in this project. To predict avocado prices, I applied linear regression, k-nearest neighbors, decision tree, random forest, support vector machine and XGBoost algorithms, using the scikit-learn library. Using TensorFlow 2.0, I also created a deep neural network of depth 3. For this dataset, XGBoost gave the lowest mean absolute error and the highest R-squared, 0.89.
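The two metrics used to compare the models above can be computed by hand. The sketch below is a minimal illustration with made-up prices, not code from the notebook. The function names `mean_absolute_error` and `r_squared` are written here for illustration only.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices, for illustration only (not taken from the dataset).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mae, r2)
```

A lower mean absolute error and a higher R-squared both indicate a better fit, which is why the two metrics point to the same model here.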

Classification Project 1:

I used the Cardiovascular Disease dataset from Kaggle. Based on an individual's health conditions, the model predicts whether they have a cardiovascular disease. After data cleaning, I applied logistic regression, k-nearest neighbors, decision tree, random forest and support vector machine algorithms, using the scikit-learn library, to predict the presence of the disease. Using TensorFlow 2.0, I also created a deep neural network of depth 3. Tuning the random forest model with GridSearchCV improved its accuracy. The best accuracy achieved was 74%.
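As a rough sketch of the GridSearchCV step, the example below tunes a random forest on synthetic data. The data, the parameter grid and the split sizes are all assumptions chosen to keep the example small; they are not the values used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the real project uses the Kaggle dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid: illustrative values, not the ones from the notebook.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation on the training split
    scoring="accuracy",
)
search.fit(X_train, y_train)

# best_estimator_ is refit on the whole training split by default (refit=True).
test_accuracy = search.best_estimator_.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Scoring the tuned model on a held-out split, rather than reporting the cross-validation score alone, gives a less optimistic accuracy estimate.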

Classification Project 2:

In this project, I used the Credit Card Fraud Detection dataset from Kaggle. This is a highly imbalanced dataset, so I downsampled the majority class to balance it. Then I applied machine learning algorithms to separate fraud and non-fraud cases.
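Downsampling the majority class can be sketched in plain Python as follows. This is a minimal illustration on toy rows, not the notebook's code; the helper `downsample_majority` and the column names `amount` and `is_fraud` are hypothetical.

```python
import random


def downsample_majority(rows, label_key, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    minority_size = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sample without replacement; the smallest class is kept whole.
        balanced.extend(rng.sample(group, minority_size))
    rng.shuffle(balanced)
    return balanced


# Toy data: 95 non-fraud rows (0) and 5 fraud rows (1).
rows = [{"amount": i, "is_fraud": 0} for i in range(95)]
rows += [{"amount": 1000 + i, "is_fraud": 1} for i in range(5)]

balanced = downsample_majority(rows, "is_fraud")
counts = {}
for row in balanced:
    counts[row["is_fraud"]] = counts.get(row["is_fraud"], 0) + 1
print(counts)
```

The trade-off is that most majority-class rows are discarded, so the balanced set is only twice the size of the minority class.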

Data Analysis Project:

I used the Campus Recruitment dataset from Kaggle. I did a detailed exploratory analysis of each feature. Then I defined a classification task and a regression task on the dataset and applied ML algorithms to solve them.

Linear Regression Project:

For this project, I used the Medical Cost Personal Datasets available on Kaggle. I fitted a linear regression model with both the scikit-learn and statsmodels libraries. The scikit-learn model gave an R-squared of 0.79 and the statsmodels model gave an R-squared of 0.98.
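The two R-squared figures above may not be directly comparable. One known cause of such a gap, not confirmed for this project, is that the statsmodels `OLS` class does not add an intercept unless one is supplied (for example with `sm.add_constant`), and for a model without a constant it reports R-squared against an uncentered total sum of squares. The sketch below reproduces that effect on synthetic data in plain Python; all names and numbers in it are illustrative.

```python
import random

rng = random.Random(0)

# Synthetic data with a large baseline: y = 50 + x + noise.
x = [rng.uniform(0, 10) for _ in range(500)]
y = [50 + xi + rng.gauss(0, 5) for xi in x]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Fit WITH an intercept (simple least squares): y = a + b * x.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
ss_res_with = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
centered_tss = sum((yi - mean_y) ** 2 for yi in y)
r2_centered = 1 - ss_res_with / centered_tss

# Fit WITHOUT an intercept: y = b0 * x, scored against the uncentered total.
b0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
ss_res_without = sum((yi - b0 * xi) ** 2 for xi, yi in zip(x, y))
uncentered_tss = sum(yi ** 2 for yi in y)
r2_uncentered = 1 - ss_res_without / uncentered_tss

print(round(r2_centered, 3), round(r2_uncentered, 3))
```

On this data the no-intercept fit has the larger residuals yet the higher reported R-squared, because its baseline is zero rather than the mean of `y`.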

Logistic Regression Project:

For this project, I used the Heart Disease dataset available on Kaggle to predict heart disease with a logistic regression model. The model's accuracy is 86.5%.

Data Visualization Projects:

Data Visualization with Matplotlib: In this project, I introduce Matplotlib, the basic plotting library in Python, and cover its main chart types and customization techniques. The plots covered are line plot, scatter plot, histogram, bar chart, pie chart, box plot and area chart. The customization topics are styles, grids, axes and ticks, labels, titles and legends, and colours and line styles.
Data Visualization with Seaborn: For this project, I used the Penguin dataset available on Kaggle. I plot univariate distributions with the distplot() function and discuss histograms and kernel density estimation plots. I plot bivariate distributions with the jointplot() function and discuss the Seaborn scatter plot. Then I plot categorical data with strip and swarm plots, visualize the distribution of observations with box and violin plots, and show statistical estimates with bar and point plots. I visualize linear relationships between variables with regplot and lmplot. Finally, I discuss the heat map, cluster map and facet grid, and plot pairwise relationships with the pairplot() function.
Pandas In-built Data Visualization: In this project, I discuss the visualization techniques built into the pandas library. I draw scatter, histogram, box, KDE, density and area plots using pandas.
Interactive Visualization with Plotly and Cufflinks: Plotly is a library for creating interactive plots that can be used in dashboards or websites. In this project, I plot some basic interactive Plotly charts.
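The Matplotlib customizations listed above (grid, ticks, labels, title, legend, colours and line styles) can be combined in a few lines. The following is a minimal sketch with made-up data, not taken from the project notebooks; the output file name `example_plot.png` is a placeholder.

```python
import matplotlib

matplotlib.use("Agg")  # Non-interactive backend so the script runs without a display.
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
squares = [v ** 2 for v in x]
doubles = [2 * v for v in x]

fig, ax = plt.subplots()
# Colour and line style are set per line; the label feeds the legend.
ax.plot(x, squares, color="tab:blue", linestyle="--", label="x squared")
ax.plot(x, doubles, color="tab:orange", linestyle="-", label="2x")

ax.set_title("Two simple curves")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_xticks(x)   # One tick per data point.
ax.grid(True)
ax.legend()

fig.savefig("example_plot.png")
```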

ayushikaushik/Data-Science-Portfolio
