I used the Fashion MNIST dataset from Kaggle. A deep neural network reached an accuracy of 87.10%. After I added convolution and pooling layers, accuracy increased to 92.8%.
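To illustrate what the convolution and pooling layers mentioned above compute, here is a minimal pure-Python sketch. It is not the project's model: the 5x5 image and 2x2 kernel are made-up values, and the function names `conv2d_valid` and `max_pool_2x2` are hypothetical helpers written for this example. As in most deep learning libraries, the "convolution" here is a sliding-window sum of products with no kernel flip.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Made-up 5x5 "image" and 2x2 kernel, for illustration only.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 result
pooled = max_pool_2x2(feature_map)         # 2x2 result
print(pooled)  # [[0, 1], [0, 2]]
```

The convolution turns the 5x5 input into a 4x4 feature map, and pooling halves each dimension again, which is why stacking these layers shrinks the spatial size while keeping the strongest responses.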
I used the Avocado Prices dataset from Kaggle in this project. I applied linear regression, k-nearest neighbors, decision tree, random forest, support vector machine and XGBoost algorithms, using the scikit-learn library, to predict avocado prices. Using TensorFlow 2.0, I also created a deep neural network of depth 3. For this dataset, XGBoost gave the lowest mean absolute error and the highest R-squared, 0.89.
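The two metrics used to compare the models above can be sketched in a few lines of plain Python. This is only an illustration of the standard definitions: the price values are made up, and the function names are hypothetical (the project itself presumably relied on library implementations).

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices, not taken from the avocado dataset.
y_true = [1.0, 1.5, 2.0, 2.5]
y_pred = [1.1, 1.4, 2.1, 2.4]

mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(round(mae, 3), round(r2, 3))  # 0.1 0.968
```

A lower mean absolute error and an R-squared closer to 1 both indicate a better fit, which is the sense in which one model is ranked above the others.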
I used the Cardiovascular Disease dataset from Kaggle. Given an individual's health measurements, the model predicts whether they have cardiovascular disease. After data cleaning, I applied logistic regression, k-nearest neighbors, decision tree, random forest and support vector machine algorithms, using the scikit-learn library, to predict the presence of the disease. Using TensorFlow 2.0, I also created a deep neural network of depth 3. Tuning the random forest model with GridSearchCV improved its accuracy. The best accuracy I achieved was 74%.
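The GridSearchCV tuning step described above might look roughly like the sketch below. It assumes scikit-learn is installed. The data is synthetic (generated with `make_classification`) rather than the Kaggle dataset, and the parameter grid is an illustrative guess, not the grid used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the real project used the Kaggle dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid: these values are illustrative only.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

# Try every combination with 3-fold cross-validation, scoring by accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy of the refitted best model
print(best_params, round(test_accuracy, 3))
```

Scoring on a held-out test set, rather than reporting the cross-validation score alone, gives a fairer estimate of how much the tuning actually helped.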
In this project, I used the Credit Card Fraud Detection dataset from Kaggle. The dataset is highly imbalanced, so I downsampled the majority class to balance it. Then, I applied machine learning algorithms to separate fraud and non-fraud cases.
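Downsampling the majority class, as described above, can be sketched with the standard library alone. The helper name `downsample_majority`, the `is_fraud` key and the toy rows are all hypothetical; the project may have used pandas or another library for this step.

```python
import random


def downsample_majority(rows, label_key, seed=0):
    """Return a shuffled copy of rows in which every class is sampled down
    to the size of the smallest class."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    minority_size = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, minority_size))
    rng.shuffle(balanced)
    return balanced


# Toy data: 95 non-fraud rows and 5 fraud rows (made-up values).
rows = [{"amount": i, "is_fraud": 0} for i in range(95)]
rows += [{"amount": 1000 + i, "is_fraud": 1} for i in range(5)]

balanced = downsample_majority(rows, "is_fraud")
fraud_count = sum(r["is_fraud"] for r in balanced)
print(len(balanced), fraud_count)  # 10 5
```

Downsampling discards most of the majority-class rows, so it trades information for balance. That trade is reasonable when the majority class is very large, as it is in fraud data.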
I used the Campus Recruitment dataset from Kaggle. I did a detailed exploratory analysis of each feature. Then, I defined a classification task and a regression task from the dataset and applied ML algorithms to solve them.
For this project, I used the Medical Cost Personal Datasets available on Kaggle. I fitted a linear regression model with both the scikit-learn and statsmodels libraries. The scikit-learn model gave an R-squared of 0.79, and the statsmodels model gave an R-squared of 0.98.
For this project, I used the Heart Disease dataset available on Kaggle to predict heart disease with a logistic regression model. The model's accuracy is 86.5%.
-
Data Visualization with Matplotlib: In this project, I discuss Matplotlib (the basic plotting library in Python) and cover the chart types and customization techniques it offers. I cover line plots, scatter plots, histograms, bar charts, pie charts, box plots and area charts. I then cover customization: applying styles, adding a grid, handling axes and ticks, adding labels, a title and a legend, and changing colours and line styles.
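The customization techniques listed above can be combined in one short sketch. It assumes Matplotlib is installed. The data, labels and the choice of the built-in `ggplot` style are illustrative, not taken from the project notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.style.use("ggplot")  # apply one of Matplotlib's built-in styles

# Made-up data for two simple curves.
x = [0, 1, 2, 3, 4]
squares = [v ** 2 for v in x]
doubles = [2 * v for v in x]

fig, ax = plt.subplots()

# Colours, line styles and markers are set per line.
ax.plot(x, squares, color="tab:blue", linestyle="--", label="x squared")
ax.plot(x, doubles, color="tab:orange", linestyle="-", marker="o", label="2x")

# Title, axis labels, ticks, grid and legend.
ax.set_title("Customized line plot")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_xticks(x)
ax.grid(True)
ax.legend()
```

Calling `fig.savefig("plot.png")` afterwards would write the figure to disk. In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead.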
-
Data Visualization with Seaborn: For this project, I used the Penguin dataset available on Kaggle. I plot univariate distributions with the distplot() function and discuss histograms and kernel density estimation plots. I plot bivariate distributions with the jointplot() function and discuss the Seaborn scatter plot. Then, I plot categorical data with Seaborn strip plots and swarm plots, visualize the distribution of observations with box plots and violin plots, and show statistical estimates with bar plots and point plots. I visualize linear relationships between variables with regplot() and lmplot(). Finally, I discuss the Seaborn heat map, cluster map and facet grid, and plot pairwise relationships with the pairplot() function.
-
Pandas Built-in Data Visualization: In this project, I discuss the built-in visualization techniques of the pandas library. I draw scatter plots, histograms, box plots, KDE plots, density plots and area plots using pandas.
-
Interactive Visualization with Plotly and Cufflinks: Plotly is a library for creating interactive plots that can be used in dashboards or websites. In this project, I plot some basic interactive Plotly charts.