From the course: Machine Learning with Python: Foundations

How to build a machine learning model in Python - Python Tutorial

- In this exercise, we'll use a historical dataset to build a linear regression model that predicts the number of bike rentals based on weather conditions. We start by importing the pandas package. Then we import the data into a data frame called bikes and preview it.
Now that we have our data, let's try to understand it. First, we get a concise summary of the structure of the data. From the summary, we can tell that there are 731 rows in the dataset and that all four columns are numeric. Next, we get summary statistics for the data. The statistics show the mean, minimum, maximum, standard deviation, and percentile values for the four columns in the dataset.
Linear regression models assume that there exists a linear relationship between the predictors and the response. Let's see if this assumption holds true in our dataset. To ensure that our plots show up inline, we run the %matplotlib inline command. Then we create a scatter plot between the predictor variable temperature and the response variable rentals. The chart shows that there is a positive linear relationship between temperature and rentals. This means that as the temperature increases, so does the number of bike rentals. Next, we evaluate the relationship between humidity and rentals. This chart shows that there is a negative linear relationship between humidity and rentals. This means that as humidity increases, the number of bike rentals decreases. Finally, we evaluate the relationship between wind speed and rentals. The chart also shows a negative linear relationship between wind speed and rentals. This means that the number of bike rentals decreases as wind speed picks up.
Before we build our machine learning model, we need to split the data into training and test sets. Prior to doing this, we must first separate the dependent variable from the independent variables. Let's start by creating a data frame called y for the dependent variable.
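As a rough sketch of the exploration steps so far, the code below builds a small synthetic stand-in for the bikes data, since the transcript does not name the file that is loaded. The column names temperature, humidity, windspeed, and rentals, the value ranges, and the numbers used to generate rentals are all assumptions made for illustration, so the output will not match the figures quoted in the video. The %matplotlib inline command only applies inside a notebook, so this version selects a non-interactive backend to run as a plain script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook use %matplotlib inline instead
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: synthetic stand-in data. Column names, ranges, and the
# generating formula are invented for illustration, not taken from the course.
rng = np.random.default_rng(42)
n_rows = 731
bikes = pd.DataFrame({
    "temperature": rng.uniform(20, 90, n_rows),
    "humidity": rng.uniform(0.2, 1.0, n_rows),
    "windspeed": rng.uniform(0, 35, n_rows),
})
bikes["rentals"] = (
    3000
    + 80 * bikes["temperature"]
    - 2000 * bikes["humidity"]
    - 50 * bikes["windspeed"]
    + rng.normal(0, 250, n_rows)
)

# Preview the first rows, then summarize structure and statistics.
print(bikes.head())
bikes.info()             # row count, column names, dtypes
print(bikes.describe())  # count, mean, std, min, percentiles, max

# One scatter plot per predictor against the response.
predictors = ["temperature", "humidity", "windspeed"]
fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
for ax, predictor in zip(axes, predictors):
    ax.scatter(bikes[predictor], bikes["rentals"], s=8)
    ax.set_xlabel(predictor)
axes[0].set_ylabel("rentals")
# fig.savefig("bikes_scatter.png")  # uncomment to write the charts to disk
```

In a notebook, replacing the matplotlib.use line with %matplotlib inline should render the three charts directly under the cell.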
Then we split off the independent variables into a data frame called X. Next, we import the train_test_split function from the sklearn.model_selection subpackage. Then we split the X and y data frames into X_train, X_test, y_train, and y_test.
To build a linear regression model in Python, we need to import the LinearRegression class from the sklearn.linear_model subpackage. We then use the class to build our model: we create a LinearRegression object, call its fit method, and pass to it X_train and y_train.
The objective of linear regression is to estimate the intercept and slope values for a regression line that best fits the data. We can get the estimated intercept value for our model by referring to the intercept_ attribute of the model. The intercept value for our regression line is 3800.68. We can also get the estimated slope values, or coefficients, for the regression line by referring to the coef_ attribute of the model. The model coefficients correspond to the order in which the independent variables are listed in the training data. This means that the coefficient for temperature is 80.35, the coefficient for humidity is negative 4665.74, and the coefficient for wind speed is negative 196.22.
One way to evaluate a linear regression model is by calculating the coefficient of determination, or R squared. The closer this metric is to one, the better the model is. Let's get the R squared for our model. We call the score method of the model and we pass to it X_test as well as y_test. The R squared value tells us that our model is able to explain 98.2% of the variability in the response values of the test data. That is very good.
Another way to evaluate a linear regression model is to evaluate how accurate it is. This means comparing the predicted values against the actual values. First, let's get the model's predicted response values for the test data.
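The split, fit, and scoring steps might look something like the sketch below. It again uses a synthetic stand-in for the bikes data, with assumed column names and made-up generating numbers, and the random_state value is an arbitrary choice for repeatability, so the intercept, coefficients, and R squared it prints will differ from the values quoted in the video.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data with invented column names and values.
rng = np.random.default_rng(42)
n_rows = 731
bikes = pd.DataFrame({
    "temperature": rng.uniform(20, 90, n_rows),
    "humidity": rng.uniform(0.2, 1.0, n_rows),
    "windspeed": rng.uniform(0, 35, n_rows),
})
bikes["rentals"] = (
    3000 + 80 * bikes["temperature"] - 2000 * bikes["humidity"]
    - 50 * bikes["windspeed"] + rng.normal(0, 250, n_rows)
)

# Separate the dependent variable (y) from the independent variables (X).
y = bikes[["rentals"]]
X = bikes[["temperature", "humidity", "windspeed"]]

# Hold out part of the data for testing (default split is 75% train, 25% test).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1234)

# Create the estimator and fit it on the training data only.
model = LinearRegression().fit(X_train, y_train)

# Because y is a one-column data frame, intercept_ has shape (1,)
# and coef_ has shape (1, n_features), in the column order of X.
print("intercept:", model.intercept_[0])
print("coefficients:", dict(zip(X.columns, model.coef_[0])))

# score() returns R squared on the data it is given, here the test set.
r_squared = model.score(X_test, y_test)
print("R squared:", r_squared)
```

Pairing coef_ with X.columns, as done here, is one way to avoid misreading which coefficient belongs to which predictor.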
We're going to call our predictions y_pred, and we get the model's predictions by calling the predict method of the model and passing to it X_test. Next, we import the mean_absolute_error function from the sklearn.metrics subpackage and calculate the mean absolute error between the actual response values and the predicted response values. So we call the mean_absolute_error function and pass to it y_test, then y_pred. The mean absolute error implies that, going forward, we should expect the predictions of our model to be off the mark by an average of plus or minus 194 bikes. That's pretty good, considering the little amount of work we put into the model.
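For the prediction and error calculation, a minimal sketch under the same assumptions (synthetic stand-in data, assumed column names, arbitrary random_state) could look like this. The mean absolute error it prints reflects the made-up data, not the 194 bikes reported in the video. The manual calculation at the end is only there to show what the function computes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data with invented column names and values.
rng = np.random.default_rng(42)
n_rows = 731
bikes = pd.DataFrame({
    "temperature": rng.uniform(20, 90, n_rows),
    "humidity": rng.uniform(0.2, 1.0, n_rows),
    "windspeed": rng.uniform(0, 35, n_rows),
})
bikes["rentals"] = (
    3000 + 80 * bikes["temperature"] - 2000 * bikes["humidity"]
    - 50 * bikes["windspeed"] + rng.normal(0, 250, n_rows)
)

y = bikes[["rentals"]]
X = bikes[["temperature", "humidity", "windspeed"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1234)
model = LinearRegression().fit(X_train, y_train)

# Predicted response values for the held-out test data.
y_pred = model.predict(X_test)

# Average absolute gap between actual and predicted rentals.
mae = mean_absolute_error(y_test, y_pred)
print("mean absolute error:", mae)

# The same quantity computed by hand: mean of |actual - predicted|.
manual_mae = np.abs(y_test.to_numpy() - y_pred).mean()
print("manual check:", manual_mae)
```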
