From the course: Machine Learning with Python: Foundations

What are the steps to machine learning? - Python Tutorial

- [Instructor] There are six major steps in the machine learning process. The first is data collection. During the data collection step, our objective is to identify and gather the data we need for machine learning. For unsupervised learning, this is the unlabeled data with unknown patterns that we intend to discover. For supervised learning, this is the labeled historical data that we intend to use to train and evaluate our model. For reinforcement learning, this is the data that helps our agent learn which actions yield the most reward. If we liken the machine learning process to the process of making a delicious bowl of salad, then the data collection step is like gathering all the ingredients that will go into the salad into a single basket.

The second step in the machine learning process is data exploration. Data exploration is the process of describing, visualizing, and analyzing data in order to better understand it. With data exploration, we can answer questions such as: How many rows and columns are in the data? What types of values are stored in the columns of the data? Are there missing, inconsistent, or duplicate values in the data? And are there outliers in the data? Continuing the salad analogy, the data exploration step is like inspecting every ingredient to make sure that it is fresh, ripe, and exactly what we want.

The third step in the machine learning process is data preparation. Data preparation is the process of making sure that our data is suitable for the machine learning approach that we intend to use. It involves resolving data quality issues, such as missing data, noisy data, outlier data, and class imbalance. Data preparation also involves modifying or transforming the structure of our data in order to make it easier to work with. This includes normalizing the data and reducing the number of rows and columns in the data.
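
The exploration questions and preparation tasks described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the course's own code: the toy dataset, its column meanings (age, income, label), and the choice of mean imputation and min-max normalization are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy dataset: each row is (age, income, label).
# None marks a missing value.
rows = [
    (25, 40000.0, "no"),
    (32, 52000.0, "yes"),
    (47, None, "yes"),
    (32, 52000.0, "yes"),  # exact duplicate of the second row
    (51, 61000.0, "no"),
]

# --- Exploration: describe the data before changing it ---
n_rows, n_cols = len(rows), len(rows[0])
missing_per_column = [sum(r[i] is None for r in rows) for i in range(n_cols)]
n_duplicates = len(rows) - len(set(rows))
label_counts = Counter(r[-1] for r in rows)  # a quick check for class imbalance

# --- Preparation: resolve the issues found above ---
# 1. Drop exact duplicate rows while preserving the original order.
deduped = list(dict.fromkeys(rows))

# 2. Fill missing incomes with the mean of the observed incomes
#    (one simple imputation strategy among many).
observed = [r[1] for r in deduped if r[1] is not None]
mean_income = sum(observed) / len(observed)
filled = [(age, mean_income if inc is None else inc, y) for age, inc, y in deduped]

# 3. Min-max normalize a numeric column to the range [0, 1].
#    Assumes the column has at least two distinct values.
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = min_max([r[0] for r in filled])
incomes = min_max([r[1] for r in filled])
prepared = [(a, inc, r[2]) for a, inc, r in zip(ages, incomes, filled)]

print(n_rows, n_cols, missing_per_column, n_duplicates, dict(label_counts))
```

In practice a library such as pandas would usually handle these steps, but the underlying operations are the same: count, inspect, then fix.
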
Going back to our salad analogy, the data preparation step is when we begin to cut the vegetables we plan to use in our salad. Depending on the type of salad we want, we may decide to cube, slice, or shred the vegetables. If we plan on adding chicken to the salad, this is also the stage when we grill, bake, or saute the chicken.

Successful data science relies on good data. The data doesn't have to be perfect, but it should be good. The saying "garbage in, garbage out" is especially relevant when it comes to machine learning. Because of how important good data is, it is not unusual to spend up to 80% of our time collecting, exploring, and preparing data.

After the data collection, exploration, and preparation stages comes the fourth stage, modeling. Modeling is the process of choosing and applying the machine learning approach that works well with the data we have and solves the problem at hand. Modeling is the best-known stage in the machine learning process. In order to apply the right type of model, we must be clear about our objective. Knowing what type of machine learning we intend to do, and what each machine learning approach is and is not capable of, will go a long way in helping us be successful in this stage. In the salad analogy, the modeling stage is analogous to mixing the ingredients that we previously prepared. Depending on the type of salad we want, we mix in more of some ingredients and less of others. We also decide which ingredients to include and which to avoid altogether.

The fifth stage in the machine learning process is evaluation. As the name suggests, our objective in this stage is to assess how well the machine learning approach we chose worked. There are several ways to do this. In supervised learning, where our goal is to predict a label or value, we evaluate a model by measuring how well it predicts labels for previously unseen data.
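
As a rough illustration of evaluating a supervised model on previously unseen data, the sketch below fits a very simple classifier on a training set and measures its accuracy on a separate held-out test set. The data (hours studied versus passing an exam), the threshold-based model, and the function names are all hypothetical and chosen only to keep the example small.

```python
# Hypothetical labeled data: (hours_studied, passed_exam).
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
# Held-out examples that are never used while fitting the model.
test = [(2.5, 0), (4.8, 0), (5.0, 1), (9.0, 1)]

def fit_threshold(data):
    """Learn a cutoff halfway between the mean of each class."""
    neg = [x for x, y in data if y == 0]
    pos = [x for x, y in data if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2

def predict(threshold, x):
    """Predict class 1 when the feature reaches the cutoff."""
    return 1 if x >= threshold else 0

def accuracy(threshold, data):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(threshold, x) == y for x, y in data)
    return correct / len(data)

threshold = fit_threshold(train)            # modeling: learn from training data only
train_accuracy = accuracy(threshold, train)
test_accuracy = accuracy(threshold, test)   # evaluation: score on unseen data

print(threshold, train_accuracy, test_accuracy)  # prints 4.5 1.0 0.75
```

The gap between the training and test scores is the point of the exercise: a model can fit the data it has seen perfectly and still make mistakes on data it has not.
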
In unsupervised learning, we usually take a more subjective approach. A good unsupervised learning model is one that provides us with results that make sense to us. The evaluation stage is when we taste test our salad. If the salad needs more salt or pepper, we add some seasoning. If the salad feels a bit dry, we add some dressing. Depending on how well a model performs, we may need to build it again with slightly different data or with different settings. The idea here is to make a change that has a meaningful positive impact on the performance of our model. This is usually an iterative process.

When we feel confident that the model we have is good, or the best we can do given the data we have, we move on to the final stage of the machine learning process: actionable insight. This means identifying a potential course of action based on the results of the machine learning model. For supervised learning and reinforcement learning, this is the stage where we decide whether or not to deploy our model to production. In unsupervised learning, this is the stage where we decide what to do with the patterns identified by our model. As for our salad, this is when we decide whether or not to serve it.
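
The iterative part of evaluation, rebuilding the model with different settings and keeping the change that helps, might look something like the sketch below. The validation data, the candidate settings, and the threshold-style model are assumptions made purely for illustration.

```python
# Hypothetical validation data: (feature_value, true_label).
validation = [(2.5, 0), (4.8, 0), (5.0, 1), (9.0, 1)]

# Hypothetical settings to try: each is a cutoff for predicting class 1.
candidate_thresholds = [3.0, 4.5, 4.9, 6.0]

def accuracy(threshold, data):
    """Fraction of examples classified correctly with the given cutoff."""
    correct = sum((1 if x >= threshold else 0) == y for x, y in data)
    return correct / len(data)

# Evaluate every setting, then keep the one that scores best.
results = {t: accuracy(t, validation) for t in candidate_thresholds}
best_threshold = max(results, key=results.get)

for t, score in results.items():
    print(f"threshold={t}: accuracy={score:.2f}")
print("best setting:", best_threshold)  # prints best setting: 4.9
```

Settings chosen this way should ideally be confirmed on a further held-out set, since picking the best score on one dataset tends to flatter the model.
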
