From the course: Machine Learning with Python: Foundations

Common machine learning algorithms - Python Tutorial

- [Instructor] During the modeling stage, it's often beneficial to experiment with multiple machine learning algorithms and to fine-tune their parameters to determine which one performs best on our specific problem. There are several models to choose from. The specific characteristics of our data and the nature of the problem we intend to solve usually dictate which algorithms we can or cannot use. Let's explore a few of them.

The machine learning model we built in the previous video was a linear regression model. It assumes that the relationship between a dependent variable and one or more independent variables is linear. As a result, the algorithm tries to find the straight line that best models this relationship, using a technique known as ordinary least squares. As the name suggests, the linear regression algorithm is used to solve regression problems, such as predicting how many bikes customers are likely to rent based on weather conditions.

Unlike linear regression, the logistic regression algorithm models the relationship between the independent variables and the dependent variable using an S-shaped curve known as a sigmoid curve. The curve is based on what is known as a logit function, which assumes a linear relationship between the predictors and the log odds of the outcome. In spite of its name, logistic regression is not used to solve regression problems. It's most often used for binary classification problems, such as predicting whether a customer will or will not buy a product based on demographic or socioeconomic indicators.

As the name suggests, a decision tree represents the relationship between the values of one or more independent variables and those of a dependent variable using an inverted tree-like structure made up of decision nodes and leaf nodes connected by branches. Decision trees can solve both classification and regression problems.
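The ordinary least squares idea behind linear regression can be sketched in a few lines of NumPy. The bike-rental numbers below are made up for illustration: we solve for the intercept and slope of the straight line that minimizes the squared error between the line and the observations.

```python
import numpy as np

# Hypothetical bike-rental data: temperature (Celsius) vs. rentals.
# These values are illustrative, not from the course dataset.
temps = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
rentals = np.array([120.0, 150.0, 180.0, 210.0, 240.0])

# Ordinary least squares: build a design matrix with a column of
# ones (for the intercept) and solve for the best-fit coefficients.
A = np.column_stack([np.ones_like(temps), temps])
coef, *_ = np.linalg.lstsq(A, rentals, rcond=None)
intercept, slope = coef

print(intercept, slope)  # fitted line: rentals = 90 + 6 * temperature
```

Here the data happens to lie exactly on a line; with real, noisy data the same call returns the line with the smallest total squared error.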
Their intuitive nature makes them particularly useful in situations where transparency is crucial, for example, when building a model that makes recommendations on loan decisions based on whether a borrower will or will not default on their loan.

Inspired by biological neurons, neural networks are particularly effective at processing patterns of inputs and outputs using a sequence of mathematical neurons arranged in layers. This makes them invaluable for tasks such as image detection, speech recognition, and language translation. Neural networks are the basis of most of the recent groundbreaking advances in AI, such as deep learning and large language models.

The algorithms discussed so far are all supervised machine learning algorithms, each with its particular set of strengths and weaknesses. Sometimes, instead of using a single algorithm to solve a problem, we can bring several weak learners together to build a stronger and more robust model. These types of models are known as ensembles. Ensembles are used for both classification and regression tasks and are known for their resiliency and effectiveness in capturing difficult patterns in the data. In an ensemble, the allocation function determines how much of the training data to assign to each member of the ensemble, while the combination function determines how the output of each member is combined into a single output.

Random forest is a common ensemble learning technique made up of several decision trees, known as a forest, trained in parallel on subsets of the data. Gradient boosting machines are another common ensemble learning technique, made up of several decision trees trained sequentially, where each tree tries to correct the mistakes of the previous one.

Unlike supervised machine learning, where the data has predictor variables and outcome labels, the data we use for unsupervised learning lacks predefined labels or outcome variables.
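As a sketch of the two ensemble styles, the snippet below fits a random forest (trees trained in parallel on bootstrap subsets) and a gradient boosting machine (trees trained in sequence) on a synthetic dataset. It assumes scikit-learn is available; the dataset and parameter values are illustrative, not from the course.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Random forest: many trees fit independently on bootstrap samples,
# with their votes combined into a single prediction.
rf = RandomForestClassifier(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)

# Gradient boosting: trees fit one after another, each one trying
# to correct the errors of the ensemble built so far.
gb = GradientBoostingClassifier(n_estimators=50, random_state=42)
gb.fit(X_train, y_train)

print(rf.score(X_test, y_test), gb.score(X_test, y_test))
```

Both models expose the same fit/score interface, which makes the kind of side-by-side experimentation described above straightforward.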
The objective of unsupervised learning algorithms is to identify inherent patterns, structures, or groupings in data without much guidance on what to look for. One such algorithm is k-means clustering. The algorithm uses an iterative approach to assign each observation in the data to one of k clusters based on similarity. The objective is to create groupings, or clusters, where items within a cluster are as similar as possible to one another, while being as different as possible from items in other clusters. K-means clustering is widely used for tasks such as customer segmentation, document labeling, and anomaly detection.

Another common approach to unsupervised machine learning is the use of association rules. Association rules are a set of if-then statements that describe the co-occurrence of items or events within data. They're used in a variety of domains, but most often in market basket analysis to find relationships between items that customers tend to purchase together.
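The iterative assign-and-update loop at the heart of k-means can be sketched directly in NumPy. The two well-separated point clouds below are synthetic, and the starting centroids are seeded deterministically with one point from each cloud for reproducibility (real implementations typically choose starting centroids randomly).

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic, well-separated groups of 2-D points.
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(20, 2)),
])

k = 2
centroids = points[[0, 20]]  # one seed point from each group
for _ in range(10):
    # Assignment step: each point joins the cluster of its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points.
    centroids = np.array([points[labels == i].mean(axis=0) for i in range(k)])
```

After a few iterations the centroids settle near the centers of the two clouds, around (0, 0) and (5, 5), and the labels recover the original grouping.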