From the course: Machine Learning with Python: Foundations

How to import data in Python - Python Tutorial

- [Instructor] One of the reasons why Python is such a popular programming language for machine learning is that it supports some very powerful and easy-to-use packages that are purpose-built for data analysis. One of these packages is the pandas package. The pandas package provides several easy-to-use functions for creating, structuring, and importing data. Before we can use any of these functions, we first have to import the pandas package using the import command. Here, the import command imports the pandas package, and we use an alias for the package. We call it pd. This allows us to refer to the functions of the package by simply writing pd, followed by a dot and the function name.

One of the ways pandas represents data is as a series. A pandas series is a heterogeneous, one-dimensional, array-like data structure with labeled rows. We can create a pandas series from a previously created list. Given the members list, we can create a series object as follows. We're going to create a series object called bricks1, and we create it by calling the pd.Series constructor function and passing the members list to it. As you can see, the series object is made up of a set of indexes on the left and values on the right. To verify that bricks1 is a pandas series, let's pass it to the type function to see what we get.

Another way that pandas represents data is as a data frame. A pandas data frame is a heterogeneous, two-dimensional data structure with labeled rows and columns. We can think of a pandas data frame as a collection of several pandas series, all sharing the same index. A data frame is very similar to a spreadsheet or a relational database table. We can create a pandas data frame from a previously created dictionary. Given the members dictionary, we can create a data frame object as follows. Here, we are going to create bricks2, and bricks2 is created by calling the pd.DataFrame constructor function and passing it the members dictionary.
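The steps described so far can be sketched roughly as follows. The transcript does not show the contents of the members list or the members dictionary, so the values below are illustrative stand-ins, and the names members_list and members_dict are assumptions chosen here only to keep the two objects apart.

```python
import pandas as pd  # pd is the conventional alias for pandas

# Illustrative stand-in for the "members" list (contents assumed).
members_list = ["Brazil", "Russia", "India", "China", "South Africa"]

# Build a series from the list; pandas adds a default integer index.
bricks1 = pd.Series(members_list)
print(bricks1)
print(type(bricks1))

# Illustrative stand-in for the "members" dictionary (contents assumed).
# Each key becomes a column name; each list supplies that column's values.
members_dict = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
}

# Build a data frame from the dictionary.
bricks2 = pd.DataFrame(members_dict)
print(bricks2)
print(type(bricks2))
```

Printing each object shows the index on the left, and the type calls confirm which pandas class was created.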
As you can see, pandas converted the dictionary keys to column names, and it used the values for each dictionary key as the cell values in the data frame. To verify that bricks2 is a data frame, let's call the type function to see what it returns. There we have it. It is a data frame.

We can also create a data frame from a previously created two-dimensional list of values and a list of column names. Given the members and labels lists, we can create a data frame object as follows. This time, we create bricks3 by passing the pd.DataFrame constructor function the members list, as well as the labels list as the column names.

Another way to create a pandas data frame is by importing data directly from an external source. For example, we can create a data frame by importing a CSV file. So, let us create another data frame, bricks4. This time, we use the pd.read_csv function, and we pass to it the file we want it to read. There we have it.

We can also create a pandas data frame by importing a Microsoft Excel file. This time we're going to call it bricks5, and we will use the pd.read_excel function, and we pass to it the name of the file we intend to read. In this example, we read from an Excel file. Note that for multi-sheet Excel files, the pandas read_excel function imports the first sheet by default. If we want to import a sheet other than the first one, we have to specify a value for the sheet_name argument of the read_excel function. For example, the bricks Excel file we just imported has two sheets. The first is named members and the second is named summits. When we imported the file, the function imported the first sheet, which is the members sheet. To import the second sheet, which is the summits sheet, we make the following modification to our code. There we have it. The summits sheet.

Besides CSV and Excel files, the pandas package allows us to import other file types, which we do not cover in this tutorial.
To get an exhaustive list of supported file types, visit the pandas documentation website.
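The remaining ways of building a data frame can be sketched in one place. This is only a sketch under stated assumptions: the transcript does not give the file names or the data, so bricks.csv, bricks.xlsx, the row values, and the name bricks6 are hypothetical. The example writes its own small files to a temporary folder so that the read calls have something to load. Reading and writing Excel files also needs an optional engine such as openpyxl, so that part is guarded.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative stand-ins for the "members" and "labels" lists (contents assumed).
members = [
    ["Brazil", "Brasília"],
    ["Russia", "Moscow"],
    ["India", "New Delhi"],
    ["China", "Beijing"],
    ["South Africa", "Pretoria"],
]
labels = ["country", "capital"]

# Each inner list becomes a row; labels supplies the column names.
bricks3 = pd.DataFrame(members, columns=labels)

with tempfile.TemporaryDirectory() as tmp:
    # Write a small CSV first so read_csv has a file to import.
    csv_path = Path(tmp) / "bricks.csv"  # hypothetical file name
    bricks3.to_csv(csv_path, index=False)
    bricks4 = pd.read_csv(csv_path)
    print(bricks4)

    xlsx_path = Path(tmp) / "bricks.xlsx"  # hypothetical file name
    # Placeholder rows; the real summits sheet is not shown in the transcript.
    summits = pd.DataFrame({"summit": ["first", "second"]})
    try:
        # Create a workbook with two sheets named as in the video.
        with pd.ExcelWriter(xlsx_path) as writer:
            bricks3.to_excel(writer, sheet_name="members", index=False)
            summits.to_excel(writer, sheet_name="summits", index=False)

        # With no sheet_name, read_excel imports the first sheet.
        bricks5 = pd.read_excel(xlsx_path)
        # sheet_name selects a specific sheet by its name.
        bricks6 = pd.read_excel(xlsx_path, sheet_name="summits")
        print(bricks5)
        print(bricks6)
    except ImportError as err:
        # Raised when no Excel engine (for example openpyxl) is installed.
        print(f"Skipping the Excel part: {err}")
```

With your own data, the write steps are unnecessary; passing the path of an existing file to pd.read_csv or pd.read_excel is enough.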
