Fix/data-transform #1398

Merged
merged 5 commits into from
Apr 10, 2023
12 changes: 6 additions & 6 deletions data-transform.qmd
@@ -58,7 +58,7 @@ glimpse(flights)
```

In both views, the variables names are followed by abbreviations that tell you the type of each variable: `<int>` is short for integer, `<dbl>` is short for double (aka real numbers), `<chr>` for character (aka strings), and `<dttm>` for date-time.
-These are important because the operations you can perform on a column depend so much on its "type", and these types are used to organize the chapters in the next section of the book.
+These are important because the operations you can perform on a column depend so much on its "type".

### dplyr basics

@@ -102,7 +102,7 @@ We'll also discuss `distinct()` which finds rows with unique values but unlike `
`filter()` allows you to keep rows based on the values of the columns[^data-transform-1].
The first argument is the data frame.
The second and subsequent arguments are the conditions that must be true to keep the row.
-For example, we could find all flights that arrived more than 120 minutes (two hours) late:
+For example, we could find all flights that departed more than 120 minutes (two hours) late:

[^data-transform-1]: Later, you'll learn about the `slice_*()` family which allows you to choose rows based on their positions.
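
The difference between the two wordings in this hunk (arrived versus departed) can be sketched with a small example. This is only an illustration under stated assumptions: `toy_flights` is a hypothetical data frame with made-up values standing in for `flights`, and the columns `dep_delay` and `arr_delay` mirror the nycflights13 column names.

```r
library(dplyr)

# Hypothetical toy data (made-up values), standing in for nycflights13::flights
toy_flights <- data.frame(
  flight    = c(1, 2, 3, 4),
  dep_delay = c(130, 5, 150, 90),   # minutes late leaving
  arr_delay = c(100, 125, 160, 80)  # minutes late arriving
)

# Rows where the flight departed more than 120 minutes late
late_departures <- toy_flights |> filter(dep_delay > 120)

# Rows where the flight arrived more than 120 minutes late
late_arrivals <- toy_flights |> filter(arr_delay > 120)
```

The two conditions select different rows, so the prose needs to name the same delay column that the accompanying code filters on.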

@@ -225,7 +225,7 @@ flights |>

### Exercises

-1. In a single pipeline, find all flights that meet all of the following conditions:
+1. In a single pipeline, find all flights that meet each of the following conditions:

- Had an arrival delay of two or more hours
- Flew to Houston (`IAH` or `HOU`)
@@ -251,7 +251,7 @@ flights |>

## Columns

-There are four important verbs that affect the columns without changing the rows: `mutate()` creates new columns that are derived from the existing columns, `select()` changes which columns are present; `rename()` changes the names of the columns; and `relocate()` changes the positions of the columns.
+There are four important verbs that affect the columns without changing the rows: `mutate()` creates new columns that are derived from the existing columns, `select()` changes which columns are present, `rename()` changes the names of the columns, and `relocate()` changes the positions of the columns.

### `mutate()` {#sec-mutate}

@@ -479,7 +479,7 @@ flights |>
arrange(desc(speed))
```

-Even though this pipeline has four steps, it's easy to skim because the verbs come at the start of each line: start with the `flights` data, then filter, then group, then summarize.
+Even though this pipeline has four steps, it's easy to skim because the verbs come at the start of each line: start with the `flights` data, then filter, then mutate, then select, then arrange.

What would happen if we didn't have the pipe?
We could nest each function call inside the previous call:
@@ -575,7 +575,7 @@ This means subsequent operations will now work "by month".
### `summarize()` {#sec-summarize}

The most important grouped operation is a summary, which, if being used to calculate a single summary statistic, reduces the data frame to have a single row for each group.
-In dplyr, this is operation is performed by `summarize()`[^data-transform-3], as shown by the following example, which computes the average departure delay by month:
+In dplyr, this operation is performed by `summarize()`[^data-transform-3], as shown by the following example, which computes the average departure delay by month:

[^data-transform-3]: Or `summarise()`, if you prefer British English.
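
The chapter's own example is collapsed in this diff view, so the following is only a rough sketch of the idea the sentence describes. It assumes a hypothetical `toy_flights` data frame with made-up values, and the output column name `avg_delay` is likewise illustrative.

```r
library(dplyr)

# Hypothetical toy data (made-up values), standing in for nycflights13::flights
toy_flights <- data.frame(
  month     = c(1, 1, 2, 2),
  dep_delay = c(10, 20, NA, 40)
)

# One summary statistic per group, so the result has one row per month.
# na.rm = TRUE drops missing delays before averaging.
by_month <- toy_flights |>
  group_by(month) |>
  summarize(avg_delay = mean(dep_delay, na.rm = TRUE))
```

Here the four input rows collapse to two, one for each value of `month`.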
