Fix/data-transform #1398

Merged
merged 5 commits into from
Apr 10, 2023
14 changes: 7 additions & 7 deletions data-transform.qmd
@@ -58,7 +58,7 @@ glimpse(flights)
```

In both views, the variables names are followed by abbreviations that tell you the type of each variable: `<int>` is short for integer, `<dbl>` is short for double (aka real numbers), `<chr>` for character (aka strings), and `<dttm>` for date-time.
-These are important because the operations you can perform on a column depend so much on its "type", and these types are used to organize the chapters in the next section of the book.
+These are important because the operations you can perform on a column depend so much on its "type", and these types are used to organize the chapters in the Transform part of the book.
mine-cetinkaya-rundel marked this conversation as resolved.

### dplyr basics

@@ -102,7 +102,7 @@ We'll also discuss `distinct()` which finds rows with unique values but unlike `
`filter()` allows you to keep rows based on the values of the columns[^data-transform-1].
The first argument is the data frame.
The second and subsequent arguments are the conditions that must be true to keep the row.
-For example, we could find all flights that arrived more than 120 minutes (two hours) late:
+For example, we could find all flights that departed more than 120 minutes (two hours) late:

[^data-transform-1]: Later, you'll learn about the `slice_*()` family which allows you to choose rows based on their positions.

@@ -166,7 +166,7 @@ flights |>
```

This "works", in the sense that it doesn't throw an error, but it doesn't do what you want because `|` first checks the condition `month == 1` and then checks the condition `2`, which is not a sensible condition to check.
-We'll learn more about what's happening here and why in @sec-boolean-operations.
+We'll learn more about what's happening here and why in @sec-order-of-operations-logicals.
mine-cetinkaya-rundel marked this conversation as resolved.

### `arrange()`

@@ -225,7 +225,7 @@ flights |>

### Exercises

-1. In a single pipeline, find all flights that meet all of the following conditions:
+1. In a single pipeline, find all flights that meet each of the following conditions:

- Had an arrival delay of two or more hours
- Flew to Houston (`IAH` or `HOU`)
@@ -251,7 +251,7 @@ flights |>

## Columns

-There are four important verbs that affect the columns without changing the rows: `mutate()` creates new columns that are derived from the existing columns, `select()` changes which columns are present; `rename()` changes the names of the columns; and `relocate()` changes the positions of the columns.
+There are four important verbs that affect the columns without changing the rows: `mutate()` creates new columns that are derived from the existing columns; `select()` changes which columns are present; `rename()` changes the names of the columns; and `relocate()` changes the positions of the columns.
mine-cetinkaya-rundel marked this conversation as resolved.

### `mutate()` {#sec-mutate}

@@ -479,7 +479,7 @@ flights |>
arrange(desc(speed))
```

-Even though this pipeline has four steps, it's easy to skim because the verbs come at the start of each line: start with the `flights` data, then filter, then group, then summarize.
+Even though this pipeline has four steps, it's easy to skim because the verbs come at the start of each line: start with the `flights` data, then filter, then mutate, then select, then arrange.

What would happen if we didn't have the pipe?
We could nest each function call inside the previous call:
@@ -575,7 +575,7 @@ This means subsequent operations will now work "by month".
### `summarize()` {#sec-summarize}

The most important grouped operation is a summary, which, if being used to calculate a single summary statistic, reduces the data frame to have a single row for each group.
-In dplyr, this is operation is performed by `summarize()`[^data-transform-3], as shown by the following example, which computes the average departure delay by month:
+In dplyr, this operation is performed by `summarize()`[^data-transform-3], as shown by the following example, which computes the average departure delay by month:

[^data-transform-3]: Or `summarise()`, if you prefer British English.

2 changes: 1 addition & 1 deletion logicals.qmd
@@ -247,7 +247,7 @@ A missing value in a logical vector means that the value could either be `TRUE`
`TRUE | TRUE` and `FALSE | TRUE` are both `TRUE`, so `NA | TRUE` must also be `TRUE`.
Similar reasoning applies with `NA & FALSE`.

-### Order of operations
+### Order of operations {#sec-order-of-operations-logicals}
mine-cetinkaya-rundel marked this conversation as resolved.

Note that the order of operations doesn't work like English.
Take the following code that finds all flights that departed in November or December: