fix list of typos (hadley#488)
yahwes authored and hadley committed Oct 24, 2016
1 parent 4bb10b9 commit c81d1e0
Showing 12 changed files with 31 additions and 31 deletions.
2 changes: 1 addition & 1 deletion communicate-plots.Rmd
@@ -338,7 +338,7 @@ ggplot(mpg, aes(displ, hwy))

Instead of just tweaking the details a little, you can instead replace the scale altogether. There are two types of scales you're mostly likely to want to switch out: continuous position scales and colour scales. Fortunately, the same principles apply to all the other aesthetics, so once you've mastered position and colour, you'll be able to quickly pick up other scale replacements.

- It's very useful to plot transformations of your variable. For example, as we've seen in [diamond prices][diamond-prices] it's easier to see the precise relationship between `carat` and `price` if we log transform them:
+ It's very useful to plot transformations of your variable. For example, as we've seen in [diamond prices](diamond-prices) it's easier to see the precise relationship between `carat` and `price` if we log transform them:

```{r, fig.align = "default", out.width = "50%"}
ggplot(diamonds, aes(carat, price))
4 changes: 2 additions & 2 deletions datetimes.Rmd
@@ -182,7 +182,7 @@ Now that you know how to get date-time data into R's date-time data structures,
### Getting components
- You can pull out individual parts of the date with the acccessor functions `year()`, `month()`, `mday()` (day of the month), `yday()` (day of the year), `wday()` (day of the week), `hour()`, `minute()`, and `second()`.
+ You can pull out individual parts of the date with the accessor functions `year()`, `month()`, `mday()` (day of the month), `yday()` (day of the year), `wday()` (day of the week), `hour()`, `minute()`, and `second()`.
```{r}
datetime <- ymd_hms("2016-07-08 12:34:56")
@@ -477,7 +477,7 @@ To find out how many periods fall into an interval, you need to use integer divi

How do you pick between duration, periods, and intervals? As always, pick the simplest data structure that solves your problem. If you only care about physical time, use a duration; if you need to add human times, use a period; if you need to figure out how long a span is in human units, use an interval.
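The duration/period distinction described above can be seen directly at the console. A minimal sketch, assuming the lubridate package is attached (the leap year 2016 is chosen deliberately so the two answers differ):

```r
library(lubridate)

# A duration counts physical seconds; a period counts human units
leap_start <- ymd("2016-01-01", tz = "UTC")

leap_start + dyears(1)  # duration: exactly 31557600 seconds later, mid-day on Dec 31
leap_start + years(1)   # period: the same calendar date one year on, 2017-01-01
```

Because 2016 has 366 days, adding a duration of one "average" year (365.25 days of seconds) lands short of the next January 1st, while adding a period of one year lands exactly on it.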

- Figure \@(ref:dt-algebra) summarises permitted arithmetic operations between the different data types.
+ Figure \@ref(fig:dt-algebra) summarises permitted arithmetic operations between the different data types.

```{r dt-algebra, echo = FALSE, fig.cap = "The allowed arithmetic operations between pairs of date/time classes."}
knitr::include_graphics("diagrams/datetimes-arithmetic.png")
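The accessor functions this hunk touches can be sketched as follows (assuming lubridate is attached; the outputs in the comments come from running the code):

```r
library(lubridate)

# Pull individual components out of a date-time
datetime <- ymd_hms("2016-07-08 12:34:56")

year(datetime)    # 2016
month(datetime)   # 7
mday(datetime)    # 8   (day of the month)
yday(datetime)    # 190 (day of the year; 2016 is a leap year)
wday(datetime)    # 6   (day of the week; a Friday, with weeks starting Sunday)
```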
2 changes: 1 addition & 1 deletion iteration.Rmd
@@ -914,7 +914,7 @@ x %>%

### Reduce and accumulate

- Sometimes you have a complex list that you want to reduce to a simple list by repeatedly applying a function that reduces a pair to a singleton. This useful if you want to apply a two-table dplyr verb to multiple tables. For example, you might have a list of data frames, and you want to reduce to a single data frame by joining the elements together:
+ Sometimes you have a complex list that you want to reduce to a simple list by repeatedly applying a function that reduces a pair to a singleton. This is useful if you want to apply a two-table dplyr verb to multiple tables. For example, you might have a list of data frames, and you want to reduce to a single data frame by joining the elements together:

```{r}
dfs <- list(
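The `reduce()` pattern this hunk describes can be sketched with a few small tables (the `dfs` list below is illustrative, not the book's actual data):

```r
library(dplyr)
library(purrr)

dfs <- list(
  age = tibble(name = "John", age = 30),
  sex = tibble(name = c("John", "Mary"), sex = c("M", "F")),
  trt = tibble(name = "Mary", treatment = "A")
)

# reduce() repeatedly applies full_join() to pairs of tables,
# collapsing the list into one data frame joined by "name"
dfs %>% reduce(full_join)
```

The result has one row per person and one column per variable, with `NA` where a table had no entry for that person.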
6 changes: 3 additions & 3 deletions model-basics.Rmd
@@ -192,7 +192,7 @@ sim1_mod <- lm(y ~ x, data = sim1)
coef(sim1_mod)
```

- These are exactly the same values we got with `optim()`! Behind the scenes `lm()` doesn't use `optim()` but instead takes advantage of the mathematical structure of linear models. Using some connections between geometry, calculus, and linear algebra, `lm()` actually finds the closest model by in a single step, using a sophisticated algorithm. This approach is both faster, and guarantees that there is a global minimum.
+ These are exactly the same values we got with `optim()`! Behind the scenes `lm()` doesn't use `optim()` but instead takes advantage of the mathematical structure of linear models. Using some connections between geometry, calculus, and linear algebra, `lm()` actually finds the closest model in a single step, using a sophisticated algorithm. This approach is both faster, and guarantees that there is a global minimum.

### Exercises

@@ -488,7 +488,7 @@ Note my use of `seq_range()` inside `data_grid()`. Instead of using every unique
```
* `trim = 0.1` will trim off 10% of the tail values. This is useful if the
- variables has an long tailed distribution and you want to focus on generating
+ variables have a long tailed distribution and you want to focus on generating
values near the center:
```{r}
@@ -552,7 +552,7 @@ model_matrix(df, y ~ x^2 + x)
model_matrix(df, y ~ I(x^2) + x)
```

- Transformations are useful because you can use them to approximate non-linear functions. If you've taken a calculus class, you may have heard of Taylor's theorem which says you can approximate any smooth function with an infinite sum of polynomials. That means you can use a linear to get arbitrary close to a smooth function by fitting an equation like `y = a_1 + a_2 * x + a_3 * x^2 + a_4 * x ^ 3`. Typing that sequence by hand is tedious, so R provides a helper function: `poly()`:
+ Transformations are useful because you can use them to approximate non-linear functions. If you've taken a calculus class, you may have heard of Taylor's theorem which says you can approximate any smooth function with an infinite sum of polynomials. That means you can use a polynomial function to get arbitrarily close to a smooth function by fitting an equation like `y = a_1 + a_2 * x + a_3 * x^2 + a_4 * x ^ 3`. Typing that sequence by hand is tedious, so R provides a helper function: `poly()`:

```{r}
model_matrix(df, y ~ poly(x, 2))
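The `poly()` helper this hunk mentions can be sketched on a toy data frame; because the data below are exactly quadratic, a degree-2 fit recovers them perfectly (the data frame is illustrative, not from the book):

```r
df <- data.frame(x = 1:10, y = (1:10)^2)

# poly() generates orthogonal polynomial terms, so you don't have to
# spell out I(x^2), I(x^3), ... by hand
mod <- lm(y ~ poly(x, 2), data = df)

# the degree-2 fit recovers y = x^2 exactly (up to floating point)
predict(mod, newdata = data.frame(x = 4))  # 16
```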
4 changes: 2 additions & 2 deletions model-building.Rmd
@@ -154,7 +154,7 @@ diamonds2 %>%
arrange(price)
```

- Nothing really jumps out at me here, but it's probably worth spending time considering if this indicates a problem with our model, or if there are a errors in the data. If there are mistakes in the data, this could be an opportunity to buy diamonds that have been priced low incorrectly.
+ Nothing really jumps out at me here, but it's probably worth spending time considering if this indicates a problem with our model, or if there are errors in the data. If there are mistakes in the data, this could be an opportunity to buy diamonds that have been priced low incorrectly.

### Exercises

@@ -385,7 +385,7 @@ Either approach is reasonable. Making the transformed variable explicit is usefu

### Time of year: an alternative approach

- In the previous section we used our domain knowledge (how the US school term affects travel) to improve the model. An alternative to using making our knowledge explicit in the model is to give the data more room to speak. We could use a more flexible model and allow that to capture the pattern we're interested in. A simple linear trend isn't adequate, so we could try using a natural spline to fit a smooth curve across the year:
+ In the previous section we used our domain knowledge (how the US school term affects travel) to improve the model. An alternative to using our knowledge explicitly in the model is to give the data more room to speak. We could use a more flexible model and allow that to capture the pattern we're interested in. A simple linear trend isn't adequate, so we could try using a natural spline to fit a smooth curve across the year:

```{r}
library(splines)
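The natural-spline idea from this hunk, sketched on toy data: a sine wave stands in for the seasonal pattern, and `ns(x, 5)` is an illustrative choice of degrees of freedom, not the book's:

```r
library(splines)

sim <- data.frame(x = seq(0, 2 * pi, length.out = 100))
sim$y <- sin(sim$x)

# ns() builds spline basis columns, so a plain linear model
# can fit a smooth, flexible curve
mod <- lm(y ~ ns(x, 5), data = sim)

plot(sim$x, sim$y)
lines(sim$x, fitted(mod), col = "red")
```

A straight line would miss the wave entirely; the spline terms let `lm()` track it closely while staying an ordinary linear model in the coefficients.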
6 changes: 3 additions & 3 deletions model-many.Rmd
@@ -13,7 +13,7 @@ In this chapter you're going to learn three powerful ideas that help you to work
1. Using the __broom__ package, by David Robinson, to turn models into tidy
data. This is a powerful technique for working with large numbers of models
because once you have tidy data, you can apply all of the techniques that
- you've learned about in earlier in the book.
+ you've learned about earlier in the book.

We'll start by diving into a motivating example using data about life expectancy around the world. It's a small dataset but it illustrates how important modelling can be for improving your visualisations. We'll use a large number of simple models to partition out some of the strongest signal so we can see the subtler signals that remain. We'll also see how model summaries can help us pick out outliers and unusual trends.

@@ -133,7 +133,7 @@ And we want to apply it to every data frame. The data frames are in a list, so w
models <- map(by_country$data, country_model)
```

- However, rather than leaving leaving the list of models as a free-floating object, I think it's better to store it as a column in the `by_country` data frame. Storing related objects in columns is a key part of the value of data frames, and why I think list-columns are such a good idea. In the course of working with these countries, we are going to have lots of lists where we have one element per country. So why not store them all together in one data frame?
+ However, rather than leaving the list of models as a free-floating object, I think it's better to store it as a column in the `by_country` data frame. Storing related objects in columns is a key part of the value of data frames, and why I think list-columns are such a good idea. In the course of working with these countries, we are going to have lots of lists where we have one element per country. So why not store them all together in one data frame?

In other words, instead of creating a new object in the global environment, we're going to create a new variable in the `by_country` data frame. That's a job for `dplyr::mutate()`:

@@ -194,7 +194,7 @@ resids %>%
facet_wrap(~continent)
```

- It looks like we've missed some mild pattern. There's also something interesting going on in Africa: we see some very large residuals which suggests our model isn't fitting so well there. We'll explore that more in the next section, attacking it from a slightly different angle.
+ It looks like we've missed some mild patterns. There's also something interesting going on in Africa: we see some very large residuals which suggests our model isn't fitting so well there. We'll explore that more in the next section, attacking it from a slightly different angle.

### Model quality

2 changes: 1 addition & 1 deletion relational-data.Rmd
@@ -216,7 +216,7 @@ y <- tribble(
)
```

- The coloured column represents the "key" variable: these are used to match the rows between the tables. The grey column represents the "value" column that is carried along for the ride. In these examples I'll show a single key variable and single value variable, but idea generalises in a straightforward way to multiple keys and multiple values.
+ The coloured column represents the "key" variable: these are used to match the rows between the tables. The grey column represents the "value" column that is carried along for the ride. In these examples I'll show a single key variable, but the idea generalises in a straightforward way to multiple keys and multiple values.

A join is a way of connecting each row in `x` to zero, one, or more rows in `y`. The following diagram shows each potential match as an intersection of a pair of lines.

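The key/value setup this hunk describes can be sketched with two tiny hypothetical tables (in the spirit of the book's `x` and `y`, not its exact values):

```r
library(dplyr)

x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

# inner_join() connects rows of x to rows of y wherever the keys match;
# keys 3 and 4 have no partner in the other table, so they drop out
inner_join(x, y, by = "key")
```

The value columns `val_x` and `val_y` are carried along for the ride, exactly as described above.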
6 changes: 3 additions & 3 deletions rmarkdown-formats.Rmd
@@ -4,7 +4,7 @@

So far you've seen R Markdown used to produce HTML documents. This chapter gives a brief overview of some of the many other types of output you can produce with R Markdown. There are two ways to set the output of a document:

- 1. Permanently, by modifying the the YAML header:
+ 1. Permanently, by modifying the YAML header:

```yaml
title: "Viridis Demo"
@@ -88,7 +88,7 @@ output:

## Notebooks

- A notebook, `html_notebook`, is a variation on a `html_document`. The rendered outputs are very similar, but the purpose is different. A `html_document` is focussed on communicating with decisions makers, while a notebook is focussed on collaborating with other data scientists. These different purposes lead to using the HTML output in different ways. Both HTML outputs will contain the fully rendered output, but the notebook also contains the full source code. That means you can use the `.nb.html` generated by the notebook in two ways:
+ A notebook, `html_notebook`, is a variation on a `html_document`. The rendered outputs are very similar, but the purpose is different. A `html_document` is focussed on communicating with decision makers, while a notebook is focussed on collaborating with other data scientists. These different purposes lead to using the HTML output in different ways. Both HTML outputs will contain the fully rendered output, but the notebook also contains the full source code. That means you can use the `.nb.html` generated by the notebook in two ways:

1. You can view it in a web browser, and see the rendered output. Unlike
`html_document`, this rendering always includes an embedded copy of
@@ -238,7 +238,7 @@ Other packages provide even more output formats:
* The __bookdown__ package, <https://github.com/rstudio/bookdown>,
makes it easy to write books, like this one. To learn more, read
[_Authoring Books with R Markdown_](https://bookdown.org/yihui/bookdown/),
- by Yihui Xie, which is, of course, written in bookdown, Visit
+ by Yihui Xie, which is, of course, written in bookdown. Visit
<http://www.bookdown.org> to see other bookdown books written by the
wider R community.

16 changes: 8 additions & 8 deletions rmarkdown.Rmd
@@ -70,7 +70,7 @@ knitr::include_graphics("images/RMarkdownFlow.png")

To get started with your own `.Rmd` file, select *File > New File > R Markdown...* in the menubar. RStudio will launch a wizard that you can use to pre-populate your file with useful content that reminds you how the key features of R Markdown work.

- The following sections dives into the three components of an R Markdown document in more details: the markdown text, the code chunks, and the YAML header.
+ The following sections dive into the three components of an R Markdown document in more details: the markdown text, the code chunks, and the YAML header.

### Exercises

@@ -187,7 +187,7 @@ The most important set of options controls if your code block is executed and wh
of your report, but can be very useful if you need to debug exactly
what is going on inside your `.Rmd`. It's also useful if you're teaching R
and want to deliberately include an error. The default, `error = FALSE` causes
- knitting to failure if there is a single error in the document.
+ knitting to fail if there is a single error in the document.
The following table summarises which types of output each option supressess:
@@ -220,7 +220,7 @@ knitr::kable(

Read the documentation for `?knitr::kable` to see the other ways in which you can customise the table. For even deeper customisation, consider the __xtable__, __stargazer__, __pander__, __tables__, and __ascii__ packages. Each provides a set of tools for returning formatted tables from R code.

- There are also a rich set of options for controlling how figures embedded. You'll learn about these in [saving your plots].
+ There is also a rich set of options for controlling how figures are embedded. You'll learn about these in [saving your plots].

### Caching

@@ -260,7 +260,7 @@ I've used the advice of [David Robinson](https://twitter.com/drob/status/7387866

### Global options

- As you work more with knitr, you will discover that some of the default chunk options don't fit your needs and you want to change them. You can do by calling `knitr::opts_chunk$set()` in a code chunk. For example, when writing books and tutorials I set:
+ As you work more with knitr, you will discover that some of the default chunk options don't fit your needs and you want to change them. You can do this by calling `knitr::opts_chunk$set()` in a code chunk. For example, when writing books and tutorials I set:

```{r, eval = FALSE}
knitr::opts_chunk$set(
@@ -360,7 +360,7 @@ Alternatively, if you need to produce many such paramterised reports, you can ca
rmarkdown::render("fuel-economy.Rmd", params = list(my_class = "suv"))
```

- This is particularly powerful in conjunction with `purrr:pwalk()`. The following example creates a report for each value of `class` found in `mpg`. First we create a data frame that has one row for each class, giving the `filename` of report and the `params` it should be given:
+ This is particularly powerful in conjunction with `purrr:pwalk()`. The following example creates a report for each value of `class` found in `mpg`. First we create a data frame that has one row for each class, giving the `filename` of the report and the `params`:

```{r}
reports <- tibble(
@@ -371,7 +371,7 @@ reports <- tibble(
reports
```

- Then we match the column names to the argument names of `render()`, and use purrr's **parallel* walk to call `render()` once for each row:
+ Then we match the column names to the argument names of `render()`, and use purrr's **parallel** walk to call `render()` once for each row:

```{r, eval = FALSE}
reports %>%
@@ -406,7 +406,7 @@ Smith says blah [-@smith04].

When R Markdown renders your file, it will build and append a bibliography to the end of your document. The bibliography will contain each of the cited references from your bibliography file, but it will not contain a section heading. As a result it is common practice to end your file with a section header for the bibliography, such as `# References` or `# Bibliography`.

- You can change the style of your citations and bibliography by reference a CSL (citation style language) file to the `csl` field:
+ You can change the style of your citations and bibliography by referencing a CSL (citation style language) file in the `csl` field:

```yaml
bibliography: rmarkdown.bib
@@ -428,5 +428,5 @@ There are two important topics that we haven't covered here: collaboration, and
1. The "Git and GitHub" chapter of _R Packages_, by Hadley. You can also
read it for free online: <http://r-pkgs.had.co.nz/git.html>.
- I have also not touched about what you should actually write in order to clearly communicate the results of your analysis. To improve your writing, I highly recommend reading either [_Style: Lessons in Clarity and Grace_](https://amzn.com/0134080416) by Joseph M. Williams & Joseph Bizup, or [_The Sense of Structure: Writing from the Reader's Perspective_](https://amzn.com/0205296327) by George Gopen. Both books will help you understand the structure of sentences and paragraphs, and give you the tools to make your writing more clear. (These books are rather expensive if purchased new, but they're used by many English classes so there are plenty of cheap second-hand copies). George Gopen also has a number of short articles on writing at <http://georgegopen.com/articles/litigation/>. They are aimed at lawyers, but almost everything applies to data scientists too.
+ I have also not touched on what you should actually write in order to clearly communicate the results of your analysis. To improve your writing, I highly recommend reading either [_Style: Lessons in Clarity and Grace_](https://amzn.com/0134080416) by Joseph M. Williams & Joseph Bizup, or [_The Sense of Structure: Writing from the Reader's Perspective_](https://amzn.com/0205296327) by George Gopen. Both books will help you understand the structure of sentences and paragraphs, and give you the tools to make your writing more clear. (These books are rather expensive if purchased new, but they're used by many English classes so there are plenty of cheap second-hand copies). George Gopen also has a number of short articles on writing at <http://georgegopen.com/articles/litigation/>. They are aimed at lawyers, but almost everything applies to data scientists too.