intro - minor syntax fixes #334

Merged
merged 1 commit into from
Aug 29, 2016
intro - minor syntax fixes
the largest change was adding “\” to get the twitter handles to render
properly
seanpwilliams authored and seanpwilliams committed Aug 29, 2016
commit f4ba2d4613c7de4780aa00690b8cc9be8dfbcd7f
15 changes: 7 additions & 8 deletions intro.Rmd
@@ -26,11 +26,11 @@ The last step of data science is __communication__, an absolutely critical part

Surrounding all these tools is __programming__. Programming is a cross-cutting tool that you use in every part of the project. You don't need to be an expert programmer to be a data scientist, but learning more about programming pays off because becoming a better programmer allows you to automate common tasks, and solve new problems with greater ease.

-You'll use these six tools in every data science project, but for most projects they're not enough. There's a rough 80-20 rule at play: you can tackle about 80% of every project using the tools that you'll learn in this book, but you'll need other tools to tackle the remaining 20%. Throughout this book we'll point you to resources where you can learn more.
+You'll use these six tools in every data science project, but for most projects they're not enough. There's a rough 80-20 rule at play; you can tackle about 80% of every project using the tools that you'll learn in this book, but you'll need other tools to tackle the remaining 20%. Throughout this book we'll point you to resources where you can learn more.

## The tidyverse

-The majority of the packages that you will learn in this book are part of the so-called tidyverse. All packages in the tidyverse share a common philosophy of data and R programming, which makes them fit together naturally. Because they are designed with a unifying vision you should experience fewer problems when you combine multiple packages to solve real problems. The packages in the tidyverse are not perfect, but they fit together well, and over time that fit will continue to improve.
+The majority of the packages that you will learn in this book are part of the so-called tidyverse. All packages in the tidyverse share a common philosophy of data and R programming, which makes them fit together naturally. Because they are designed with a unifying vision, you should experience fewer problems when you combine multiple packages to solve real problems. The packages in the tidyverse are not perfect, but they fit together well, and over time that fit will continue to improve.

There are many other excellent packages that are not part of the tidyverse, because they are designed with a different set of underlying principles. This doesn't make them better or worse, just different. In other words, the complement to the tidyverse is not the messyverse, but many other universes of interrelated packages. As you tackle more data science projects with R, you'll learn new packages and new ways of thinking about data. But we hope that the tidyverse will continue to provide a solid foundation no matter how far you go in R.

@@ -52,8 +52,7 @@ The previous description of the tools of data science is organised roughly accor
* Programming tools are not necessarily interesting in their own right,
but do allow you to tackle considerably more challenging problems. We'll
give you a selection of programming tools in the middle of the book, and
-then you'll see they can combine with the data science tools to tackle interesting
-modelling problems.
+then you'll see they can combine with the data science tools to tackle interesting modelling problems.

Within each chapter, we try and stick to a similar pattern: start with some motivating examples so you can see the bigger picture, and then dive into the details. Each section of the book is paired with exercises to help you practice what you've learned. While it's tempting to skip the exercises, there's no better way to learn than practicing on real problems.

@@ -109,7 +108,7 @@ To run the code in this book, you will need to install both R and the RStudio ID

### RStudio

-RStudio is an integrated development environment, or IDE, for R programming. When you get started there two key regions in the interface:
+RStudio is an integrated development environment, or IDE, for R programming. When you get started, there two key regions in the interface:

```{r echo = FALSE, out.width = "75%"}
knitr::include_graphics("diagrams/rstudio-console.png")
@@ -159,7 +158,7 @@ Throughout the book we use a consistent set of conventions to refer to code:

## Getting help and learning more

-This book is not an island: there is no single resource that will allow you to master R. As you start to apply the techniques described in this book to your own data you will soon find questions that I do not answer. This section describes a few tips to help you get help, and to help you keep learning.
+This book is not an island; there is no single resource that will allow you to master R. As you start to apply the techniques described in this book to your own data you will soon find questions that I do not answer. This section describes a few tips to help you get help, and to help you keep learning.

If you get stuck, start with Google. Typically adding "R" to a query is enough to restrict it to relevant results: if the search isn't useful, it often means that there aren't any R-specific results available. Google is particularly useful for error messages. If you get an error message and you have no idea what it means, try googling it! Chances are that someone else has been confused by it in the past, and there will be help somewhere on the web. (If the error message isn't in English, run `Sys.setenv(LANGUAGE = "en")` and re-run the code; you're more likely to find help for English error messages.)

@@ -169,7 +168,7 @@ There are three things you need to include to make your example reproducible: re

1. **Packages** should be loaded at the top of the script, so it's easy to
see which ones the example needs. This is a good time to check that you're
-using the latest version of each package: it's possible you've discovered
+using the latest version of each package; it's possible you've discovered
a bug that's been fixed since you installed the package.

1. The easiest way to include **data** in a question is to use `dput()` to
@@ -197,7 +196,7 @@ There are three things you need to include to make your example reproducible: re

Finish by checking that you have actually made a reproducible example by starting a fresh R session and copying and pasting your script in.

-You should also spend some time preparing yourself to solve problems before they occur. Investing a little time in learning R each day will pay off handsomely in the long run. One way to is follow what Hadley, Garrett, and everyone else at RStudio are doing on the [RStudio blog](https://blog.rstudio.org). This is where we post announcements about new packages, new IDE features, and in-person courses. You might also want to follow Hadley ([@hadleywickham](https://twitter.com/hadleywickham)) or Garrett ([@statgarrett](https://twitter.com/statgarrett)) on Twitter, or follow [@rstudiotips](https://twitter.com/rstudiotips) to keep up with new features in the IDE.
+You should also spend some time preparing yourself to solve problems before they occur. Investing a little time in learning R each day will pay off handsomely in the long run. One way to is follow what Hadley, Garrett, and everyone else at RStudio are doing on the [RStudio blog](https://blog.rstudio.org). This is where we post announcements about new packages, new IDE features, and in-person courses. You might also want to follow Hadley ([\@hadleywickham](https://twitter.com/hadleywickham)) or Garrett ([\@statgarrett](https://twitter.com/statgarrett)) on Twitter, or follow [\@rstudiotips](https://twitter.com/rstudiotips) to keep up with new features in the IDE.

To keep up with the R community more broadly, we recommend reading <http://www.r-bloggers.com>: it aggregates over 500 blogs about R from around the world. If you're an active Twitter user, follow the `#rstats` hashtag. Twitter is one of the key tools that Hadley uses to keep up with new developments in the community.
