forked from hadley/r4ds
-
Notifications
You must be signed in to change notification settings - Fork 0
/
program.Rmd
53 lines (37 loc) · 4.6 KB
/
program.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# (PART) Program {-}
# Introduction
Code is a tool of communication, not just to the computer, but to other people. This is important because every project you undertake is fundamentally collaborative. Even if you're not working with other people, you'll definitely be working with future-you. You want to write clear code so that future-you doesn't curse present-you when you look at a project again after several months have passed.
To me, improving your communication skills is a key part of mastering R as a programming language. Over time, you want your code to become more and more clear, and easier to write. Removing duplication is an important part of expressing yourself clearly because it lets the reader (i.e. future-you!) focus on what's different between operations rather than what's the same. The goal is not just to write better functions or to do things that you couldn't do before, but to code with more "ease". As you internalise the ideas in this chapter, you should find it easier to re-tackle problems that you've struggled to solve in the past.
In the following chapters, you'll learn important programming skills:
1. We'll start by diving deep into the __pipe__, `%>%`, talking more about how
it works, what the alternatives are, and when not to use the pipe.
1. Copy-and-paste is a powerful tool, but you should avoid doing it more than
twice. Repeating yourself in code is dangerous because it can easily lead
to errors and inconsistencies. Instead, write __functions__ which let
you extract out repeated code so that it can be easily reused.
1. Functions extract out repeated code, but you often need to repeat the
same actions on multiple inputs. You need tools for __iteration__ that
let you do similar things again and again. These tools include for loops
and functional programming.
1. As you start to write more powerful functions, you'll need a solid
grounding in R's data structures. You must master the four common atomic
vectors, the three important S3 classes built on top of them, and
understand the mysteries of the list and data frame.
1. One of the particularly important data structures in R is the list.
Lists are important because a list can contain other lists, so it is
__hierarchical__. Two common scenarios where hierarchical structures
arise are json, and fitting many models. You'll need to learn some new
tools from the purrr package to make handle these cases as easily as
possible.
The goal of these chapters is to teach you the minimum about programming that a practicising data scientist must know. It turns out this is a reasonable amount, and I think it's worth investing in your programming skills. It's an investment that won't pay off immediately, but over time it will allow you to solve new problems more quickly, and reuse your insights from previous problems in new scenarios.
Writing code is similar in many ways to writing prose. One parallel which I find particularly useful is that in both cases rewriting is the key to clarity. The first expression of your ideas is unlikely to be particularly clear, and you may need to rewrite multiple times. After solving a data analysis challenge, it's often worth looking at your code and thinking about whether or not it's obvious what you've done. If you spend a little time rewriting your code while the ideas are fresh, you can save a lot of time later trying to recreate what your code did. But this doesn't mean you should rewrite every function: you need to balance what you need to achieve now with saving time in the long run. (But the more you rewrite your functions the more likely you'll first attempt will be clear.)
## Learning more
As you become a better R programmer, you'll learn more techniques for reducing various types of duplication. This allows you to do more with less, and allows you to express yourself more clearly by taking advantage of powerful programming constructs.
To learn more you need to study R as a programming language, not just an interactive environment for data science. We have written two books that will help you do so:
* [Hands on Programming with R](http://shop.oreilly.com/product/0636920028574.do),
by Garrett Grolemund. This is an introduction to R as a programming language
and is a great place to start if R is your first programming language.
* [Advanced R](http://adv-r.had.co.nz) by Hadley Wickham. This dives into the
details of R the programming language. This is a great place to start if
you've programmed in other languages and you want to learn what makes R
special, different, and particularly well suited to data analysis.