forked from IQSS/prefresher
-
Notifications
You must be signed in to change notification settings - Fork 0
/
17_non-wysiwyg.Rmd
301 lines (187 loc) · 14.9 KB
/
17_non-wysiwyg.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
# LaTeX and markdown^[Module originally written by Shiro Kuriwaki] {#nonwysiwyg}
### Where are we? Where are we headed? {-}
Up till now, you should have covered:
* Statistical Programming in `R`
This is only the beginning of `R` -- programming is like learning a language, so learn more as we use it. And yet `R` is of likely not the only programming language you will want to use. While we cannot introduce everything, we'll pick out a few that we think are particularly helpful.
Here will cover
* Markdown
* LaTeX (and BibTeX)
as examples of a non-WYSIWYG editor
and the next chapter (you can read it without reading this LaTeX chapter) covers
* command-line
* git
command-line are a basic set of tools that you may have to use from time to time. It also clarifies what more complicated programs are doing. Markdown is an example of compiling a plain text file. LaTeX is a typesetting program and git is a version control program -- both are useful for non-quantitative work as well.
### Check your understanding {-}
Check if you have an idea of how you might code the following tasks:
* What does "WYSIWYG" stand for? How would a non-WYSIWYG format text?
* How do you start a header in markdown?
* What are some "plain text" editors?
* How do you start a document in `.tex`?
* How do you start a environment in `.tex`?
* How do you insert a figure in `.tex`?
* How do you reference a figure in `.tex`?
* What is a `.bib` file?
* Say you came across a interesting journal article. How would you want to maintain this reference so that you can refer to its citation in all your subsequent papers?
## Motivation
Statistical programming is a fast-moving field. The beta version of `R` was released in 2000, `ggplot2` was released on 2005, and `RStudio` started around 2010. Of course, some programming technologies are quite "old": (`C` in 1969, `C ` around 1989, `TeX` in 1978, `Linux` in 1991, Mac OS in 1984). But it is easy to feel you are falling behind in the recent developments of programming. Today we will do a **brief** and rough overview of some fundamental and new tools other than `R`, with the general aim of having you break out of your comfort zone so you won't be shut out from learning these tools in the future.
## Markdown
Markdown is the text we have been using throughout this course! At its core markdown is just plain text. Plain text does not have any formatting embedded in it. Instead, the formatting is coded up as text. Markdown is _not_ a WYSIWYG (What you see is what you get) text editor like Microsoft Word or Google Docs. This will mean that you need to explicitly code for `bold{text}` rather than hitting Command B and making your text look __bold__ on your own computer.
Markdown is known as a "light-weight" editor, which means that it is relatively easy to write code that will compile. It is quick and easy and satisfies most presentation purposes; you might want to try `LaTeX` for more involved papers.
### markdown commands
For italic and bold, use either the asterisks or the underlines,
```
*italic* **bold**
_italic_ __bold__
```
And for headers use the hash symbols,
```
# Main Header
## Sub-headers
```
### your own markdown
RStudio makes it easy to compile your very first markdown file by giving you templates. Got to `New > R Markdown`, pick a document and click Ok. This will give you a skeleton of a document you can compile -- or "knit".
Rmd is actually a slight modification of real markdown. It is a type of file that R reads and turns into a proper `md` file. Then, it uses a document-conversion called pandoc to compile your `md` into documents like PDF or HTML.
![How Rmds become PDFs or HTMLs](images/RMarkdownFlow.png)
### Quarto
R Markdown (`.Rmd`) files have long been the go-to for reproducible writing workflows for R users.
In 2022, [Posit, PBC](https://posit.co/), who created R Markdown announced a new generation of markdown extensions, with Quarto.
Quarto (`.qmd`) files are a variation on R Markdown which allows for including R, python, Observable, Julia, and more within a document.
Quarto is largely compatible with older `.Rmd` files, just by changing the extension.
As such, you can integrate LaTeX and markdown seamlessly.
Some benefits of using Quarto include:
* [ease of customization with template partials](https://quarto.org/docs/journals/templates.html#template-partials)
* [journal submission templates for many journals](https://quarto.org/docs/extensions/listing-journals.html)
* [dozen of output types](https://quarto.org/docs/reference/)
* [the ability to make websites interacting only with Quarto](https://quarto.org/docs/websites/)
### A note on plain-text editors
Multiple software exist where you can edit plain-text (roughly speaking, text that is not WYSIWYG).
* [RStudio](https://posit.co/products/open-source/rstudio/) (especially for R-related links)
* TeXMaker, TeXShop (especially for TeX)
* [emacs](https://www.gnu.org/software/emacs/), aquamacs (general)
* [vim](http://www.vim.org/download.php) (general)
* [Sublime Text](https://www.sublimetext.com) (general)
Each has their own keyboard shortcuts and special features. You can browse a couple and see which one(s) you like.
Since June 2021, RStudio has offered a visual editor which tries to bridge the gap between plain-text and WYSIWYG.
While writing, it transforms plain markdown, RMarkdown, or Quarto documents into a "WYSISWYM" version, What You See Is What You Mean.
Formatting choices, like bold or italicized text are shown as **bold** or *italicized* text, rather than as intermediate markdown.
Lists, enumerations, and images are shown inline, rather than the code that includes them.
This is not a final form though, as styling still occurs when rendering the final document.
## LaTeX
LaTeX is a typesetting program. You'd engage with LaTeX much like you engage with your `R` code. You will interact with LaTeX in a text editor, and will writing code which will be interpreted by the LaTeX compiler and which will finally be parsed to form your final PDF.
### compile online
1. Go to <https://www.overleaf.com>
2. Scroll down and go to "CREATE A NEW PAPER" if you don't have an account.
3. Let's discuss the default template.
4. Make a new document, and set it as your main document. Then type in the Minimal Working Example (MWE):
```{bash, eval = FALSE}
\documentclass{article}
\begin{document}
Hello World
\end{document}
```
### compile your first LaTeX document locally
LaTeX is a very stable system, and few changes to it have been made since the 1990s. The main benefit: better control over how your papers will look; better methods for writing equations or making tables; overall pleasing aesthetic.
1. Open a plain text editor. Then type in the MWE
```{bash, eval = FALSE}
\documentclass{article}
\begin{document}
Hello World
\end{document}
```
2. Save this as `hello_world.tex`. Make sure you get the file extension right.
3. Open this in your "LaTeX" editor. This can be `TeXMaker`, `Aqumacs`, etc..
4. Go through the click/dropdown interface and click compile.
### main LaTeX commands
LaTeX can cover most of your typesetting needs, to clean equations and intricate diagrams.
Some main commands you'll be using are below, and a very concise cheat sheet here: <https://wch.github.io/latexsheet/latexsheet.pdf>
Most involved features require that you begin a specific "environment" for that feature, clearly demarcating them by the notation `\begin{figure}` and then `\end{figure}`, e.g. in the case of figures.
```
\begin{figure}
\includegraphics{histogram.pdf}
\end{figure}
```
where `histogram.pdf` is a path to one of your files.
Notice that each line starts with a backslash `\` -- in LaTeX this is the symbol to run a command.
The following syntax at the endpoints are shorthand for math equations.
```
\[\int x^2 dx\]
```
these compile math symbols: $\displaystyle \int x^2 dx.$^[Enclosing with `$$` instead of `\[` also has the same effect, so you may see it too. But this is now discouraged due to its inflexibility.]
The `align` environment is useful to align your multi-line math, for example.
```
\begin{align}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}\\
&= \frac{P(B \mid A)P(A)}{P(B)}
\end{align}
```
\begin{align}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}\\
&= \frac{P(B \mid A)P(A)}{P(B)}
\end{align}
Regression tables should be outputted as `.tex` files with packages like `xtable` and `stargazer`, and then called into LaTeX by `\input{regression_table.tex}` where `regression_table.tex` is the path to your regression output.
Figures and equations should be labelled with the tag (e.g. `label{tab:regression}` so that you can refer to them later with their tag `Table \ref{tab:regression}`, instead of hard-coding `Table 2`).
For some LaTeX commands you might need to load a separate package that someone else has written. Do this in your preamble (i.e. before `\begin{document}`):
```
\usepackage[options]{package}
```
where `package` is the name of the package and `options` are options specific to the package.
### Further Guides {-}
For a more comprehensive listing of LaTeX commands, Mayya Komisarchik has a great tutorial set of folders: <https://scholar.harvard.edu/mkomisarchik/tutorials-0>
There is a version of LaTeX called Beamer, which is a popular way of making a slideshow. Slides in markdown is also a competitor. The language of Beamer is the same as LaTeX but has some special functions for slides.
## BibTeX
BibTeX is a reference system for bibliographical tests. We have a `.bib` file separately on our computer. This is also a plain text file, but it encodes bibliographical resources with special syntax so that a program can rearrange parts accordingly for different citation systems.
### what is a `.bib` file?
For example, here is the Nunn and Wantchekon article entry in `.bib` form.
```{}
@article{nunn2011slave,
title={The Slave Trade and the Origins of Mistrust in Africa},
author={Nunn, Nathan and Wantchekon, Leonard},
journal={American Economic Review},
volume={101},
number={7},
pages={3221--3252},
year={2011}
}
```
The first entry, `nunn2011slave`, is "pick your favorite" -- pick your own name for your reference system. The other slots in this `@article` entry are entries that refer to specific bibliographical text.
### what does LaTeX do with .bib files?
Now, in LaTeX, if you type
\textcite{nunn2011slave} argue that current variation in the trust among citizens of African countries has historical roots in the European slave trade in the 1600s.
as part of your text, then when the `.tex` file is compiled the PDF shows something like
![](images/biblatex_inline.png)
in whatever citation style (APSA, APA, Chicago) you pre-specified!
Also at the end of your paper you will have a bibliography with entries ordered and formatted in the appropriate citation.
![](images/biblatex_bibliography.png)
This is a much less frustrating way of keeping track of your references -- no need to hand-edit formatting the bibliography to conform to citation rules (which biblatex already knows) and no need to update your bibliography as you add and drop references (biblatex will only show entries that are used in the main text).
### stocking up on your .bib files
You should keep your own `.bib` file that has all your bibliographical resources. Storing entries is cheap (does not take much memory), so it is fine to keep all your references in one place (but you'll want to make a new one for collaborative projects where multiple people will compile a `.tex` file).
For example, Gary's BibTeX file is here: <https://github.com/iqss-research/gkbibtex/blob/master/gk.bib>
Citation management software (Mendeley or Zotero) automatically generates .bib entries from your library of PDFs for you, provided you have the bibliography attributes right.
## Exercise {-}
Create a LaTeX document for a hypothetical research paper on your laptop and, once you've verified it compiles into a PDF, come show it to either one of the instructors.
You can also use overleaf if you have preference for a cloud-based system. But don't swallow the built-in templates without understanding or testing them.
Each student will have slightly different substantive interests, so we won't impose much of a standard. But at a minimum, the LaTeX document should have:
* A title, author, date, and abstract
* Sections
* Italics and boldface
* A figure with a caption and in-text reference to it.
Depending on your subfield or interests, try to implement some of the following:
* A bibliographical reference drawing from a separate `.bib` file
* A table
* A math expression
* A different font
* Different page margins
* Different line spacing
## Concluding the Prefresher {-}
Math may not be the perfect tool for every aspiring political scientist, but hopefully it was useful background to have at the least:
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Historians think this totally meaningless and nonsensical statistic is the product of an early-modern epistemological shift in which numbers and quantifiable data became revered above other kinds of knowledge as the most useful and credible form of truth <a href="https://t.co/wVFyAQGxEv">https://t.co/wVFyAQGxEv</a></p>— Gina Anne Tam 譚吉娜 (@DGTam86) <a href="https://twitter.com/DGTam86/status/1001279033638727680?ref_src=twsrc^tfw">May 29, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
But we should be aware that too much slant towards math and programming can miss the point:
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">To be clear, PhD training in Econ (first year) is often a disaster-- like how to prove the Central Limit Theorem (the LeBron James of Statistics) with polar-cooardinates. This is mostly a way to demoralize actual economists and select a bunch of unimaginative math jocks.</p>— Amitabh Chandra (@amitabhchandra2) <a href="https://twitter.com/amitabhchandra2/status/1029359117226438657?ref_src=twsrc^tfw">August 14, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
Keep on learning, trying new techniques to improve your work, and learn from others!
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">What <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc^tfw">#rstats</a> tricks did it take you way too long to learn? One of mine is using readRDS and saveRDS instead of repeatedly loading from CSV</p>— Emily Riederer (@EmilyRiederer) <a href="https://twitter.com/EmilyRiederer/status/898735640031920129?ref_src=twsrc^tfw">August 19, 2017</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
### Your Feedback Matters {-}
_Please tell us how we can improve the Prefresher_: The Prefresher is a work in progress, with material mainly driven by graduate students. Please tell us how we should change (or not change) each of its elements:
https://harvard.az1.qualtrics.com/jfe/form/SV_esbzN8ZFAOPTqiV