This repository has been archived by the owner on Jan 9, 2023. It is now read-only.

marco-morales/QMSS-GR5072_Spring2022

QMSS GR5072 - MODERN DATA STRUCTURES


Instructor: Marco Morales, Columbia University

TAs: Yingzhi Zhang, Columbia University
     Parth Gupta, Columbia University

This repository is a companion to the course Modern Data Structures, taught in the Quantitative Methods in the Social Sciences program in Spring 2022. It contains curated reference materials, slides, and sample code. You can find the most up-to-date version of the course syllabus here. Make sure to check regularly for updates.

The course will primarily use the R language for instruction. For that reason, some familiarity with R is assumed, in particular with the logic of object-oriented programming languages and with base functions. Knowledge of specific packages and other software tools will be built throughout the course.

Resources

  • Textbooks: While there are no required textbooks for this course, you will find these (online) books to be very useful in addition to the lectures and course readings:

    • Hadley Wickham & Garrett Grolemund. R for Data Science. O’Reilly Media, Boston, MA, 2016
    • Hadley Wickham. Advanced R (Second Edition). Taylor & Francis Group, Boca Raton, FL, 2019
    • Hadley Wickham & Jenny Bryan. R Packages (Second Edition). O’Reilly Media, Boston, MA, 2022
    • Colin Gillespie & Robin Lovelace. Efficient R Programming. O’Reilly Media, Boston, MA, 2016
  • Software: The course will rely heavily on R, RStudio, and git. Please install them before our first class.

  • Cloud Services: Homework will be submitted through the GitHub Classroom for this course. Sign up for a GitHub account if you don't already have one. AWS Educate and Databricks Community classrooms will be available to train you to leverage data at scale.

Course Roadmap

outline\
| -- week  1 : Introduction to R
| -- week  2 : git, GitHub and R Markdown
| -- week  3 : the tidyverse
| -- week  4 : Functions I: their structure and logic
| -- week  5 : Functions II: nested and complex operations
| -- week  6 : Functions III: write your own R package
| -- week  7 : Functions IV: strings and dates
| -- week  8 : -- ACADEMIC HOLIDAY --
| -- week  9 : working with APIs
| -- week 10 : working with JSON & XML | web scraping
| -- week 11 : best practices
| -- week 12 : working with SQL
| -- week 13 : distributed data processing in the cloud
| -- week 14 : review session
| -- week 15 : final exam

Accessing course materials in this repo

  1. install git on your local machine

  2. from the command line, go to the directory where you want to clone this repo

    $ cd <your chosen directory>
    
  3. clone this repository to get a local copy on your machine

    $ git clone https://github.com/marco-morales/QMSS-GR5072_Spring2022.git
    
  4. pull every week before class to sync your local copy with the latest changes pushed to the repo

    $ git pull origin main
    
  5. "Watch" the repository to get notifications each time updates are pushed
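
Steps 2 to 4 above can be combined into one small helper, sketched below under a few assumptions. The names `sync_course_repo`, `run`, and `DRY_RUN` are hypothetical and not part of this repository; the sketch also assumes a git version that supports `git -C`. It clones the repo on first use and pulls from `main` on later runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of the course materials):
# clone the course repo the first time, pull the latest changes afterwards.

REPO_URL="https://github.com/marco-morales/QMSS-GR5072_Spring2022.git"
REPO_DIR="QMSS-GR5072_Spring2022"  # directory name used for the local copy

# Run a command, or only print it when DRY_RUN=1 (useful for checking the plan).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# $1: the chosen parent directory that holds (or will hold) the local copy.
sync_course_repo() {
  target="$1/$REPO_DIR"
  if [ -d "$target/.git" ]; then
    # A local copy already exists: sync it with the latest changes on main.
    run git -C "$target" pull origin main
  else
    # No local copy yet: clone the repository into the chosen directory.
    run git clone "$REPO_URL" "$target"
  fi
}
```

After sourcing the file, a call such as `sync_course_repo "$HOME/courses"` (an example path) would perform the clone or the pull; prefixing the call with `DRY_RUN=1` only prints the git command that would be run.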

Acknowledgements: Materials in this repository derive from previous iterations of this course taught by Mike Parrot, Thomas Brambor and Greg Eirich.