This repository contains useful articles and papers for the (aspiring) unicorn data scientist. Unlike other awesome-xyz
repositories, this one does not collect software tools or libraries, only reading materials.
Pull requests are welcome! See Contributing.
- Rules of ML
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
- Machine Learning: The High-Interest Credit Card of Technical Debt
- Production-Level-Deep-Learning
- End-to-end Machine Learning with TFX on TensorFlow 2.x
- Monitoring Machine Learning Models in Production
- ML engineering best practices
- Google MLOps whitepaper
- Ways I Use Testing as a Data Scientist
- MLOps Principles
- Continuous delivery and automation pipelines in machine learning
- No, you don't need MLOps
- Python dependency management is a dumpster fire
- Detecting Interference: An A/B Test of A/B Tests
- Switchback Tests and Randomized Experimentation Under Network Effects at DoorDash
- Innovating Faster on Personalization Algorithms at Netflix Using Interleaving
- Python Design Patterns
- Awesome Prometheus Alerts
- Refactoring Guru
- Choose boring technology
- The Beginner's Guide to Databases
- System Design 101
- How to do a code review
- Probability Distribution Explorer
- Common probability distributions
- William Chen probability cheat sheet
- KDE visualisation
- Modeling conversion rates and saving millions of dollars using Kaplan-Meier and gamma distributions
- Robust Statistical Distances for Machine Learning
- How Not To Sort By Average Rating
- Common statistical tests are linear models
- Bayesian Optimization
- On Average, You’re Using the Wrong Average: Geometric & Harmonic Means in Data Analysis
- Stop aggregating away the signal in your data
- Inferring Concept Drift Without Labeled Data
- The hacker's guide to uncertainty estimates
- The Illustrated Machine Learning website
- Feature Engineering A-Z
- Applied Machine Learning for Tabular Data
- Interpretable ML Book
- Google Recommendation Systems Crash Course
- Deep Neural Networks for YouTube Recommendations
- Deep density networks and uncertainty in recommender systems
- Microsoft Recommenders
- aman.ai recsys
- Bandit Algorithms
- A Contextual-Bandit Approach to Personalized News Article Recommendation
- Python Causality Handbook
- A Recipe for Training Neural Networks
- Numerically stable and computationally efficient log-sum-exp
- Interpreting loss curves
- Visualizing the Loss Landscape of a Neural Network
- AI Content Generation Tools
- Google research tuning playbook
- The Illustrated Transformer
- Transformer Explainer
- Understanding Large Language Models
- LLM101n - Andrej Karpathy
- A Visual Guide to Quantization
- Anti-hype LLM reading list
- Understanding LLMs from Scratch Using Middle School Math
- awesome-chatgpt-prompts
- Prompt Engineering Guide
- Anthropic prompt engineering overview
- Prompt Engineering Roadmap
- Building a Generative AI Platform
- Emerging Architectures for LLM Applications
- Open source LLM tools
- rerankers
- ML and LLM system design: 450 case studies to learn from
- Introducing Contextual Retrieval
- Vector Databases Are the Wrong Abstraction
- Practical advice for analysis of large, complex data sets
- What is an analytics engineer?
- The Modern Data Stack
- Emerging architectures for modern data infrastructure
- Highly Opinionated Integrations
- Choosing a Product Analytics Tool
- 2022 ETL Buyer’s Guide: How to Pick the Right Tool for Your Analytics Stack
- Modern Data Stack in a Box with DuckDB
- Coding for Economists
- Simple ML for Sheets
- Data Pipeline Design Patterns
- The Analytics Development Lifecycle
- The Rise of the Declarative Data Stack
- DataHub: Popular metadata architectures explained
- Biases in AI Systems
- How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
- Apple's Human Interface Guidelines for Charts
- Better dashboards align with the scales of business decisions
- Hashicorp manager charter
- Good DS vs. Bad DS
- Open decision making
- North star metrics
- Agile analytics
- Modern data culture stack
- So You Want to Become a Data Science Manager?
- Mochary Method Curriculum
- Gitlab Data Team Handbook
- The Great CEO Within
- Coordination Headwind
- The Art of Onboarding
- 7 Mindsets That Are Slowing Down Your Career Growth
- 3 Critical Skills You Need to Grow Beyond Senior Levels in Engineering
- How to Consistently Hire Remarkable Data Scientists
- How Coursera Competes Against Google and Facebook for the Best Talent
- How to hire smarter than the market: a toy model
- Interviewing is a noisy prediction problem
- How to set compensation using commonsense principles
- VP of Engineering hiring cheatsheet
- The Data Science Interview Book
- Conor Dewey's list of data science interview resources
- 101 Data Science Interview Questions, Answers, and Key Concepts
- How I negotiated a $300,000 job offer in Silicon Valley
- How to Make Your Data Science Job Application Stand Out
- You’re probably answering these 5 common interview questions wrong
- 6 red flags I saw while doing 60 technical interviews in 30 days
- Google interview warmup
- Red Flags to Look Out for When Joining a Data Team
- Data Scientist Handbook
- A guide to passing the A/B test interview question in tech companies