Reading notes on classic papers. The notes are also published on Zhihu and on the blog. PRs are welcome. The papers are listed below.
- Discretized Streams: Fault-Tolerant Streaming Computation at Scale: notes, Zhihu
- Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark: notes, Zhihu
- The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing: notes, Zhihu
- Distributed Snapshots: Determining Global States of a Distributed System: notes
- MapReduce: Simplified Data Processing on Large Clusters
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- S4: Distributed Stream Computing Platform
- The Chubby lock service for loosely-coupled distributed systems
- In Search of an Understandable Consensus Algorithm (Extended Version)
- The Google File System
- Bigtable: A Distributed Storage System for Structured Data
- Dynamo: Amazon's Highly Available Key-value Store
- Finding a Needle in Haystack: Facebook's Photo Storage
- Spanner: Google's Globally-Distributed Database
- F1: The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
- Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
- Omega: flexible, scalable schedulers for large compute clusters
- Large-scale cluster management at Google with Borg
- Borg: the Next Generation
- F1 Query: Declarative Querying at Scale
- Apache Druid
- Autopilot: workload autoscaling at Google