Articles and talks that I referred to while working on the taxi data project. Published here as a note to myself.
- FLIP-23. The document discusses implementing model training as well as model serving. Two linked documents, “Flink ML Roadmap” and “Flink-MS”, are also worth reading.
- Boris Lublinsky’s book “Serving Machine Learning Models”, and talk.
- Please Stop Calling Database Systems AP or CP.
- Kate Matsudaira on distributed systems
- Distributed Systems for Fun and Profit
- Martin Kleppmann’s book and interview.
- Use of Formal Methods at Amazon Web Services
- A tale of two clusters: Mesos and YARN
- A visual explanation of Raft
The Dataflow Model, Apache Beam, and Cloud Dataflow
- Tyler Akidau’s article on the Dataflow model.
- Frances Perry’s talk, which covers the same topics as Tyler Akidau’s article.
- Big Data Processing at Spotify
- Hands-on exercises, created by dataArtisans, for trying out the Dataflow model.
- Jay Kreps’ article “Questioning the Lambda Architecture”.
- Martin Kleppmann compares Samza to other stream processing systems.
- Flink Blog: articles that discuss how exactly-once processing, checkpointing, joins, etc. are implemented in Flink.
Kafka and Inverted Database
- Martin Kleppmann’s talk “Turning the Database Inside Out with Apache Samza”.
- Jay Kreps’ articles: “It’s Okay to Store Data in Kafka” and “The Log: What Every Software Engineer Should Know About Real-time Data’s Unifying Abstraction”.
- Neha Narkhede on “event sourcing”
- Spotify’s Event Delivery
- Boerge Svingen, Publishing with Kafka at NY Times (SWE daily, Confluent blog)
- Explainer articles on the Dremio website. Topics include data engineering, Apache Arrow, data warehouses, data pipelines, and ETL tools.
- Quizlet on Airflow.
- Rebuilding Yelp’s Data Pipeline with Justin Cunningham (Data Engineering Podcast)
- Danny Yuan on Real-Time, Time Series Forecasting @Uber
- Josh Evans, A Netflix Guide to Microservices (InfoQ talk)
- Beyond Buzzwords: A Brief History of Microservice Patterns (Kyle Brown, IBM)
- Martin Fowler on microservices
- William Morgan on Scaling Twitter (SWE daily)
- Josh Wills’ talk at MLconf.
- Scaling Uber with Matt Ranney (SWE daily)
- Cassandra with Tim Berglund (SWE daily)