Articles, mainly about Streaming Systems

Articles and talks that I referred to while working on the taxi data project. Published here as a note to myself.

Model Serving

  • FLIP-23. The document also discusses implementing model training as well as model serving. Two linked documents—“Flink ML Roadmap” and “Flink-MS”—are also worth reading.
  • Boris Lublinsky’s book “Serving Machine Learning Models”, and his talk.

Distributed Systems

Streaming Systems

The Dataflow Model, Apache Beam, and Cloud Dataflow

  • Tyler Akidau’s article on the Dataflow model.
  • Frances Perry’s talk covers the same topics as the aforementioned article by Tyler Akidau.
  • Big Data Processing at Spotify
  • Hands-on exercises, created by dataArtisans, for trying out the Dataflow model.
  • Jay Kreps’ article “Questioning the Lambda Architecture”.
  • Martin Kleppmann compares Samza to other stream processing systems.
  • Flink Blog: articles that discuss how exactly-once processing, checkpointing, joins, etc. are implemented in Flink.

Kafka and Inverted Database

Data Pipelines

  • Explainer articles on Dremio website. Topics include data engineering, Apache Arrow, Data Warehouses, Data Pipelines, ETL Tools.
  • Quizlet on Airflow: link.
  • Rebuilding Yelp’s Data Pipeline with Justin Cunningham (Data Engineering Podcast)
  • Danny Yuan on Real-Time, Time Series Forecasting @Uber: link

Microservices

  • Josh Evans, A Netflix Guide to Microservices (InfoQ talk)
  • Beyond Buzzwords: A Brief History of Microservice Patterns (Kyle Brown, IBM)
  • Martin Fowler on microservices


  • William Morgan on Scaling Twitter (SWE daily)
  • Josh Wills’ talk at MLconf.
  • Scaling Uber with Matt Ranney (SWE daily)
  • Cassandra with Tim Berglund (SWE daily)