Here are my talks, blog posts and other related links.
Staff Platform Engineer with strong experience managing complex production operations, with expertise in distributed systems and databases.
Elad likes to solve complex problems around delivering real-time infrastructure in perpetual growth, and maintains some of the core pieces of large-scale production architecture serving 200B events daily.
Some of the technologies Elad works with include Kafka, Aerospike, Redis, and Memcached, and he likes to program in Python, Ruby, and Go.
· Pushing Your Streaming Platform to the Limit - Conf42 Chaos Engineering [2024] [Full]
· Let’s Make Your CFO Happy; A Practical Guide for Kafka Cost Reduction - Kafka Summit London 2022 [2022] [Full]
· Kafka Lag Monitoring For Human Beings - Kafka Summit 2020 [2020] [Full]
· Migrating from BitBucket to GitLab - GitLab Commit San Francisco 2020 [2020] [Full]
· What can you learn from the biggest automation company in the world? [2019] [Ignite]
· Let’s Make Your CFO Happy; A Practical Guide for Cost Reduction - HayaData 2022 [2022] [Full]
· Let’s Make Your CFO Happy; A Practical Guide for Kafka Cost Reduction - AerospikeTLV: Streaming and Querying Data [2022] [Full]
· Let’s Make Your CFO Happy; A Practical Guide for Kafka Cost Reduction - Don’t Lose Control: What to Do When You Scale Beyond the Limits [2022] [Full]
· Kafka: Revolution to Evolution - DataOps Meetup: Conquering Data with Kafka & Airflow [2020] [Full]
· Handling Increasing Load and Reducing Costs During COVID-19 Crisis [2020] [10m]
· AppsFlyer's Case Study of GitLab [2018] [Full]
· Know Your Limits: Cluster Benchmarks
· A Practical Guide for Kafka Cost Reduction
· Apache Kafka Lag Monitoring at AppsFlyer
· My journey from Python to Go
· GitLab: The Magic of System Hooks
· Personal Blog - Heb / Eng
· My journey from Python to Go [Chinese - InfoQ], [Russian - Habr]
· A Practical Guide for Kafka Cost Reduction [Chinese - InfoQ]
· Confluent DevX Newsletter - My open source tool featured on Confluent's Newsletter [Eng]
· Confluent Community Catalyst - Confluent Website [Eng]
· Apache Kafka Lag Monitoring at AppsFlyer - Confluent Blog [Eng]
· Best Kafka Summit Videos - Apache Kafka Official Page [Eng]
· What is the company doing “Flying” the Apps? - Aerospike Blog [Eng]
· How AppsFlyer use Aerospike to support the growth during COVID-19 - People & Computers [Heb]
· Why AppsFlyer moved from Bitbucket to GitLab - GitLab Blog
· GitLab Heroes Page - GitLab Website
Pushing Your Streaming Platform to the Limit
In this session, we’ll take a hands-on approach to Chaos Engineering for streaming platforms like Kafka, Pulsar, NATS, or RabbitMQ. Dive into stress testing, from crafting benchmarks to real-world execution. Discover how to fine-tune performance and scalability, preparing your system for any challenges ahead.
Type: Full-length Presentation
Tags: Data, Streaming, Kafka, NATS, Pulsar, RabbitMQ, OMB, Benchmark, Chaos, Stress Tests
Let’s make your CFO happy; A practical guide for cost reduction
According to Gartner Forecasts, the worldwide end-user spending on public cloud services is forecast to grow by 23% in 2021, to a total of $332B. As organizations evolve and grow, data rates grow too, as do consequent cloud costs. Take a look at your AWS bill, and you will probably find Hadoop, Spark, and Kafka at the top. So what can we do?
In this talk, we are going to address exactly this problem. We will understand what we are paying for, how to develop an economic mindset, where we can cut costs, and what we can proactively do to reduce our cloud infrastructure cost.
Type: Full-length Presentation
Tags: Cloud, Cloud Cost, FinOps, Kafka, Spark, Distributed Systems, Cost reduction, Cost saving, AWS, GCP, Data
Kafka Lag Monitoring for Human Beings
One of the key metrics to monitor when working with Apache Kafka, as a data pipeline or a streaming platform, is Consumer Groups Lag.
Lag is the delta between the last produced message and the last committed message of a partition. In other words, lag indicates how far behind your application is in processing up-to-date information. For a long time, we used our own service to keep track of these metrics, collect them and visualize them. But this didn’t scale well.
It required many manual operations, redeployments, and other tedious tasks. Most importantly, the biggest gap for us was that its output was represented in absolute numbers (e.g., your lag is 30K), which basically tells you nothing as a human being.
We understood that we had to find a more suitable solution that would give us better visibility and allow us to measure the lag in a time-based format that we all understand. In this talk, I’m going to go over the core concepts of Kafka offsets and lags, and explain why lag matters and is an important KPI to measure. I’ll also talk about the research we did to find the right tool, what the options in the market were at the time, and why we eventually chose LinkedIn’s Burrow as the right tool for us. Finally, I’ll take a closer look at Burrow, its building blocks, how we build and deploy it, how we monitor better with it, and the most important improvement - how we transformed its output from numbers to time-based metrics.
Type: Full-length Presentation
Tags: Kafka, Monitoring, Lag, Data Pipeline, Streaming, Burrow
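To make the lag definition in the abstract above concrete, here is a minimal Python sketch. It is illustrative only and is not the implementation from the talk or from Burrow: the `PartitionOffsets` container and both helper functions are hypothetical names, and dividing the offset lag by an assumed consume rate is just one simple way to approximate a time-based lag.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    """Offsets for one partition (hypothetical container, for illustration only)."""
    log_end_offset: int    # offset of the last produced message
    committed_offset: int  # last offset committed by the consumer group


def offset_lag(p: PartitionOffsets) -> int:
    """Lag in messages: last produced minus last committed, never negative."""
    return max(p.log_end_offset - p.committed_offset, 0)


def time_lag_seconds(lag: int, consume_rate_per_sec: float) -> float:
    """Rough time-based lag: time needed to catch up at the given consume rate.

    Assumption: the consume rate is supplied by the caller (for example,
    measured from recent commits). This is an approximation, not Burrow's logic.
    """
    if lag == 0:
        return 0.0
    if consume_rate_per_sec <= 0:
        # A stalled consumer with pending messages never catches up.
        return float("inf")
    return lag / consume_rate_per_sec


partitions = [PartitionOffsets(130_000, 100_000), PartitionOffsets(50_000, 50_000)]
total_lag = sum(offset_lag(p) for p in partitions)
print(total_lag)                           # 30000
print(time_lag_seconds(total_lag, 500.0))  # 60.0
```

With these made-up numbers, "your lag is 30K" becomes "you are about one minute behind", which is the kind of time-based reading the abstract argues for.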
A Journey from Python to Go
I love Python. It has been my go-to language for the past five years. But the growth in the popularity and maturity of Go, alongside the strong user base, made me think about how I can add it into my tool set.
In this talk, I'm going to tell you about my journey from Python to Go, and provide you with some tips and expose you to some of the resources that helped me succeed on this journey and live to tell the tale. I will dive into some of the main differences, and how to minimize the learning curve, as well as some of the excellent libraries and tools that enabled me to ramp up my Go coding skills pretty quickly & painlessly.
Type: Full-length Presentation
Tags: Go, Golang, Python, Coding, Resources, Tips
What can you learn from the biggest automation company in the world?
We will go over some high-scale patterns in one of the most surprising and loved companies in the industry.
I'm lovin it.
Type: Ignite
Tags: Tech, Scale, Software Patterns, System Design, Distributed Systems
Migration from BitBucket to GitLab
AppsFlyer migrated its entire Git operation, with production clients, from BitBucket to GitLab. This talk will dive into what was involved in the migration process - from building the architecture, through selecting the tooling, and eventually to how we built our very own self-serve API abstraction over the GitLab API. Some of the points the talk will review:
- The migration process - from Mercurial to Git, how to move all projects, how to get developer buy-in and the lessons learned during the process
- Architecture - How we built it, the challenges we faced, how we built our DR solution, alongside the distributed backup
- Building monitoring for the environment
- Self-service, tooling, and some pro tips and tricks for working with GitLab
While this will be a talk about our GitLab implementation, it will also provide key takeaways for making such a migration in a large-scale engineering organization.
Type: Full-length Presentation
Tags: GitLab, Git, BitBucket, Migration, Mercurial, hg, API
- Yelp/kafka-utils - Prometheus support, Jolokia authentication support
- OpenMessaging/benchmark - Alternative Dockerfile without Maven dependency, Allow users to change memory configurations
- Argoproj/argo-cd - Update Dex config, change misleading error message.
- LinkedIn/cruise-control - Document JMX in Sensors.md
- LinkedIn/Burrow - Adding readiness probe endpoint
- Influxdata/telegraf - Add the ability to control MaxProcessingTime on Sarama lib
- OhShitGit - Added Hebrew translation, and support for RTL languages
More can be found on my GitHub profile
- kafka-config-metrics - ⭐ 28
- kubeseal-convert - ⭐ 42
- schema-registry-statistics - ⭐ 24
- KeyToField-smt - ⭐ 15
- kafka-topic-creation-time