Day 3 - Diving into Apache Kafka Streams Internals

Exploring Kafka Streams and investigating a StreamsConfig issue in open-source Apache Kafka.
Author

Shashank Hosahalli Shivamurthy

Published

July 9, 2025

Day 3 - Kafka Streams and First OSS Exploration

Today, I decided to shift gears from theory to hands-on learning.

Since I already have experience using Kafka in enterprise applications, I wanted to go deeper and explore Kafka’s internals, specifically the open-source Apache Kafka repository on GitHub.

I started looking for beginner-friendly issues and found one under the Kafka Streams module.


What is Kafka Streams?

Kafka Streams is a client library for building real-time, event-driven applications and microservices using Apache Kafka.

  • It allows developers to process and analyze data stored in Kafka.
  • You can filter, map, join, group, and window records, all directly on Kafka topics.
  • It is lightweight: it needs no separate processing cluster and runs inside your application process.
  • It provides fault tolerance and scalability out of the box, and exactly-once processing is available by setting processing.guarantee to exactly_once_v2 (the default is at-least-once).

Kafka Streams turns raw event data into actionable, real-time insights within your application.
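
Because Kafka Streams is just a library inside your application, its behaviour is driven by ordinary configuration properties. Here is a minimal sketch of such a configuration, built with plain java.util.Properties so it runs without the Kafka jars. The class name, the application id, and the broker address are assumptions I made up for illustration; the property keys and values are standard Kafka Streams settings.

```java
import java.util.Properties;

// Illustrative sketch: builds the kind of settings a Kafka Streams app is started with.
public class StreamsSettingsSketch {

    // exactlyOnce = true opts in to exactly-once processing; otherwise the
    // default guarantee, at-least-once, is set explicitly.
    static Properties buildSettings(boolean exactlyOnce) {
        Properties props = new Properties();
        // Hypothetical application id (also used as the consumer group id).
        props.setProperty("application.id", "demo-streams-app");
        // Assumed local broker address.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("processing.guarantee",
                exactlyOnce ? "exactly_once_v2" : "at_least_once");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildSettings(true);
        System.out.println(props.getProperty("processing.guarantee"));
    }
}
```

In a real application, a Properties object like this would be handed to the KafkaStreams constructor together with the topology.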


The Issue I Worked On

  • The issue I found was in the StreamsConfig class.
  • Kafka Streams controls some configuration properties itself and does not let users override them, but users get no feedback that their custom values are being ignored.
  • The task was to:
    • Add a warning log message whenever a user sets one of these restricted properties.
    • Have the message state that the user’s value is ignored and that Kafka Streams has overridden it internally.
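
To make the idea concrete, here is a rough sketch of the behaviour described above. It is not the StreamsConfig code: the class name, the method, and the ENFORCED table are hypothetical, and warnings are collected in a list instead of going through a logger so the example stays self-contained. The one grounded detail is that Kafka Streams manages enable.auto.commit itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: this is NOT the actual StreamsConfig implementation.
public class RestrictedConfigSketch {

    // Hypothetical table of properties the library controls, with the values it enforces.
    static final Map<String, String> ENFORCED = Map.of("enable.auto.commit", "false");

    // Returns the effective config; adds one warning for each restricted
    // property that the user tried to set.
    static Map<String, String> applyOverrides(Map<String, String> userConfig,
                                              List<String> warnings) {
        Map<String, String> effective = new HashMap<>(userConfig);
        for (Map.Entry<String, String> enforced : ENFORCED.entrySet()) {
            String key = enforced.getKey();
            if (userConfig.containsKey(key)) {
                warnings.add("User-specified " + key + "=" + userConfig.get(key)
                        + " is ignored; using " + enforced.getValue() + " instead");
            }
            // The enforced value always wins, whether or not the user set the key.
            effective.put(key, enforced.getValue());
        }
        return effective;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        Map<String, String> user = Map.of("enable.auto.commit", "true");
        Map<String, String> effective = applyOverrides(user, warnings);
        warnings.forEach(System.out::println);
        System.out.println(effective);
    }
}
```

The point of the sketch is the ordering: the warning is produced at the moment the override happens, so the user sees both the value they asked for and the value actually in effect.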

What I Did Today

  • Forked the Apache Kafka GitHub repository.
  • Set up my local dev environment.
  • Navigated and explored the streams module.
  • Focused on understanding how StreamsConfig validates and handles config values.
  • Started working on the logging logic to notify users when restricted configs are overridden.

I ended up spending the entire day on this one issue: reading the code, understanding the architecture, and slowly making sense of the system.

It was challenging but rewarding, and exactly the kind of exposure I was hoping for when I set this goal.


More tomorrow!!!