[Technical Interview] Kafka
Summary
- Kafka provides low-latency, high-throughput, fault-tolerant publish and subscribe pipelines and can process huge continuous streams of events.
- Kafka can function both as a message queue and a publisher-subscriber system.
- At a high level, Kafka works as a distributed commit log.
- A Kafka server is also called a broker. A Kafka cluster consists of one or more brokers.
- A Kafka topic is a logical aggregation of messages.
- Kafka solves the scaling problem of a messaging system by splitting a topic into multiple partitions.
- Every topic partition is replicated for fault tolerance and redundancy.
- A partition has one leader replica and zero or more follower replicas.
- The partition leader is responsible for all reads and writes for that partition. Each follower’s responsibility is to replicate the leader’s data so that it can serve as a ‘backup’ partition.
- Message ordering is preserved only on a per-partition basis (not across partitions of a topic).
- Every partition replica must fit entirely on a single broker; a partition cannot be split across multiple brokers.
- A single broker can host the leader replicas of several partitions, belonging to different topics.
- Kafka supports a single-queue model with multiple readers through consumer groups: within a group, each partition is consumed by only one consumer at a time.
- Kafka supports a publish-subscribe model by letting consumers (in separate consumer groups) subscribe to the topics they want to receive messages from (see the consumer sketch after this list).
- ZooKeeper functions as a centralized configuration management service.
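As a rough illustration of the consumer-group and publish-subscribe points above, here is a minimal Java consumer sketch (the topic name `orders`, the group id `order-processors`, and the broker address are assumptions, not part of the original notes). Every consumer started with the same `group.id` shares the topic's partitions, which gives queue semantics; consumers started with a different `group.id` each receive the full stream, which gives publish-subscribe semantics.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");      // assumed broker address
        props.put("group.id", "order-processors");             // consumers sharing this id form one group
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribing to a topic: partitions of "orders" are divided among
            // all consumers that use the same group.id.
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```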
System design patterns
Here is a summary of system design patterns used in Kafka.
High-water mark
To deal with non-repeatable reads and to ensure data consistency, brokers keep track of the high-water mark, which is the largest offset that all in-sync replicas (ISRs) of a particular partition share. Consumers can only read messages up to the high-water mark.
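A conceptual sketch of that rule (not Kafka's internal code; the broker names and offsets below are made up): the high-water mark is the smallest log-end offset among the in-sync replicas, and only messages below it are exposed to consumers.

```java
import java.util.Map;

public class HighWaterMarkSketch {
    // The high-water mark of a partition is the smallest log-end offset among
    // all in-sync replicas (ISRs). Messages at or beyond this offset are not
    // yet replicated to every ISR, so consumers cannot read them.
    static long highWaterMark(Map<String, Long> logEndOffsetByIsr) {
        return logEndOffsetByIsr.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(0L);
    }

    public static void main(String[] args) {
        // Leader has written up to offset 120, followers lag slightly behind.
        Map<String, Long> isrOffsets = Map.of(
                "broker-1 (leader)", 120L,
                "broker-2", 118L,
                "broker-3", 115L);
        System.out.println("high-water mark = " + highWaterMark(isrOffsets)); // 115
    }
}
```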
Leader and follower
Each Kafka partition has a designated leader responsible for all reads and writes for that partition. Each follower’s responsibility is to replicate the leader’s data so that it can serve as a ‘backup’ partition.
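Leaders and followers come into existence when a topic is created with a partition count and a replication factor. In the sketch below (the topic name and broker address are assumptions), each of the 3 partitions gets one leader replica and two follower replicas, assuming the cluster has at least 3 brokers to place them on.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3: every partition gets one
            // leader replica and two follower replicas, spread across brokers.
            NewTopic orders = new NewTopic("orders", 3, (short) 3);
            admin.createTopics(Collections.singletonList(orders)).all().get();
        }
    }
}
```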
Split-brain
To handle split-brain (where we have multiple active controller brokers), Kafka uses an ‘epoch number,’ which is simply a monotonically increasing number that indicates a server’s generation. If the old Controller had an epoch number of ‘1’, the new one gets ‘2’. This epoch is included in every request the Controller sends to other brokers, so brokers can tell the real Controller apart simply by trusting the one with the highest epoch number. The epoch number is stored in ZooKeeper.
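A minimal sketch of that fencing rule (purely conceptual, not Kafka's actual implementation): each broker remembers the highest controller epoch it has seen and ignores any request carrying a smaller one.

```java
public class ControllerEpochCheck {
    // Highest controller epoch this broker has seen so far.
    private long latestControllerEpoch = 0;

    // A broker accepts a controller request only if it carries an epoch at
    // least as new as the highest epoch seen so far; requests from a stale
    // ("zombie") controller are rejected, which resolves split-brain.
    synchronized boolean acceptControllerRequest(long requestEpoch) {
        if (requestEpoch < latestControllerEpoch) {
            return false;                      // stale controller, ignore
        }
        latestControllerEpoch = requestEpoch;  // remember the newest epoch
        return true;
    }
}
```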
Segmented log
Kafka uses log segmentation to implement storage for its partitions. Because Kafka regularly needs to locate messages on disk (for example, to purge old data), a single long log file would be both a performance bottleneck and error-prone. For easier management and better performance, each partition is split into segments.
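A toy sketch of the idea (not Kafka's actual log code, though the zero-padded file names mirror how Kafka names segment files by their base offset): keeping segments indexed by base offset turns "find the segment containing offset X" into a cheap floor lookup, and old segments can be purged by deleting whole files.

```java
import java.util.TreeMap;

public class SegmentedLogSketch {
    // Segments indexed by their base offset (the offset of their first message),
    // mirroring on-disk names such as 00000000000000000000.log.
    private final TreeMap<Long, String> segmentsByBaseOffset = new TreeMap<>();

    void addSegment(long baseOffset) {
        segmentsByBaseOffset.put(baseOffset, String.format("%020d.log", baseOffset));
    }

    // Finding the segment that holds a given offset is a floor lookup, so there
    // is no need to scan one huge file.
    String segmentFor(long offset) {
        return segmentsByBaseOffset.floorEntry(offset).getValue();
    }

    public static void main(String[] args) {
        SegmentedLogSketch log = new SegmentedLogSketch();
        log.addSegment(0);
        log.addSegment(1000);
        log.addSegment(2000);
        System.out.println(log.segmentFor(1500));  // 00000000000000001000.log
    }
}
```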