Kafkaで複数のコンシューマーを使用するにはどうすればよいですか？

Question

私はKafkaを勉強している新しい学生であり、複数の消費者がこれまで記事やドキュメントなどはあまり役に立たなかったということを理解する上でいくつかの基本的な問題に遭遇しました。

私がやろうとしたことの1つは、独自の高レベルKafka=プロデューサーとコンシューマーを作成し、それらを同時に実行し、トピックに100の簡単なメッセージを発行し、コンシューマーにそれらを取得させることです。これは成功しますが、メッセージが発行されたばかりの同じトピックから消費する2番目のコンシューマーを紹介しようとすると、メッセージを受信しません。

各トピックについて、別々のコンシューマーグループのコンシューマーを使用でき、これらのコンシューマーグループのそれぞれが、あるトピックに対して生成されたメッセージの完全なコピーを取得できることを理解していました。これは正しいです？そうでない場合、複数の消費者を設定する適切な方法は何ですか？これは私がこれまでに書いたコンシューマークラスです。

public class AlternateConsumer extends Thread { private final KafkaConsumer<Integer, String> consumer; private final String topic; private final Boolean isAsync = false; public AlternateConsumer(String topic, String consumerGroup) { Properties properties = new Properties(); properties.put("bootstrap.servers", "localhost:9092"); properties.put("group.id", consumerGroup); properties.put("partition.assignment.strategy", "roundrobin"); properties.put("enable.auto.commit", "true"); properties.put("auto.commit.interval.ms", "1000"); properties.put("session.timeout.ms", "30000"); properties.put("key.deserializer", "org.Apache.kafka.common.serialization.IntegerDeserializer"); properties.put("value.deserializer", "org.Apache.kafka.common.serialization.StringDeserializer"); consumer = new KafkaConsumer<Integer, String>(properties); consumer.subscribe(topic); this.topic = topic; } public void run() { while (true) { ConsumerRecords<Integer, String> records = consumer.poll(0); for (ConsumerRecord<Integer, String> record : records) { System.out.println("We received message: " + record.value() + " from topic: " + record.topic()); } } } }

さらに、元々、単一のパーティションのみでトピック「テスト」の上記消費量をテストしていたことに気付きました。「testGroup」などの既存のコンシューマグループに別のコンシューマを追加すると、これによりKafka=リバランスがトリガーされ、消費の遅延が数秒で大幅に遅くなりました。パーティションが1つしかないため、これはリバランスの問題でしたが、たとえば6つのパーティションで新しいトピック「マルチパーティション」を作成すると、同じコンシューマグループにコンシューマを追加するとレイテンシの問題が発生するという同様の問題が発生しました。そして、人々は私がマルチスレッド消費者を使うべきだと私に言っています-誰もそれを明らかにすることができますか？

Chris Gerken · Accepted Answer

あなたの問題はauto.offset.resetプロパティにあると思います。新しいコンシューマがパーティションから読み取り、以前にコミットされたオフセットがない場合、auto.offset.resetプロパティを使用して開始オフセットを決定します。「最大」（デフォルト）に設定すると、最新の（最後の）メッセージから読み始めます。「最小」に設定すると、最初に使用可能なメッセージが表示されます。

追加：

properties.put("auto.offset.reset", "smallest");

そしてさらに試みる。

*編集*

「最小」と「最大」は非推奨になりました。今すぐ「最も早い」または「最新の」を使用する必要があります。ご質問は、 docs をご確認ください

追加：

properties.put("auto.offset.reset", "smallest");

そしてさらに試みる。

*編集*

「最小」と「最大」は非推奨になりました。今すぐ「最も早い」または「最新の」を使用する必要があります。ご質問は、 docs をご確認ください

user1119541 · Answer

複数のコンシューマーが同じメッセージ（ブロードキャストなど）を消費するようにする場合は、異なるコンシューマーグループでそれらを生成し、コンシューマー構成でauto.offset.resetを最小に設定することもできます。複数のコンシューマーが並行してコンシュームを完了したい場合（作業をそれらの間で分割する場合）、パーティションの数を作成する必要があります。 1つのパーティションは、最大で1つのコンシューマプロセスのみが使用できます。ただし、1つのコンシューマーが複数のパーティションを消費する可能性があります。

Alper Akture · Answer

ドキュメントでは here とあります。「トピックにパーティションがあるよりも多くのスレッドを提供すると、一部のスレッドはメッセージを表示しません」。トピックにパーティションを追加できますか？トピック内のパーティションの数に等しいコンシューマグループスレッドカウントがあり、各スレッドがメッセージを取得しています。

これが私のトピック設定です：

buffalo-macbook10:kafka_2.10-0.8.2.1 aakture$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic recent-wins Topic:recent-wins PartitionCount:3 ReplicationFactor:1 Configs: Topic: recent-wins Partition: 0 Leader: 0 Replicas: 0 Isr: 0 Topic: recent-wins Partition: 1 Leader: 0 Replicas: 0 Isr: 0 Topic: recent-wins Partition: 2 Leader: 0 Replicas: 0 Isr: 0

そして私の消費者：

package com.cie.dispatcher.services; import com.cie.dispatcher.model.WinNotification; import com.fasterxml.jackson.databind.ObjectMapper; import com.google.inject.Inject; import io.dropwizard.lifecycle.Managed; import kafka.consumer.ConsumerConfig; import kafka.consumer.ConsumerIterator; import kafka.consumer.KafkaStream; import kafka.javaapi.consumer.ConsumerConnector; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import Java.util.HashMap; import Java.util.List; import Java.util.Map; import Java.util.Properties; import Java.util.concurrent.ExecutorService; import Java.util.concurrent.Executors; import Java.util.concurrent.TimeUnit; /** * This will create three threads, assign them to a "group" and listen for notifications on a topic. * Current setup is to have three partitions in Kafka, so we need a thread per partition (as recommended by * the kafka folks). This implements the dropwizard Managed interface, so it can be started and stopped by the * lifecycle manager in dropwizard. * <p/> * Created by aakture on 6/15/15. */ public class KafkaTopicListener implements Managed { private static final Logger LOG = LoggerFactory.getLogger(KafkaTopicListener.class); private final ConsumerConnector consumer; private final String topic; private ExecutorService executor; private int threadCount; private WinNotificationWorkflow winNotificationWorkflow; private ObjectMapper objectMapper; @Inject public KafkaTopicListener(String a_zookeeper, String a_groupId, String a_topic, int threadCount, WinNotificationWorkflow winNotificationWorkflow, ObjectMapper objectMapper) { consumer = kafka.consumer.Consumer.createJavaConsumerConnector( createConsumerConfig(a_zookeeper, a_groupId)); this.topic = a_topic; this.threadCount = threadCount; this.winNotificationWorkflow = winNotificationWorkflow; this.objectMapper = objectMapper; } /** * Creates the config for a connection * * @param zookeeper the Host:port for zookeeper, "localhost:2181" for example. * @param groupId the group id to use for the consumer group. Can be anything, it's used by kafka to organize the consumer threads. * @return the config props */ private static ConsumerConfig createConsumerConfig(String zookeeper, String groupId) { Properties props = new Properties(); props.put("zookeeper.connect", zookeeper); props.put("group.id", groupId); props.put("zookeeper.session.timeout.ms", "400"); props.put("zookeeper.sync.time.ms", "200"); props.put("auto.commit.interval.ms", "1000"); return new ConsumerConfig(props); } public void stop() { if (consumer != null) consumer.shutdown(); if (executor != null) executor.shutdown(); try { if (!executor.awaitTermination(5000, TimeUnit.MILLISECONDS)) { LOG.info("Timed out waiting for consumer threads to shut down, exiting uncleanly"); } } catch (InterruptedException e) { LOG.info("Interrupted during shutdown, exiting uncleanly"); } LOG.info("{} shutdown successfully", this.getClass().getName()); } /** * Starts the listener */ public void start() { Map<String, Integer> topicCountMap = new HashMap<>(); topicCountMap.put(topic, new Integer(threadCount)); Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap); List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic); executor = Executors.newFixedThreadPool(threadCount); int threadNumber = 0; for (final KafkaStream stream : streams) { executor.submit(new ListenerThread(stream, threadNumber)); threadNumber++; } } private class ListenerThread implements Runnable { private KafkaStream m_stream; private int m_threadNumber; public ListenerThread(KafkaStream a_stream, int a_threadNumber) { m_threadNumber = a_threadNumber; m_stream = a_stream; } public void run() { try { String message = null; LOG.info("started listener thread: {}", m_threadNumber); ConsumerIterator<byte[], byte[]> it = m_stream.iterator(); while (it.hasNext()) { try { message = new String(it.next().message()); LOG.info("receive message by " + m_threadNumber + " : " + message); WinNotification winNotification = objectMapper.readValue(message, WinNotification.class); winNotificationWorkflow.process(winNotification); } catch (Exception ex) { LOG.error("error processing queue for message: " + message, ex); } } LOG.info("Shutting down listener thread: " + m_threadNumber); } catch (Exception ex) { LOG.error("error:", ex); } } } }