ClickHouse: consuming from Kafka

Introduction

This blog is about ClickHouse, a blazing fast column-oriented DBMS, and how to configure it to consume data from Kafka in a simple way.

A Kafka table

As per the documentation, you can easily create a specific Kafka table that will, behind the scenes, generate 1 (or more) consumer(s) for the Kafka topic(s) you chose.

Let’s say we are consuming from a metrics topic, messages with the following structure:

{
  "name": "<string>",
  "date": <epoch_unix>,
  "message": "<string>"
}

The following table will allow to consume messages from this topic:

CREATE TABLE IF NOT EXISTS metrics
(
    `name` String,
    `date` DateTime,
    `message` String
) ENGINE = Kafka()
  SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'metrics',
    kafka_group_name = 'metrics-group',
    kafka_format = 'JSONEachRow'

The use of JSONEachRow assumes that all your messages have this structure (with the three keys name, date and message). This might break if some of them dont have this structure but it is OK if you can ensure that this will not happen.

Be careful, this table only temporary stores messages. As soon as the message are requested at least one time, they will be lost if not stored (and linked offset will be commited to Kafka). To actually store the message in a table, you will need to stream them to a MATERIALIZED VIEW that will then put it to the final table.

The final table

Lets keep it simple for now and lets just use a classic MergeTree() engine table:

CREATE TABLE final_metrics
(
  `name` String,
  `date` DateTime,
  `message` String
) ENGINE = MergeTree()
  PARTITION BY toYYYYMM(date)
  ORDER BY name

A `MATERIALIZED VIEW`

The following table will receive the rows from the metrics table and put it in the final_metrics table:

CREATE MATERIALIZED VIEW metrics_consumer TO final_metrics
AS SELECT
  name,
  date,
  message
FROM metrics

If wanted, we would be able to use this MATERIALIZED VIEW to filter the kind of data we want. For instance, filtering on the name:

CREATE MATERIALIZED VIEW metrics_consumer TO final_metrics
AS SELECT
  name,
  date,
  message
FROM metrics
WHERE name = 'magic-metric'

ClickHouse: consuming from Kafka

Introduction

A Kafka table

The final table

A MATERIALIZED VIEW

A `MATERIALIZED VIEW`