Cephalon.Data.ClickHouse

Maturity: M1 · Ownership: provider-managed · Family: data-and-cdc · See audit, matrix.

Cephalon.Data.ClickHouse is the ClickHouse analytics store companion pack for Cephalon. It follows the companion-pack pattern established by Cephalon.Data.MongoDB, Cephalon.Data.Redis, Cephalon.Data.Neo4j, and Cephalon.Data.Cassandra, and requires no changes to Cephalon.Engine or Cephalon.Abstractions.

  • registers a scoped IOutbox backed by a ClickHouse ReplacingMergeTree table when RegisterOutbox is enabled; a new ClickHouseConnection is created per operation (no long-lived connection object)
  • registers a scoped IInbox backed by a ClickHouse ReplacingMergeTree table when RegisterInbox is enabled; uses SELECT ... FINAL to force deduplication at read time
  • ensures outbox staging is eventually idempotent through ClickHouse ReplacingMergeTree(created_at_utc) with ORDER BY (message_id) — duplicate rows with the same message_id are merged asynchronously by ClickHouse background merge processes
  • ensures inbox reads are idempotent via the FINAL modifier, which forces merge-time deduplication at query execution
  • exposes operator-facing outbox and inbox descriptors when the respective path is enabled
  • declares the ClickHouse outbox as DispatchPolicy.PolicyId = unsupported with ExecutionMode = disabled so /engine/outboxes, event-dispatches, and /engine/snapshot report staging-only truth instead of a generic “not configured” answer
  • projects the outbox descriptor through the event-driven-integration technology surface as outbox-producers with provider: "clickhouse" and mode: "replacing-merge-tree" when that technology is active
  • projects the inbox descriptor through the same technology surface as inbox-stores when the technology is active
  • publishes capability metadata data.clickhouse, data.analytics-store, and optionally data.outbox.clickhouse and data.inbox.clickhouse introspectable at runtime through the manifest

Source files in the pack:

  • Configuration/ClickHouseDataOptions.cs
  • Modules/ClickHouseDataModule.cs
  • Registration/ClickHouseDataEngineBuilderExtensions.cs
  • Services/ClickHouseOutbox.cs
  • Services/ClickHouseOutboxRuntimeSurfaceContributor.cs
  • Services/ClickHouseInbox.cs
  • Services/ClickHouseInboxRuntimeSurfaceContributor.cs

This pack sits on top of Cephalon.Data, not in place of it. Cephalon.Data still owns the runtime-neutral IReadStore / IWriteStore dispatching surface. Cephalon.Data.ClickHouse adds the ClickHouse-backed outbox and inbox persistence paths that let event-driven workloads stage and track messages using an analytics-optimized columnar store.

The slice is intentionally narrow: it proves the companion-pack pattern extends cleanly to ClickHouse, ships an eventually-idempotent outbox and inbox suited to high-throughput analytics workloads, and exposes the same runtime introspection surfaces as the other provider packs. IReadStore and IWriteStore are not backed directly by ClickHouse in this slice, and the outbox remains explicitly staging-only until Cephalon has a truthful ClickHouse-native mutable dispatch-state design.

    engine.AddClickHouseData(
        host: "localhost",
        database: "myapp");

To enable the outbox and inbox paths:

    engine.AddClickHouseData(
        host: "localhost",
        database: "myapp",
        configure: options =>
        {
            options.RegisterOutbox = true;
            options.RegisterInbox = true;
            options.TablePrefix = "app_"; // optional: prefix for all Cephalon tables
        });

| Option | Type | Default | Description |
|---|---|---|---|
| Host | string | "localhost" | ClickHouse host address |
| Port | int | 8123 | ClickHouse HTTP port |
| Database | string | "default" | Target ClickHouse database |
| Username | string | "default" | ClickHouse username |
| Password | string | "" | ClickHouse password |
| TablePrefix | string | "cephalon_" | Optional prefix for all Cephalon-managed table names |
| RegisterOutbox | bool | false | Register IOutbox backed by the {TablePrefix}outbox_messages table |
| RegisterInbox | bool | false | Register IInbox backed by the {TablePrefix}inbox_receipts table |

Outbox table schema ({TablePrefix}outbox_messages)

The table name is {TablePrefix}outbox_messages (default: cephalon_outbox_messages).

| Column | ClickHouse type | Notes |
|---|---|---|
| message_id | String | Idempotency key (GUID); ORDER BY key for ReplacingMergeTree deduplication |
| channel_id | String | Channel the message targets |
| message_type | String | Fully qualified CLR message type name |
| payload | String | System.Text.Json-serialized message body |
| content_type | String | MIME type of the payload |
| correlation_id | String | Optional causality tracking identifier |
| tenant_id | String | Optional multi-tenancy discriminator |
| occurred_at_utc | DateTime64(3, 'UTC') | UTC time at which the domain event or message occurred |
| created_at_utc | DateTime64(3, 'UTC') | UTC wall-clock time when the row was staged; used as the ReplacingMergeTree version column |
| dispatch_attempt_count | Int32 | Incremented on each dispatch attempt; starts at 0 |
| headers_json | String | System.Text.Json-serialized headers dictionary |
| metadata_json | String | System.Text.Json-serialized metadata dictionary |

Engine: ReplacingMergeTree(created_at_utc) ORDER BY (message_id)

Inbox table schema ({TablePrefix}inbox_receipts)

The table name is {TablePrefix}inbox_receipts (default: cephalon_inbox_receipts).

| Column | ClickHouse type | Notes |
|---|---|---|
| message_id | String | Processed message id; ORDER BY key for ReplacingMergeTree deduplication |
| channel_id | String | Channel the message arrived on |
| message_type | String | Fully qualified CLR message type name |
| correlation_id | String | Optional causality tracking identifier |
| tenant_id | String | Optional multi-tenancy discriminator |
| received_at_utc | DateTime64(3, 'UTC') | UTC time at which the message was received |
| processed_at_utc | DateTime64(3, 'UTC') | UTC wall-clock time when the row was written; used as the ReplacingMergeTree version column |

Engine: ReplacingMergeTree(processed_at_utc) ORDER BY (message_id)
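
For orientation, the two schemas can be read as ClickHouse DDL along the following lines. This is a sketch derived only from the column tables and engine lines in this document; the class and method names are hypothetical, and the DDL the pack actually issues is not shown here and may differ (for example in nullability, partitioning, or settings). It uses C# 11 raw string literals.

```csharp
using System;

// Hypothetical helper, not part of Cephalon.Data.ClickHouse.
// Builds illustrative DDL from the documented columns and engine clauses.
public static class ClickHouseSchemaSketch
{
    // Outbox: created_at_utc is the ReplacingMergeTree version column.
    public static string OutboxDdl(string tablePrefix = "cephalon_") => $"""
        CREATE TABLE IF NOT EXISTS {tablePrefix}outbox_messages
        (
            message_id             String,
            channel_id             String,
            message_type           String,
            payload                String,
            content_type           String,
            correlation_id         String,
            tenant_id              String,
            occurred_at_utc        DateTime64(3, 'UTC'),
            created_at_utc         DateTime64(3, 'UTC'),
            dispatch_attempt_count Int32,
            headers_json           String,
            metadata_json          String
        )
        ENGINE = ReplacingMergeTree(created_at_utc)
        ORDER BY (message_id)
        """;

    // Inbox: processed_at_utc is the ReplacingMergeTree version column.
    public static string InboxDdl(string tablePrefix = "cephalon_") => $"""
        CREATE TABLE IF NOT EXISTS {tablePrefix}inbox_receipts
        (
            message_id       String,
            channel_id       String,
            message_type     String,
            correlation_id   String,
            tenant_id        String,
            received_at_utc  DateTime64(3, 'UTC'),
            processed_at_utc DateTime64(3, 'UTC')
        )
        ENGINE = ReplacingMergeTree(processed_at_utc)
        ORDER BY (message_id)
        """;
}
```

Calling `ClickHouseSchemaSketch.OutboxDdl("app_")` yields a statement targeting `app_outbox_messages`, which matches the TablePrefix behavior described in the options table.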

EnqueueAsync inserts a row into the ReplacingMergeTree table. Duplicate rows with the same message_id are deduplicated asynchronously during ClickHouse background merge processes:

  • Immediate visibility: Both the original and duplicate rows may be returned by SELECT queries until a merge cycle completes.
  • After merge: Only the row with the latest created_at_utc value is retained.
  • Read-time deduplication: Use SELECT ... FINAL to force merge-time deduplication at query execution, at the cost of slower reads.

This idempotency model is suitable for high-throughput analytics-oriented outbox workloads. It is not suitable for strict exactly-once guarantees in transactional workloads — use Cephalon.Data.Cassandra (LWT INSERT IF NOT EXISTS) or Cephalon.Data.Redis for that.
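
The difference between what a plain read may see before a merge and what survives afterwards can be modeled in memory. The sketch below is not ClickHouse code and does not touch the pack; `StagedRow` and `ReplacingMergeSketch` are hypothetical names, and the model only captures the documented rule that the row with the latest created_at_utc wins per message_id (ties are resolved here in favor of the later insert, which is an assumption of the sketch).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; only the key and version columns matter here.
public sealed record StagedRow(string MessageId, DateTime CreatedAtUtc, string Payload);

public static class ReplacingMergeSketch
{
    // Models the state after a merge (or a FINAL read): per message_id,
    // keep the row with the highest version; ties go to the later insert
    // because OrderBy is a stable sort.
    public static IReadOnlyList<StagedRow> AfterMerge(IEnumerable<StagedRow> inserted) =>
        inserted
            .GroupBy(row => row.MessageId)
            .Select(group => group.OrderBy(row => row.CreatedAtUtc).Last())
            .ToList();

    public static void Main()
    {
        var t0 = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        var inserted = new[]
        {
            new StagedRow("m-1", t0, "first"),
            new StagedRow("m-1", t0.AddSeconds(5), "retry"), // duplicate message_id
            new StagedRow("m-2", t0, "other"),
        };

        // Before a merge, a plain SELECT may return every inserted row.
        Console.WriteLine(inserted.Length); // 3

        var merged = AfterMerge(inserted);
        Console.WriteLine(merged.Count); // 2
        Console.WriteLine(merged.Single(r => r.MessageId == "m-1").Payload); // retry
    }
}
```

Consumers that read the outbox table without FINAL should therefore tolerate seeing the same message_id more than once.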

HasProcessedAsync issues SELECT count() FROM {TablePrefix}inbox_receipts FINAL WHERE message_id = ?. The FINAL modifier, which follows the table name in ClickHouse syntax, forces ClickHouse to deduplicate rows from the ReplacingMergeTree at read time, providing a consistent view even before background merges complete.
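
A read of this shape could look roughly like the sketch below. It is written against the generic ADO.NET `DbConnection` base class rather than the pack's internals, so the class name, the connection factory, and the constructor are all hypothetical. The `{message_id:String}` placeholder is ClickHouse's typed query-parameter syntax; whether the pack's driver binds parameters exactly this way is an assumption.

```csharp
using System;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration, not the pack's ClickHouseInbox implementation.
public sealed class InboxCheckSketch
{
    private readonly Func<DbConnection> _connectionFactory; // assumed factory
    private readonly string _table;

    public InboxCheckSketch(Func<DbConnection> connectionFactory, string tablePrefix = "cephalon_")
    {
        _connectionFactory = connectionFactory;
        _table = tablePrefix + "inbox_receipts";
    }

    public async Task<bool> HasProcessedAsync(string messageId, CancellationToken ct = default)
    {
        // One connection per operation; nothing is opened until OpenAsync.
        await using var connection = _connectionFactory();
        await connection.OpenAsync(ct);

        await using var command = connection.CreateCommand();
        // FINAL follows the table name and deduplicates at read time.
        command.CommandText =
            $"SELECT count() FROM {_table} FINAL WHERE message_id = {{message_id:String}}";

        var parameter = command.CreateParameter();
        parameter.ParameterName = "message_id";
        parameter.Value = messageId;
        command.Parameters.Add(parameter);

        // count() is an unsigned 64-bit value; Convert handles the boxed result.
        var count = Convert.ToInt64(await command.ExecuteScalarAsync(ct));
        return count > 0;
    }
}
```

Because the table name is interpolated into the SQL text, a real implementation would need to validate the prefix; only the message id is passed as a parameter here.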

ClickHouseConnection is an ADO.NET DbConnection. No socket is opened when the connection is constructed. The TCP connection to the ClickHouse HTTP endpoint is deferred until connection.OpenAsync() is called. A new connection is created per operation — there is no long-lived connection object held by the outbox, inbox, or event store.

This means IOutbox, IInbox, and IEventStore can all be resolved from the DI container in tests or at startup without requiring a live ClickHouse server.

When ClickHouseDataModule is active, the following capability keys appear in the runtime manifest:

| Capability key | When registered |
|---|---|
| data.clickhouse | Always |
| data.analytics-store | Always |
| data.outbox.clickhouse | RegisterOutbox = true |
| data.inbox.clickhouse | RegisterInbox = true |

When the event-driven-integration technology is active, the following entries appear under /engine/snapshot:

| Surface | Entry id | Key metadata |
|---|---|---|
| outbox-producers | clickhouse-outbox | provider = clickhouse, dispatchPolicyId = unsupported, dispatchExecutionMode = disabled |
| inbox-stores | clickhouse-inbox | provider = clickhouse |

When RegisterOutbox = true, the outbox descriptor stays visible through /engine/outboxes, event-dispatches, and snapshot.Outboxes, but it now answers with an explicit unsupported policy instead of a generic disabled/default placeholder:

  • DispatchPolicy.PolicyId = unsupported
  • DispatchPolicy.ExecutionMode = disabled
  • dispatchStore = unsupported
  • dispatchRuntime = unsupported
  • dispatchPolicy.reason = append-only-analytics-store

That answer is deliberate. The current ClickHouse ReplacingMergeTree baseline is good for durable staging/history and analytics-style replay, but it is not yet the truthful owner for mutable pending-dispatch state in the same way as the current Entity Framework, MongoDB, Redis, Elasticsearch, OpenSearch, Neo4j, Qdrant, NATS, or Cassandra follow-through slices.

This pack intentionally does not claim:

  • batch dispatch or broker retry scheduling
  • provider-native IEventDispatchStore ownership for mutable pending-dispatch state
  • TTL-based expiry of outbox or inbox rows
  • Materialized Views for event stream projections
  • Dictionary-based deduplication (alternative to FINAL reads)
  • IReadStore / IWriteStore dispatch backed by ClickHouse — query handlers should use ClickHouseConnection directly
  • transaction-scoped outbox staging (ClickHouse does not support multi-row ACID transactions)
  • strict exactly-once delivery guarantees (use Cephalon.Data.Cassandra LWT for that)

These remain explicit later slices. Any future managed-dispatch follow-through for ClickHouse should start with a dedicated mutable pending-dispatch design rather than pretending the current staging table is already a truthful dispatch-state owner.