Show HN: 500k+ events/sec transformations for ClickHouse ingestion

Rating: 4.5 (2 reviews)

Solves scaling issues for high-throughput data pipelines into ClickHouse (500k+ events/sec) by scaling within a single pipeline using replicas, addressing challenges with stateful transformations, high-cardinality keys, and long time windows.

Launch Date: Apr 9, 2026

Product Positioning & Context

AI Executive Synthesis

GlassFlow directly addresses a critical scalability and operational complexity pain point for enterprises utilizing ClickHouse for high-throughput data ingestion, particularly in observability and real-time analytics. The current industry practice of scaling by adding fragmented pipelines leads to duplicated logic, inconsistent state, and debugging difficulties. GlassFlow's approach of scaling within a single pipeline via replicas, supporting stateful transformations, and leveraging a file-based KV store, offers a superior architectural model. This product targets a mature market segment experiencing significant data volume growth, providing a robust solution for maintaining performance and operational simplicity at scale. The linear scaling and optimized ClickHouse sink are strong technical differentiators.

Hi HN! We are Ashish and Armend, founders of GlassFlow.

Over the last year, we worked with teams running high-throughput pipelines into self-hosted ClickHouse, mostly for observability and real-time analytics.

A question that came up repeatedly was: What happens when throughput grows?

Usually, things work fine at 10k events/sec, but we started seeing backpressure and errors at >100k.

When the throughput per pipeline stops scaling, adding more CPU/memory doesn’t help, because parts of the pipeline are often not parallelized or are bottlenecked by state handling.

At this point, engineers usually scale by adding more pipeline instances.

That works but comes with some trade-offs:
- You have to split the workload (e.g., multiple pipelines reading from the same source)
- Transformation logic gets duplicated across pipelines
- Stateful logic becomes harder to manage and keep consistent
- Debugging and changes get more difficult because the data flow is fragmented

Another challenge arises when working with high-cardinality keys like user IDs, session IDs, or request IDs, and when you need to handle longer time windows (24h or more). The state grows quickly and many systems rely on in-memory state, which makes it expensive and harder to recover from failures.

We wanted to solve this problem and rebuild our approach at GlassFlow.

Instead of scaling by adding more pipelines, we scale within a single pipeline by using replicas. Each replica consumes, processes, and writes independently, and the workload is distributed across them.

In the benchmarks we’re sharing, this scales to 500k+ events/sec while still running stateful transformations and writing into ClickHouse.

A few things we think are interesting:
- Scaling is close to linear as you add replicas
- Works with stateful transformations (not just stateless ingestion)
- State is backed by a file-based KV store instead of relying purely on memory
- The ClickHouse sink is optimized for batching to avoid small inserts
- The product is built with Go

Full write-up + benchmarks:
https://www.glassflow.dev/blog/glassflow-now-scales-to-500k-...

Repo:
https://github.com/glassflow/clickhouse-etl

Happy to answer questions about the design or trade-offs.
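
The post says the workload is distributed across replicas but does not say how. One common way to keep stateful transformations consistent is to route every event with the same key to the same replica. The Go sketch below illustrates that idea only; it is an assumption, not GlassFlow's actual implementation, and `Event` and `replicaFor` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Event is a hypothetical record shape used only for this illustration.
type Event struct {
	Key     string // e.g. a user ID, session ID, or request ID
	Payload string
}

// replicaFor maps a key to one of n replicas using a stable FNV-1a hash,
// so all events that share a key are handled by the same replica.
// Assumption: n is greater than zero.
func replicaFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on an FNV hash never returns an error
	return int(h.Sum32() % uint32(n))
}

func main() {
	const replicas = 4
	events := []Event{
		{Key: "user-1", Payload: "login"},
		{Key: "user-2", Payload: "click"},
		{Key: "user-1", Payload: "logout"},
	}
	for _, e := range events {
		fmt.Printf("key=%s payload=%s -> replica %d\n",
			e.Key, e.Payload, replicaFor(e.Key, replicas))
	}
}
```

With this kind of routing, per-key state (for example a 24h deduplication window) only needs to live on one replica. The trade-off is that changing the replica count remaps keys, so a real system would need a plan for moving or rebuilding state.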
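
The point about batching to avoid small inserts can be illustrated with a buffer that flushes when it reaches a row limit or when the oldest buffered row has waited too long. This is a minimal sketch under stated assumptions: `Batcher`, `NewBatcher`, `Tick`, and the `flush` callback are hypothetical names, and the callback stands in for whatever actually writes a batch to ClickHouse.

```go
package main

import (
	"fmt"
	"time"
)

// Batcher buffers rows and passes them to flush in groups, so the sink
// issues a few large inserts instead of many small ones.
type Batcher struct {
	maxRows int
	maxAge  time.Duration
	flush   func(rows []string) // hypothetical stand-in for a ClickHouse insert
	now     func() time.Time    // clock, replaceable for testing
	buf     []string
	firstAt time.Time // arrival time of the oldest buffered row
}

func NewBatcher(maxRows int, maxAge time.Duration, flush func([]string)) *Batcher {
	return &Batcher{maxRows: maxRows, maxAge: maxAge, flush: flush, now: time.Now}
}

// Add buffers one row and flushes as soon as the batch reaches maxRows.
func (b *Batcher) Add(row string) {
	if len(b.buf) == 0 {
		b.firstAt = b.now()
	}
	b.buf = append(b.buf, row)
	if len(b.buf) >= b.maxRows {
		b.Flush()
	}
}

// Tick flushes a partial batch once its oldest row has waited maxAge.
// A caller would typically invoke it from a time.Ticker loop.
func (b *Batcher) Tick() {
	if len(b.buf) > 0 && b.now().Sub(b.firstAt) >= b.maxAge {
		b.Flush()
	}
}

// Flush hands any buffered rows to the flush callback and resets the buffer.
func (b *Batcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	rows := b.buf
	b.buf = nil
	b.flush(rows)
}

func main() {
	b := NewBatcher(3, time.Second, func(rows []string) {
		fmt.Printf("flushing %d rows\n", len(rows))
	})
	for i := 0; i < 7; i++ {
		b.Add(fmt.Sprintf("row-%d", i))
	}
	b.Flush() // drain the remainder on shutdown
}
```

The row limit bounds memory and insert size, while the age limit bounds how stale data can get when traffic is low. This sketch is not safe for concurrent use; a real sink would need a mutex or a single owning goroutine.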

Deep-Dive FAQs

What is 500k+ events/sec transformations for ClickHouse ingestion?

500k+ events/sec transformations for ClickHouse ingestion is summarized by our AI as follows: it solves scaling issues for high-throughput data pipelines into ClickHouse (500k+ events/sec) by scaling within a single pipeline using replicas, addressing challenges with stateful transformations, high-cardinality keys, and long time windows. According to the synthesis, GlassFlow directly addresses a critical scalability and operational complexity pain point for enterprises utilizing ClickHouse for high-throughput ...

Where did 500k+ events/sec transformations for ClickHouse ingestion originate?

Data for 500k+ events/sec transformations for ClickHouse ingestion was aggregated directly from the Hacker News community ecosystem, representing raw developer and early-adopter sentiment.

When was 500k+ events/sec transformations for ClickHouse ingestion publicly launched?

The initial public indexing or launch date for 500k+ events/sec transformations for ClickHouse ingestion within our tracked developer communities was recorded on April 9, 2026.

How popular is 500k+ events/sec transformations for ClickHouse ingestion?

500k+ events/sec transformations for ClickHouse ingestion has achieved measurable traction, with a traction score of over 11 and 2 recorded discussions or engagements.

Which technical categories define 500k+ events/sec transformations for ClickHouse ingestion?

Based on metadata extraction, 500k+ events/sec transformations for ClickHouse ingestion is categorized under topics such as: high-throughput pipelines, ClickHouse ingestion, observability, real-time analytics.

What are some commercial alternatives to 500k+ events/sec transformations for ClickHouse ingestion?

Our semantic intelligence engine identifies potential commercial alternatives in the SaaS space, such as Teable 3.0, which offers overlapping value propositions.

How does the creator describe 500k+ events/sec transformations for ClickHouse ingestion?

The original author or development team describes the product as follows: "Hi HN! We are Ashish and Armend, founders of GlassFlow. Over the last year, we worked with teams running high-throughput pipelines into self-hosted ClickHouse. Mostly for observability and real-time..."

Community Voice & Feedback

112mercer • Apr 10, 2026

Can a user go directly from Kafka to ClickHouse on GlassFlow without touching Flink?

vladamon • Apr 8, 2026

[dead]

MarkSfik • Apr 8, 2026

As someone who has wrestled with Flink's JVM heap management and the complexity of TaskManagers/JobManagers, the 'scaling within a single pipeline' idea is compelling.
Why should I choose this over Flink for a ClickHouse sink? Is the main draw the operational simplicity (no cluster management), or are there specific ClickHouse-native optimizations in your implementation that Flink’s JDBC/official connectors are missing?

Discovery Source

Hacker News

Aggregated via automated community intelligence tracking.

Tech Stack Dependencies

No direct open-source NPM package mentions detected in the product documentation.

Media Tractions & Mentions

No mainstream media stories specifically mentioning this product name have been intercepted yet.

Deep Research & Science

No direct peer-reviewed scientific literature matched with this product's architecture.