Show HN: Rocky – Rust SQL engine with branches, replay, column lineage
A control plane for data warehouses (Databricks, Snowflake, BigQuery, DuckDB) that owns the DAG. It provides a Git-grade workflow (branches, replay), column-level lineage, first-class governance, cost attribution, compile-time portability, and schema-grounded AI. It is explicitly not a warehouse, a Fivetran replacement, or dbt Cloud.
Product Positioning & Context
AI Executive Synthesis
Rocky targets data governance, lineage, and cost management for pipelines that run on existing warehouses. It positions itself as a 'control plane' that 'owns the DAG,' covering what the author says a stack cannot provide when it does not own the graph. The headline features are a 'Git-grade workflow' with branches and replay, 'column-level lineage from the compiler,' and 'governance as a first-class surface,' alongside 'cost attribution' and 'schema-grounded AI.' Rocky integrates with the major warehouses and explicitly does not try to replace Fivetran or dbt Cloud, which makes it a specialized layer for pipeline management rather than a substitute for either.
Hi HN, I'm Hugo. I've been building Rocky over the past month, shipping fast in the open. The binary is on GitHub Releases, `dagster-rocky` on PyPI, and the VS Code extension on the Marketplace. I held off on a broader announcement until the trust-system surface was coherent enough to talk about as one thing. The governance waveplan (column classification, per-env masking, 8-field audit trail on every run, `rocky compliance` rollup, role-graph reconciliation, retention policies) landed end-to-end last week in engine-v1.16.0 and rounded out in v1.17.4 (tagged 2026-04-26). That's the milestone I'd been waiting for.

The pitch: keep Databricks or Snowflake. Bring Rocky for the DAG. Rocky is a Rust-based control plane for warehouse pipelines. Storage and compute stay with your warehouse. Rocky owns the graph: dependencies, compile-time types, drift, incremental logic, cost, lineage, governance. The things your current stack can't give you because it doesn't own the DAG.

A few things I think are interesting:

- Branches + replay. `rocky branch create stg` gives you a logical copy of a pipeline's tables (schema-prefix today; native Delta SHALLOW CLONE and Snowflake zero-copy are next). `rocky replay ` reconstructs which SQL ran against which inputs. Git-grade workflow on a warehouse.
- Column-level lineage from the compiler, not a post-hoc graph crawl. The type checker traces columns through joins, CTEs, and windows. VS Code surfaces it inline via LSP.
- Governance as a first-class surface. Column classification tags plus per-env masking policies, applied to the warehouse via Unity Catalog (Databricks) or masking policies (Snowflake). 8-field audit trail on every run. `rocky compliance` rollup that CI can gate on. Role-graph reconciliation via SCIM + per-catalog GRANT. Retention policies with a warehouse-side drift probe.
- Cost attribution. Every run produces per-model cost (bytes, duration). `[budget]` blocks in `rocky.toml`; breaches fire a `budget_breach` hook event.
- Compile-time portability + blast radius. Dialect-divergence lint across Databricks / Snowflake / BigQuery / DuckDB (12 constructs). `SELECT *` downstream-impact lint.
- Schema-grounded AI. Generated SQL goes through the compiler, so AI suggestions type-check before they can land.

What Rocky isn't:

- Not a warehouse; it's the control plane on top.
- Not a Fivetran replacement. `rocky load` handles files (CSV/Parquet/JSONL); for SaaS sources use Fivetran, Airbyte, or warehouse-native CDC.
- Not dbt Cloud: no hosted UI, no managed scheduler. First-class Dagster integration if you need orchestration.

Adapters: Databricks (GA), Snowflake (Beta), BigQuery (Beta), DuckDB (local dev / playground). Apache 2.0.

I'd love feedback on the trust-system framing, the governance surface (particularly classification-to-masking resolution in `rocky compile` and the `rocky compliance` CI gate), the branches/replay design, the cost-attribution primitives, or anything else that catches your eye. Happy to go deep in the thread.
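To make the cost-attribution bullet concrete, here is a minimal Python sketch of how per-model cost (bytes, duration) might be checked against a budget to produce `budget_breach` events. This is an illustration only, not Rocky's implementation: the `ModelCost` and `Budget` types, their field names, and the event payload shape are all hypothetical, and the actual keys of a `[budget]` block in `rocky.toml` are not given in the post. Only the `budget_breach` event name comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCost:
    """Hypothetical per-model cost record for one run."""
    model: str
    bytes_scanned: int
    duration_s: float


@dataclass(frozen=True)
class Budget:
    """Hypothetical budget limits; not the real rocky.toml schema."""
    max_bytes: int
    max_duration_s: float


def budget_breaches(costs, budget):
    """Return one event dict per model whose cost exceeds the budget."""
    events = []
    for cost in costs:
        exceeded = []
        if cost.bytes_scanned > budget.max_bytes:
            exceeded.append("bytes")
        if cost.duration_s > budget.max_duration_s:
            exceeded.append("duration")
        if exceeded:
            # Event name taken from the post; payload fields are assumed.
            events.append(
                {"event": "budget_breach", "model": cost.model, "exceeded": exceeded}
            )
    return events


run_costs = [
    ModelCost(model="orders", bytes_scanned=500, duration_s=1.0),
    ModelCost(model="events", bytes_scanned=5000, duration_s=12.0),
]
breaches = budget_breaches(run_costs, Budget(max_bytes=1000, max_duration_s=10.0))
```

In this sketch only the `events` model breaches, on both dimensions. A CI job or hook handler could then fail the run or send a notification whenever the returned list is non-empty.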
Rust SQL engine
branches
replay
column lineage
control plane
warehouse pipelines
DAG
dependencies
Deep-Dive FAQs
What is Rocky – Rust SQL engine with branches, replay, column lineage?
Our AI analysis summarizes Rocky – Rust SQL engine with branches, replay, column lineage as a control plane for data warehouses (Databricks, Snowflake, BigQuery, DuckDB) that owns the DAG, providing Git-grade workflow (branches, replay), column-level lineage, first-class governance, cost attribution, compile-time portability, and schema-grounded AI. It is explicitly not a warehouse, a Fivetran replacement, or dbt Cloud. It addresses enterprise data governance, lineage, and cost management challenges within modern data warehouse ecosystems.
Where did Rocky – Rust SQL engine with branches, replay, column lineage originate?
Data for Rocky – Rust SQL engine with branches, replay, column lineage was aggregated directly from the Hacker News community ecosystem, representing raw developer and early-adopter sentiment.
When was Rocky – Rust SQL engine with branches, replay, column lineage publicly launched?
The initial public indexing or launch date for Rocky – Rust SQL engine with branches, replay, column lineage within our tracked developer communities was recorded on April 29, 2026.
How popular is Rocky – Rust SQL engine with branches, replay, column lineage?
Rocky – Rust SQL engine with branches, replay, column lineage has logged a traction score of over 97 and 31 recorded discussions or engagements.
Which technical categories define Rocky – Rust SQL engine with branches, replay, column lineage?
Based on metadata extraction, Rocky – Rust SQL engine with branches, replay, column lineage is categorized under topics such as: Rust SQL engine, branches, replay, column lineage.
Are there open-source alternatives related to Rocky – Rust SQL engine with branches, replay, column lineage?
Yes, the GitHub ecosystem contains correlated projects. For example, a repository named zerobootdev/zeroboot shares highly similar architectural descriptions and topics.
How does the creator describe Rocky – Rust SQL engine with branches, replay, column lineage?
The original author or development team describes the product as follows: "Hi HN, I'm Hugo. I've been building Rocky over the past month, shipping fast in the open. The binary is on GitHub Releases, `dagster-rocky` on PyPI, and the VS Code extension on the Marketplace. I ..."
Community Voice & Feedback