Gemini Executive Synthesis

Rocky, a Rust-based control plane for warehouse pipelines.

Technical Positioning
A control plane for data warehouses (Databricks, Snowflake, BigQuery, DuckDB) that owns the DAG. It provides a Git-grade workflow (branches, replay), column-level lineage, first-class governance, cost attribution, compile-time portability, and schema-grounded AI. It is explicitly not a warehouse, a Fivetran replacement, or dbt Cloud.
SaaS Insight & Market Implications
Rocky addresses enterprise data governance, lineage, and cost management challenges within modern data warehouse ecosystems. By positioning itself as a 'control plane' that 'owns the DAG,' it targets capabilities the author says existing stacks cannot provide because they do not own the dependency graph. Features like 'Git-grade workflow' with branches and replay, 'column-level lineage from the compiler,' and 'governance as a first-class surface' are aimed at regulated industries and large data operations. The 'cost attribution' and 'schema-grounded AI' features target operational efficiency and reliability. Rocky's modular approach, integrating with major warehouses while avoiding feature overlap with Fivetran or dbt Cloud, positions it as a specialized layer for compliant, cost-aware data pipeline management.
Proprietary Technical Taxonomy
Rust SQL engine, branches, replay, column lineage, control plane, warehouse pipelines, DAG, dependencies

Raw Developer Origin & Technical Request

Hacker News • Apr 29, 2026
Show HN: Rocky – Rust SQL engine with branches, replay, column lineage

Hi HN, I'm Hugo. I've been building Rocky over the past month, shipping fast in the open. The binary is on GitHub Releases, `dagster-rocky` on PyPI, and the VS Code extension on the Marketplace. I held off on a broader announcement until the trust-system surface was coherent enough to talk about as one thing. The governance waveplan (column classification, per-env masking, 8-field audit trail on every run, `rocky compliance` rollup, role-graph reconciliation, retention policies) landed end-to-end last week in engine-v1.16.0 and rounded out in v1.17.4 (tagged 2026-04-26). That's the milestone I'd been waiting for.

The pitch: keep Databricks or Snowflake. Bring Rocky for the DAG. Rocky is a Rust-based control plane for warehouse pipelines. Storage and compute stay with your warehouse. Rocky owns the graph: dependencies, compile-time types, drift, incremental logic, cost, lineage, governance. The things your current stack can't give you because it doesn't own the DAG.

A few things I think are interesting:

- Branches + replay. `rocky branch create stg` gives you a logical copy of a pipeline's tables (schema-prefix today; native Delta SHALLOW CLONE and Snowflake zero-copy are next). `rocky replay ` reconstructs which SQL ran against which inputs. Git-grade workflow on a warehouse.
- Column-level lineage from the compiler, not a post-hoc graph crawl. The type checker traces columns through joins, CTEs, and windows. VS Code surfaces it inline via LSP.
- Governance as a first-class surface. Column classification tags plus per-env masking policies, applied to the warehouse via Unity Catalog (Databricks) or masking policies (Snowflake). 8-field audit trail on every run. `rocky compliance` rollup that CI can gate on. Role-graph reconciliation via SCIM + per-catalog GRANT. Retention policies with a warehouse-side drift probe.
- Cost attribution. Every run produces per-model cost (bytes, duration). `[budget]` blocks in `rocky.toml`; breaches fire a `budget_breach` hook event.
- Compile-time portability + blast radius. Dialect-divergence lint across Databricks / Snowflake / BigQuery / DuckDB (12 constructs). `SELECT *` downstream-impact lint.
- Schema-grounded AI. Generated SQL goes through the compiler; AI suggestions type-check before they can land.

What Rocky isn't:

- Not a warehouse: it's the control plane on top.
- Not a Fivetran replacement. `rocky load` handles files (CSV/Parquet/JSONL); for SaaS sources use Fivetran, Airbyte, or warehouse-native CDC.
- Not dbt Cloud: no hosted UI, no managed scheduler. First-class Dagster integration if you need orchestration.

Adapters: Databricks (GA), Snowflake (Beta), BigQuery (Beta), DuckDB (local dev / playground). Apache 2.0.

I'd love feedback on the trust-system framing, the governance surface (particularly classification-to-masking resolution in `rocky compile` and the `rocky compliance` CI gate), the branches/replay design, the cost-attribution primitives, or anything else that catches your eye. Happy to go deep in the thread.
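
As a rough illustration of the cost-attribution idea in the post (per-model bytes and duration checked against a budget, with a `budget_breach` event on violation), here is a minimal Python sketch. The class names, field names, and event shape are assumptions made for illustration only; the post does not show the actual `[budget]` keys in `rocky.toml` or the hook payload, and this is not Rocky's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCost:
    """Per-model cost for one run (hypothetical shape, not Rocky's schema)."""
    model: str
    bytes_scanned: int
    duration_s: float


@dataclass(frozen=True)
class Budget:
    """Hypothetical limits; these are not the real `[budget]` keys."""
    max_bytes: int
    max_duration_s: float


def find_budget_breaches(costs, budget):
    """Return one illustrative `budget_breach` event per exceeded limit."""
    events = []
    for cost in costs:
        checks = [
            ("bytes", cost.bytes_scanned, budget.max_bytes),
            ("duration_s", cost.duration_s, budget.max_duration_s),
        ]
        for metric, observed, limit in checks:
            if observed > limit:
                events.append({
                    "event": "budget_breach",
                    "model": cost.model,
                    "metric": metric,
                    "observed": observed,
                    "limit": limit,
                })
    return events


if __name__ == "__main__":
    run_costs = [
        ModelCost("orders", 5_000, 1.0),
        ModelCost("customers", 100, 12.0),
    ]
    for event in find_budget_breaches(run_costs, Budget(1_000, 10.0)):
        print(event)
```

In this sketch a CI job or hook handler could fail the run whenever the returned list is non-empty; how Rocky itself dispatches hook events is not described in the post.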

Developer Debate & Comments

data_ders • Apr 29, 2026
hiya, anders from dbt here. cool project -- I especially love the branching and budgeting options you've built in. both are things that I'd love for the dbt standard to include one day. was it dbt's lack of those features that inspired you to start this project? It also seems you have an aversion to Jinja, which, believe me, I get!
FYI dbt-fusion [1] is going GA next week (though GA for Databricks will come later). Most of it is source-available and ELv2-licensed, but there's a number of crates that are Apache 2.0, namely: dbt-xdbc, dbt-adapter, dbt-auth, dbt-jinja, dbt-agate. We also have plans to OSS more as time goes on (stay tuned).
I just wanted to call out the OSS crates in case you'd rather focus on "making your beer taste better" than have to re-build foundations. I'd love to hear if any of those crates come in handy for you (even more so if they don't work for you).
Feel free to reach out on LinkedIn or dbt community Slack if you ever want to chat more!
[1]: https://github.com/dbt-labs/dbt-fusion
Xiaoher-C • Apr 29, 2026
The compile-time lineage part is the most interesting bit to me. A lot of “data lineage” tools feel like archaeology after the fact: parse logs, reconstruct what probably happened, then hope it matches reality.
Having the compiler know “this column flows into these downstream models” before execution changes the workflow quite a bit. It makes refactors and masking policies much less scary.
Do you expose any kind of “lineage diff” between branches? For example: this PR changes the downstream impact of `customer.email` from A/B/C to A/B/D. That would be useful in code review.
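
The "lineage diff" asked about in this comment can be sketched in a few lines of Python. This is a hypothetical illustration, not a Rocky feature or API: it assumes each branch's lineage is already available as a mapping from a column name to the list of downstream models it flows into, and the function name `downstream_diff` is made up for the example.

```python
def downstream_diff(base, branch):
    """Compare two column -> downstream-models maps (assumed input shape).

    Returns {column: (removed, added)} for every column whose set of
    downstream models differs between the base and the branch.
    """
    changes = {}
    for column in sorted(set(base) | set(branch)):
        before = set(base.get(column, ()))
        after = set(branch.get(column, ()))
        if before != after:
            changes[column] = (sorted(before - after), sorted(after - before))
    return changes


if __name__ == "__main__":
    base = {"customer.email": ["A", "B", "C"]}
    branch = {"customer.email": ["A", "B", "D"]}
    print(downstream_diff(base, branch))
    # {'customer.email': (['C'], ['D'])}
```

A review tool could render the result as "no longer reaches C, now reaches D", which matches the A/B/C to A/B/D example in the comment.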
PeterWhittaker • Apr 29, 2026
Congrats on the work, but have you considered another name? Naming is hard and always will be: When I first scanned the headline, my initial thought was "that's an interesting area for the Rocky Linux team to explore". After a moment, "wait, no, that's confusing, it's some other Rocky".
ramon156 • Apr 29, 2026
If your introduction message already includes a bunch of uncurated claims and LLM smells, then what does that say about the code I'm about to run?
mollerhoj • Apr 29, 2026
It's a bit confusing to claim that "The things your current stack can't give you because it doesn't own the DAG" and use Databricks as your example: Databricks includes jobs and pipelines, so it very much owns the DAG, no?
hasyimibhar • Apr 29, 2026
Looks cool, I've been waiting for someone to build this since the dbt and SQLMesh acquisition. It would be great to have model versioning and support for ClickHouse SQL.
mergisi • Apr 29, 2026
* * *

Frequently Asked Questions

Market intelligence mapped to Rocky, a Rust-based control plane for warehouse pipelines.

What problem does Rocky solve?
Based on our AI analysis of the original developer request, its primary technical positioning is: a control plane for data warehouses (Databricks, Snowflake, BigQuery, DuckDB) that owns the DAG. It provides a Git-grade workflow (branches, replay), column-level lineage, first-class governance, cost attribution, compile-time portability, and schema-grounded AI. It is explicitly not a warehouse, a Fivetran replacement, or dbt Cloud.
How is the developer community reacting to Rocky?
We have tracked 31 direct responses and active debates on this topic originating from Hacker News.
What are the foundational technologies related to Rocky?
Our proprietary extraction maps Rocky to adjacent architectural concepts including Rust SQL engine, branches, replay, and column lineage.
Are developers creating tools related to Rocky?
Yes, open-source adoption is correlated. An active project titled 'Infatoshi/OpenSquirrel' explores similar frameworks: For people who get distracted by agents. A native Rust/GPUI control plane for running Claude Code, Codex, Cursor, and OpenCode side by side — becau...

Engagement Signals

97 Upvotes
31 Comments

Cross-Market Term Frequency

Quantifies the cross-market adoption of foundational terms like cost and dependencies by tracking occurrence frequency across active SaaS architectures and enterprise developer debates.