Most data stacks aren't broken. They're just unfocused.
range: "2026-04-28..2026-05-04"
salesforce.opportunity (avg 4.2hr, P95 9.1hr).
Cause is full-table extract on a 31M-row table.
Switching to CDC: lag drops to <8min, cost down 41%.
We rebuild the unsexy plumbing (pipelines, warehouses, dashboards, cost models) that drains your engineering hours and quietly burns your runway. Audit-first, production-grade, no offshore handoffs.
Your data stack looks fine.
It's not.
Most Series A–C SaaS teams come to us with the same shape of problem. The modeling layer is "fine": the models/ folder is messy, but it ships. The warehouse is "fine": Snowflake or BigQuery, set up by somebody two years ago. The dashboards are "fine": Looker or Metabase, half-trusted. The CFO trusts the numbers; the VP of Engineering doesn't.
Then one of three things happens:
Snowflake bill triples in a quarter
Nobody knows why. Engineering doesn't have time to investigate. CFO starts asking why.
A board metric breaks
ARR rolled up wrong because a join changed silently. Six weeks of trust evaporates in a single Slack thread.
A new feature needs data
Personalization, billing, an AI feature. The data exists; reaching it cleanly takes 4 weeks of plumbing nobody wants to write.
"The hardest part of data engineering is not the engineering. It's deciding what's actually broken before you start fixing things."
From the D1 audit charter
Five productized engagements.
Pick one, or stack them.
Every engagement is fixed-price, fixed-scope, written in plain English before any code is written. About 70% of audits convert to a build engagement; the other 30% end with us recommending you don't build, and we mean it.
Two weeks in, you'll know exactly what's wrong and what it costs to fix.
— Who it's for
Heads of Data, VPs of Engineering, and CTOs who suspect the data stack is leaking money or trust — but can't get the engineering hours to confirm it. Most common trigger: Snowflake/BigQuery bill spiked, or a board metric broke.
— What we look at
- Warehouse architecture — Snowflake, BigQuery, Redshift, or Databricks
- Modeling layer — dbt, SQLMesh, raw SQL — whatever you've got
- Ingestion — Fivetran, Airbyte, custom, hybrid
- Orchestration — Airflow, Dagster, Prefect, cron
- BI layer — Looker, Metabase, Tableau, Mode, Hex
- Data observability — or lack of
- Cost breakdown — warehouse, ingestion, BI seats
- Team patterns — who fixes things at 2am, who knows where bodies are buried
— Deliverable: written, ~25 pages
- Executive summary — what we found, what to do, in 2 pages
- Architecture map — current state, with annotations
- Top 10 issues, ranked by impact and effort
- Cost model with projected savings per change
- Recommended scope for build (or recommendation against)
- Hiring or retainer recommendations if relevant
— Sample outcomes from past audits
- Identified $11K/mo of unused Snowflake compute → became a D3 engagement
- Found a silent join bug in revenue dashboard → 2-day fix, no D2 needed
- Recommended 1 in-house hire instead of an engagement → declined politely
- Surfaced 4 missing data contracts that broke pipelines weekly → became part of a D2 + D5
— Pricing notes
$4.5K for stacks under 50 dbt models, single warehouse, no AI/ML pipeline. $15K for multi-warehouse, mature dbt repo, custom ingestion, BI implementations spanning 4+ teams. Most audits land at $7.5K–$10K.
Production-grade data pipelines you didn't have to build yourself.
— Who it's for
Teams that need a new pipeline shipped to production — net-new source, net-new warehouse, net-new schema, or all three — and don't have a data engineer to spare for 3 months. Common shapes: replicating Salesforce + Stripe + product DB into a warehouse from scratch; rebuilding a brittle reverse-ETL flow; consolidating four legacy pipelines into one.
— What you get
- Source-to-warehouse ingestion (CDC where applicable, full-table where it makes sense)
- Modeling layer in dbt (or SQLMesh) with a clean folder structure
- Tests, freshness checks, alerts wired to Slack/PagerDuty
- Orchestration in Airflow / Dagster / Prefect — your choice; we have opinions but won't fight
- Documentation: architecture diagram, runbook, on-call playbook
- Two-hour walkthrough with your team
- 30-day post-launch warranty — we fix anything that breaks for free
— Stack we ship into
- Warehouses: Snowflake, BigQuery, Redshift, Databricks SQL
- Modeling: dbt Core, dbt Cloud, SQLMesh
- Ingestion: Fivetran, Airbyte, Stitch, custom Python (when paid connectors are absurd)
- Orchestration: Airflow (managed or MWAA), Dagster, Prefect
- Reverse-ETL: Hightouch, Census, custom
- Idempotent — re-running yesterday doesn't double-count
- Tested — every model has at least 3 tests (uniqueness, not-null, referential)
- Observable — freshness, row count anomalies, schema drift all alert
- Documented — a new engineer can run the system from the runbook
- Reproducible — dbt build from a clean clone produces the same output
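To make the idempotency property above concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for a warehouse. The table name daily_revenue, the function load_day, and the sample rows are all hypothetical. The pattern shown (delete the day's partition, then insert it again, inside one transaction) is one common way to make a load safe to re-run, not necessarily the approach used on any particular engagement.

```python
import sqlite3


def load_day(conn: sqlite3.Connection, day: str, rows: list[tuple[str, float]]) -> None:
    """Replace one day's partition atomically so re-runs never double-count."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM daily_revenue WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_revenue (day, account, amount) VALUES (?, ?, ?)",
            [(day, account, amount) for account, amount in rows],
        )


# Hypothetical table and data, held in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (day TEXT, account TEXT, amount REAL)")

rows = [("acme", 120.0), ("globex", 80.0)]
load_day(conn, "2026-05-03", rows)
load_day(conn, "2026-05-03", rows)  # re-running "yesterday"

total = conn.execute("SELECT SUM(amount) FROM daily_revenue").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM daily_revenue").fetchone()[0]
print(total, row_count)
```

A plain append-only INSERT would leave four rows and a doubled total after the second call; scoping the delete to the partition being loaded is what keeps the re-run harmless.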
— Sample timelines & pricing
- 8 weeks · $15K–$20K — single warehouse, 2–3 sources, 30–50 dbt models, 1 BI connection
- 12 weeks · $24K–$32K — multi-source (5–8), staging + intermediate + marts layers, observability, 2 BI connections
- 16 weeks · $34K–$40K — multi-warehouse migration, 100+ models, full data contracts, reverse-ETL
Most builds land at $22K–$30K.
Snowflake bill out of control?
We typically cut it 30–50%.
— Who it's for
Finance is asking why warehouse spend is up 80% YoY. Engineering doesn't have time to investigate. The answer is almost never "we need more compute" — it's almost always 4–8 specific things that are quietly burning money.
— What we look at
- Warehouse sizing — most teams are 1–2 sizes too big on every warehouse. We measure utilization and right-size.
- Query patterns — dashboards refreshing every 5 minutes that nobody opens. Same query family running 1000×/day with no cache. Joins that should be aggregations.
- Storage — abandoned databases, cloned databases nobody owns, time-travel windows nobody needs.
- Ingestion frequency — Fivetran syncs running every hour for tables nobody queries.
- Materialization strategy — incremental models running as full refresh because someone toggled it during debugging two months ago.
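The "query patterns" item above usually comes down to a ranking exercise: find the few query families that account for most of the compute. A rough sketch of that ranking follows. The family names and credit figures are invented for illustration, and top_cost_drivers is a hypothetical helper, not part of any vendor tool.

```python
def top_cost_drivers(
    credits_by_family: dict[str, float], target_share: float
) -> list[tuple[str, float]]:
    """Return the largest query families whose combined credits reach
    target_share of total credits (a simple Pareto cut)."""
    total = sum(credits_by_family.values())
    ranked = sorted(credits_by_family.items(), key=lambda kv: kv[1], reverse=True)
    picked: list[tuple[str, float]] = []
    running = 0.0
    for family, credits in ranked:
        if running / total >= target_share:
            break  # already covered the target share
        picked.append((family, credits))
        running += credits
    return picked


# Hypothetical monthly credits per query family.
usage = {
    "exec_dashboard_refresh": 420.0,
    "orders_full_refresh": 300.0,
    "adhoc_analyst_queries": 120.0,
    "finance_rollup": 90.0,
    "misc": 70.0,
}
drivers = top_cost_drivers(usage, 0.70)
print(drivers)
```

With these made-up numbers, two of the five families cover 72% of the credits, which is the kind of concentration that makes a short, ranked fix list practical.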
— Deliverable
- Cost-by-warehouse / cost-by-query / cost-by-team analysis
- Top 10 cost drivers ranked by savings vs. effort
- Implementation — we ship the changes, not just the recommendations
- Before/after cost comparison after 4 weeks of run-time
- Monitoring setup so it doesn't drift back
— What we don't promise
— Pricing notes
$12K for a focused single-warehouse engagement. $30K for multi-warehouse + ingestion + storage at scale. Most engagements land at $18K–$24K.
Dashboards your CFO and VP of Eng both trust.
— Who it's for
Teams where data exists, dashboards exist, and yet every leadership meeting starts with "is this number right?" The fix is rarely a new BI tool — it's a semantic layer, data contracts, and the discipline to make every metric defined exactly once.
— What we deliver
- Semantic layer — dbt Semantic Layer, Cube, LookML, or metric tree. Every metric defined once, used everywhere.
- Metric catalog with owner, definition, formula, refresh cadence
- 12–25 production dashboards (depending on engagement size) covering exec, finance, sales, customer success, product
- Alerting on anomalies — spend spikes, signup drops, etc.
- Self-serve query environment for analyst-level users (Hex / Mode / Looker Explores)
- Training: two 90-min sessions with your team
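One way to picture "every metric defined once" is a catalog that refuses a second definition of the same metric. The sketch below is plain Python, not the API of dbt Semantic Layer, Cube, or LookML. The Metric and MetricCatalog classes, the field values, and the example formulas are assumptions made for illustration; the fields mirror the catalog columns listed above (owner, definition, formula, refresh cadence).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    owner: str
    definition: str
    formula: str
    refresh_cadence: str


class MetricCatalog:
    """Holds exactly one definition per metric name."""

    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def get(self, name: str) -> Metric:
        return self._metrics[name]


catalog = MetricCatalog()
catalog.register(
    Metric("arr", "finance", "Annual recurring revenue", "SUM(mrr) * 12", "daily")
)

# A second team trying to define "arr" differently is rejected.
try:
    catalog.register(Metric("arr", "sales", "Bookings-based ARR", "SUM(bookings)", "hourly"))
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected, catalog.get("arr").owner)
```

Real semantic layers enforce this through their own configuration formats; the point of the sketch is only that a duplicate definition fails loudly instead of producing two dashboards that disagree.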
— Stack we ship into
- Semantic layer: dbt Semantic Layer (preferred), Cube, LookML, MetricFlow
- BI tools: Looker, Metabase (excellent if budget-constrained), Mode, Hex, Tableau, Sigma
- Alerting: native BI alerts → Slack; for advanced cases, Anomalo or custom
We don't have a financial relationship with any BI vendor. Our default recommendation by team shape:
- <50 employees · modest budget
- Metabase Pro. Underrated, ships in 2 weeks.
- Sales / CS-heavy · want self-serve
- Looker. Worth the cost for governance.
- Data-team-led · lots of analysts
- Hex or Mode + dbt Semantic Layer.
- CFO-heavy · board-facing
- Sigma. The Excel-shaped audience loves it.
— Sample timelines & pricing
- 10 weeks · $20K–$28K — 1 BI tool, semantic layer on existing dbt, 12 dashboards
- 14 weeks · $36K–$50K — semantic layer rebuild, 2 BI tools (analyst + exec), 25 dashboards, training program
Most engagements land at $28K–$36K.
We run it — or your team does.
Either is fine.
— Who it's for
Teams who've shipped a system with us (D2 / D3 / D4) and want ongoing operation without hiring a full-time data engineer. Or teams with an existing data engineer who needs senior backup, design review, on-call coverage, or a sounding board.
— What's included (scaled by tier)
- $5K/mo · Operate-only — monitoring, alerting on incidents, monthly stack-health report, async response within 1 business day
- $8K/mo · Operate + small builds — above + 1 day/week of new build work (small features, new sources, dashboard additions)
- $12K/mo · Operate + lead engineer — above + senior engineer in your Slack daily, design review, on-call coverage during business hours, 2 days/week build capacity
- $15K/mo · Embedded team — above + dedicated 0.5 FTE coverage, architecture council, quarterly review with a partner
— What's NOT included
- 24/7 on-call — we'll refer you to specialist on-call firms if you need this
- Major migrations — those are D2 engagements, separately scoped
- Pure body-shop hours — we don't do staff augmentation
— Pricing notes
6-month minimum on $5K/$8K tiers. 3-month minimum on $12K/$15K. 30-day notice to cancel after the minimum. No multi-year contracts.
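For budgeting, the minimum terms above translate into a minimum committed spend per tier. A quick worked calculation, using the tier prices and minimum terms stated in this section; it assumes the 30-day notice period is not added on top of the minimum, which the text does not spell out.

```python
# (monthly price in USD, minimum term in months), as stated in this section.
TIERS = {
    "Operate-only": (5_000, 6),
    "Operate + small builds": (8_000, 6),
    "Operate + lead engineer": (12_000, 3),
    "Embedded team": (15_000, 3),
}

# Assumption: minimum commitment = monthly price x minimum term, nothing else.
minimum_commitment = {
    name: monthly * months for name, (monthly, months) in TIERS.items()
}

for name, amount in minimum_commitment.items():
    print(f"{name}: ${amount:,}")
```

Under that assumption the lead-engineer tier carries a lower minimum commitment than the small-builds tier, because its minimum term is half as long.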
"We're stack-agnostic but opinionated. We don't take referral fees from any vendor. The recommendation is based on what's right for your workload."
Stack philosophy · Section 10
Tools we ship into.
And the ones we don't.
We're stack-agnostic but opinionated. We don't take referral fees from any vendor. Our defaults below — but if your team has standardized on something, we adopt it.
- Snowflake — default for B2B SaaS. Best dev experience, costliest if not tuned.
- BigQuery — preferred when GCP is the cloud of record.
- Databricks — when you have ML / Spark workloads to colocate.
- Redshift — supported, increasingly rare in new builds.
- Postgres / Aurora — fine for <500GB analytics workloads.
- dbt Core — default. Open source, opinionated, ships fast.
- dbt Cloud — for teams who want managed orchestration in one tool.
- SQLMesh — when virtual environments and column-level lineage matter.
- Fivetran — default for SaaS sources.
- Airbyte — when self-host is mandated or Fivetran is too expensive.
- Custom Python (Dagster assets / dbt Python) — for niche sources, real-time CDC.
- Hightouch / Census — for reverse-ETL.
- Airflow — default. Boring, reliable, hireable.
- Dagster — when asset-based reasoning matters.
- Prefect — supported but not our default.
- Looker — sales / CS / governance-heavy teams.
- Metabase — modest budget, fast iteration, surprisingly capable.
- Hex / Mode — analyst-heavy teams.
- Tableau — supported, increasingly rare in new builds.
- Sigma — CFO-heavy / Excel-shaped audiences.
- dbt tests — table stakes.
- Elementary — open-source dbt observability. Default.
- Monte Carlo / Anomalo — when you've outgrown Elementary.
- Datadog / Grafana — for warehouse and pipeline metrics.
- Real-time streaming (Kafka, Flink, Spark Streaming) — we'll refer you to specialists.
- Master data management platforms (Reltio, Informatica MDM) — out of scope.
- Hadoop / Hive — we'll politely decline.
What a D1 audit
actually looks like.
Every D1 audit produces a written, 20–30 page document. Below is the structure of a real audit (anonymized) we delivered to a Series B B2B SaaS team in early 2025. The full document was 27 pages.
DATA STACK AUDIT
[REDACTED] Inc.
- 01 · Executive Summary · p.02
- 02 · Audit Scope & Methodology · p.04
- 03 · Current State: Architecture Map · p.06
- 04 · Current State: Cost Breakdown · p.10
- 05 · Findings: Top 10 Issues · p.13
- 06 · Findings: Detail (per issue) · p.16
- 07 · Recommended Scope · p.22
- 08 · Implementation Phasing · p.24
- 09 · Hiring & Retainer Recommendations · p.26
- 10 · Appendix · Query-Level Cost Analysis · p.27
[REDACTED]'s data stack is functional but expensive and increasingly fragile. Three findings dominate:
Snowflake spend is 47% above expected baseline — driven by 3 oversized warehouses and 11 query families that account for 72% of total compute.
The dbt project (217 models) has accumulated 2.5 years of organic growth without refactor. Lineage is opaque; ~30% of models are unused in production but still scheduled.
Reverse-ETL to Salesforce is hand-rolled Python with no monitoring. Three silent failures in the last 60 days, all caught by sales reps complaining.
Recommended scope: D3 (Cost Optimization, $24K) + D2 (dbt refactor + reverse-ETL replacement, $32K). Total: $56K, 14 weeks, projected first-year ROI 3.4×.
→ Want a redacted full sample? Email hello@gigafloptechlab.com with subject "D1 sample" and we'll send a real audit.
38% off the Snowflake bill
in 6 weeks.
— The setup
A Series B B2B SaaS company (~$15M ARR, 80 employees, GCP-hybrid stack) noticed Snowflake spend had doubled in three quarters with no corresponding doubling of users or data volume. The CFO had stopped approving new BI tool requests. The VP of Engineering was being asked to "look into it" — but couldn't spare 3 engineering weeks. They engaged us for a D1 audit.
— What we found in week 2 (the audit)
- 3 warehouses sized XL when M was sufficient. Spun up during a Black Friday surge in 2023, never resized down. ~$3,200/mo waste.
- A "real-time" dashboard refreshing every 5 minutes — viewed an average of 4 times per week. ~$1,800/mo.
- A dbt model running incrementally — except the incremental key was wrong, so it was full-refreshing 12M rows every hour. ~$2,400/mo.
- Snowpipe streaming for a table queried once per day. ~$700/mo.
- Three abandoned databases from a 2024 reorg still incurring storage cost. ~$400/mo.
- 9 more findings at the $50–$300/mo range, totaling another ~$1,800/mo.
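Adding up the findings above gives the total monthly waste the audit identified. The figures below are the approximate amounts quoted in the list; the labels are shortened for readability, and the annualized number simply assumes the monthly rate holds for twelve months.

```python
# Approximate monthly waste per finding (USD), as quoted in the list above.
findings = {
    "oversized warehouses (XL where M was sufficient)": 3_200,
    "dashboard refreshing every 5 minutes": 1_800,
    "dbt model full-refreshing every hour": 2_400,
    "Snowpipe streaming for a table read daily": 700,
    "abandoned databases": 400,
    "nine smaller findings combined": 1_800,
}

monthly_total = sum(findings.values())
annual_total = monthly_total * 12  # assumes the monthly rate holds all year

# Share of identified waste from the three largest findings.
top_three = sum(sorted(findings.values(), reverse=True)[:3])
top_three_share = top_three / monthly_total

print(f"monthly: ${monthly_total:,}  annualized: ${annual_total:,}")
print(f"top three findings: {top_three_share:.0%} of identified waste")
```

That comes to roughly $10,300 per month, with the three largest findings making up about 72% of it.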
— What we did in weeks 3–8 (the build)
- Right-sized warehouses with auto-suspend tuned to actual usage patterns
- Replaced the "real-time" dashboard with hourly refresh + alerting on actual change
- Fixed the incremental key on the dbt model — 1 line of YAML; saved 2 hours of compute per day
- Rewrote the Snowpipe stream as a daily batch
- Retired the abandoned databases (with 30-day archive)
- Set up cost monitoring in Snowflake's cost insights + a weekly Slack summary
- Wrote a "before you spin up a warehouse" playbook for their data team
— The handoff
Two-hour walkthrough with their data engineer. Runbook for cost monitoring. We checked in at 30 days (savings holding) and at 90 days (savings holding, no regressions). Engagement closed.
"The audit paid for itself in week 6. The build paid for itself in month 4. We're still on the savings rate 9 months later."
VP of Engineering · [Redacted Series B SaaS]
When Gigaflop is right —
and when it isn't.
Here's the honest comparison. We've turned down engagements when one of the other paths was clearly better. If you're not sure which column you're in, the discovery call is free.
| Approach | Cost | Time-to-value | Lock-in | Best fit |
|---|---|---|---|---|
| In-house data team | $250K–$500K / yr (loaded) | 6–12 months to first ship | None | Recurring scale, $10M+ analytics budget |
| Freelance contractor | $80–$200 / hr | 4–8 weeks | High — single-person dependency | One-off projects, no production criticality |
| Big 4 / SI firm | $300K–$1M+ | 6–9 months | High — vendor methodology | Massive enterprise, regulated industries |
| — Boutique (us) | $5K–$50K per engagement | 2–4 months | None — you own everything | Series A–C SaaS, $5M–$50M ARR |
— Honest read
- If you've got the budget and the timeline to hire 2 senior data engineers and wait 9 months for them to ship — do that. It's the right long-term play.
- If you need a small, well-defined thing done by next quarter — freelance. Lots of excellent people on Toptal, A.Team, Continuum.
- If you're a regulated enterprise needing 200+ consultants on a 2-year program — Big 4. We'd refer you to Slalom, Credera, or West Monroe.
- If you're a Series A–C SaaS team with a real problem and a 60–90 day window — that's us.
Terms we'll use in the audit.
Buyers usually have a head of data who knows these. Some buyers (CTOs, CFOs, founders) don't. This is for them. Skip if it's beneath you.
- CDC · change data capture
- Replicating only what changed in a source database, instead of re-pulling the whole table. Cheaper, faster, harder to set up correctly.
- data contract
- A schema-level agreement between a service that emits data and the team that consumes it. Prevents silent breakage.
- data lake / lakehouse
- Storage of raw, often unstructured data, with compute layered on top. Databricks is the canonical lakehouse.
- data mart
- A focused subset of a warehouse designed for one team or use case. (Sales mart, Finance mart, etc.)
- dbt
- The de facto standard for SQL-based data transformation. Lets analytics engineers write modular, tested SQL.
- ELT vs ETL
- Order of operations. ETL transforms before loading; ELT loads raw, transforms in the warehouse. ELT has won for analytics.
- incremental model
- A dbt model that only processes new rows since last run. Cheaper than full refresh, easy to break.
- lineage
- The map of "this column came from that column came from this raw source." Critical for trust and debugging.
- OLAP / OLTP
- OLAP = analytical (warehouses, dashboards). OLTP = transactional (your app database). Different shapes, different tools.
- reverse-etl
- Pushing warehouse data back into operational tools (Salesforce, HubSpot, Marketo). Hightouch and Census are the leaders.
- semantic layer
- A central definition of what each business metric means, so two dashboards can't disagree on "revenue".
- Snowflake credit
- The unit of compute billing in Snowflake. ~$2–$4 per credit depending on contract.
- star schema
- The classic warehouse modeling pattern: fact tables in the center, dimension tables around them.
- warehouse
- The compute + storage system that holds your analytics data. Snowflake, BigQuery, Redshift, Databricks.
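For readers who want to see the "incremental model" entry above in action, here is a small, tool-independent sketch of the idea in Python. It is not dbt code: the incremental_run function, the updated_at watermark column, and the sample rows are assumptions chosen for illustration. Each run processes only rows changed since the last run and upserts them by primary key.

```python
def incremental_run(source_rows: list[dict], target: dict, watermark: int) -> tuple[int, int]:
    """Process only rows newer than the watermark.

    Returns (new_watermark, rows_processed).
    """
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    for row in new_rows:
        target[row["id"]] = row  # upsert by primary key
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_watermark, len(new_rows)


# Hypothetical source table; updated_at is a simple integer clock.
source = [
    {"id": 1, "updated_at": 10, "amount": 50},
    {"id": 2, "updated_at": 20, "amount": 75},
]
target: dict = {}

wm, first_count = incremental_run(source, target, watermark=0)

# Later: one new row arrives and row 1 is updated.
source.append({"id": 3, "updated_at": 30, "amount": 20})
source[0] = {"id": 1, "updated_at": 35, "amount": 55}

wm, second_count = incremental_run(source, target, watermark=wm)
print(first_count, second_count, wm, len(target))
```

The "easy to break" part shows up if the filter column is wrong: filtering on a column that never advances makes every run reprocess the whole table, which costs the same as a full refresh.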
Things data buyers usually ask.
Q.01 We already use Fivetran and dbt. Do you replace them or work with them?
Q.02 How quickly will I see the savings on a D3 cost optimization?
Q.03 Will you sign an NDA, MSA, and DPA?
Q.04 Who actually does the data engineering work?
Q.05 We've already had a Big 4 firm audit our stack. Do we need yours?
Q.06 Do you do data science / ML / AI?
Q.07 Can you help us hire instead of engage you?
Q.08 What happens if the engagement runs over?
Q.09 Do you offer a satisfaction guarantee?
Q.10 Do you take equity?
Most engagements start
with the audit.
Two weeks. Written deliverable. Either becomes the scope for a build, or it's the only thing we do — your call. About 70% of audits convert to a build engagement.
Book a 30-min discovery call → hello@gigafloptechlab.com