01 / 16 — DATA — PRACTICE PAGE
Data Engineering — Est. 2012
Rev. 2026.05 · v3.0

Most data stacks aren't broken. They're just unfocused.

gigaflop · data-warehouse-cli
USER
Show me last week's pipeline lag by source.
tool_call data_observability.lag_metrics
   range: "2026-04-28..2026-05-04"
DATA AGENT
4 of 23 sources >2hr lag. The biggest offender:
salesforce.opportunity (avg 4.2hr, P95 9.1hr).
Cause is full-table extract on a 31M-row table.
Switching to CDC: lag drops to <8min, cost down 41%.
Production observability · D5 retainer · live

We rebuild the unsexy plumbing (pipelines, warehouses, dashboards, cost models) that drains your engineering hours and quietly burns your runway. Audit-first, production-grade, no offshore handoffs.

Founded 2012 · 14 yrs depth
200+ systems shipped
Series A → IPO clients
US · UK · EU · APAC
001 / Engineering depth
14 yrs
Data engineering through DiscoverWebTech
002 / Active retention
90%
Year-over-year on data retainer accounts
003 / Pipelines in prod
200+
Across Snowflake, BigQuery, Redshift, Databricks
004 / Avg cost reduction
−38%
Typical first-quarter savings on optimization
Section 03
— Problem · what's usually broken

Your data stack looks fine.
It's not.

3 triggers ↘

Most Series A–C SaaS teams come to us with the same shape of problem. The modeling layer is "fine": the models/ folder is messy but it ships. The warehouse is "fine": Snowflake or BigQuery, set up by somebody two years ago. The dashboards are "fine": Looker or Metabase, half-trusted. The CFO trusts the numbers; the VP of Engineering doesn't.

Then one of three things happens:

— The three triggers we see most often
— 01

Snowflake bill triples in a quarter

Nobody knows why. Engineering doesn't have time to investigate. CFO starts asking why.

— 02

A board metric breaks

ARR rolled up wrong because a join changed silently. Six weeks of trust evaporates in a single Slack thread.

— 03

A new feature needs data

Personalization, billing, an AI feature. The data exists; reaching it cleanly takes 4 weeks of plumbing nobody wants to write.

We exist to fix these. Not as a strategy deck — as production code. Audit-led, milestone-paced, no offshore handoffs.
The hardest part of data engineering is not the engineering. It's deciding what's actually broken before you start fixing things.
— From the D1 audit charter
Section 05
— D1 · Data Audit

Two weeks in, you'll know exactly what's wrong and what it costs to fix.

Diagnostic · 4–6 wks
$4.5K–$15K ↘

— Who it's for

Heads of Data, VPs of Engineering, and CTOs who suspect the data stack is leaking money or trust — but can't get the engineering hours to confirm it. Most common trigger: Snowflake/BigQuery bill spiked, or a board metric broke.

— What we look at

  • Warehouse architecture — Snowflake, BigQuery, Redshift, or Databricks
  • Modeling layer — dbt, SQLMesh, raw SQL — whatever you've got
  • Ingestion — Fivetran, Airbyte, custom, hybrid
  • Orchestration — Airflow, Dagster, Prefect, cron
  • BI layer — Looker, Metabase, Tableau, Mode, Hex
  • Data observability — or the lack of it
  • Cost breakdown — warehouse, ingestion, BI seats
  • Team patterns — who fixes things at 2am, who knows where the bodies are buried

— Deliverable: written, ~25 pages

  • Executive summary — what we found, what to do, in 2 pages
  • Architecture map — current state, with annotations
  • Top 10 issues, ranked by impact and effort
  • Cost model with projected savings per change
  • Recommended scope for build (or recommendation against)
  • Hiring or retainer recommendations if relevant
— Audit conversion

About 70% of D1 audits convert to a D2/D3/D4 build. The other 30% end with us saying "your stack is fine, here's a smaller fix" or "you don't need this — here's what to do instead." We'd rather lose the engagement than push a project that shouldn't ship.

— Sample outcomes from past audits

  • Identified $11K/mo of unused Snowflake compute → became a D3 engagement
  • Found a silent join bug in revenue dashboard → 2-day fix, no D2 needed
  • Recommended 1 in-house hire instead of an engagement → we politely declined the build
  • Surfaced 4 missing data contracts that broke pipelines weekly → became part of a D2 + D5

— Pricing notes

$4.5K for stacks under 50 dbt models, single warehouse, no AI/ML pipeline. $15K for multi-warehouse, mature dbt repo, custom ingestion, BI implementations spanning 4+ teams. Most audits land at $7.5K–$10K.

Section 06
— D2 · Pipeline Build

Production-grade data pipelines you didn't have to build yourself.

Production · 8–16 wks
$15K–$40K ↘

— Who it's for

Teams that need a new pipeline shipped to production — net-new source, net-new warehouse, net-new schema, or all three — and don't have a data engineer to spare for 3 months. Common shapes: replicating Salesforce + Stripe + product DB into a warehouse from scratch; rebuilding a brittle reverse-ETL flow; consolidating four legacy pipelines into one.

— What you get

  • Source-to-warehouse ingestion (CDC where applicable, full-table where it makes sense)
  • Modeling layer in dbt (or SQLMesh) with a clean folder structure
  • Tests, freshness checks, alerts wired to Slack/PagerDuty
  • Orchestration in Airflow / Dagster / Prefect — your choice; we have opinions but won't fight
  • Documentation: architecture diagram, runbook, on-call playbook
  • Two-hour walkthrough with your team
  • 30-day post-launch warranty — we fix anything that breaks for free

— Stack we ship into

  • Warehouses: Snowflake, BigQuery, Redshift, Databricks SQL
  • Modeling: dbt Core, dbt Cloud, SQLMesh
  • Ingestion: Fivetran, Airbyte, Stitch, custom Python (when paid connectors are absurd)
  • Orchestration: Airflow (managed or MWAA), Dagster, Prefect
  • Reverse-ETL: Hightouch, Census, custom
— What "production-grade" actually means here
  • Idempotent — re-running yesterday doesn't double-count
  • Tested — every model has at least 3 tests (uniqueness, not-null, referential)
  • Observable — freshness, row count anomalies, schema drift all alert
  • Documented — a new engineer can run the system from the runbook
  • Reproducible — dbt build from a clean clone produces the same output
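
The "idempotent" property in the list above can be sketched as a delete-then-insert load keyed on the run date. This is a minimal illustration using SQLite from the Python standard library, not a description of any client pipeline; the `daily_orders` table, its columns, and the `load_day` helper are hypothetical names chosen for the example.

```python
import sqlite3

def load_day(conn, run_date, rows):
    """Replace every row for run_date in one transaction.

    Re-running the same day overwrites that day's rows instead of
    appending a second copy. (Hypothetical helper, for illustration.)
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM daily_orders WHERE order_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_orders (order_date, order_id, amount) VALUES (?, ?, ?)",
            [(run_date, order_id, amount) for order_id, amount in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_orders (order_date TEXT, order_id TEXT, amount REAL)")

batch = [("A-1", 120.0), ("A-2", 80.0)]
load_day(conn, "2026-05-03", batch)
load_day(conn, "2026-05-03", batch)  # re-run of the same day

row_count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM daily_orders"
).fetchone()
print(row_count, total)  # still 2 rows and 200.0 after the re-run
```

In a warehouse the same shape usually appears as a partition overwrite or a merge on a unique key; the point is only that the load is safe to repeat.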

— Sample timelines & pricing

  • 8 weeks · $15K–$20K — single warehouse, 2–3 sources, 30–50 dbt models, 1 BI connection
  • 12 weeks · $24K–$32K — multi-source (5–8), staging + intermediate + marts layers, observability, 2 BI connections
  • 16 weeks · $34K–$40K — multi-warehouse migration, 100+ models, full data contracts, reverse-ETL

Most builds land at $22K–$30K.

Section 07
— D3 · Cost Optimization

Snowflake bill out of control?
We typically cut it 30–50%.

Reduction · 6–8 wks
$12K–$30K ↘

— Who it's for

Finance is asking why warehouse spend is up 80% YoY. Engineering doesn't have time to investigate. The answer is almost never "we need more compute" — it's almost always 4–8 specific things that are quietly burning money.

— What we look at

  • Warehouse sizing — most teams are 1–2 sizes too big on every warehouse. We measure utilization and right-size.
  • Query patterns — dashboards refreshing every 5 minutes that nobody opens. Same query family running 1000×/day with no cache. Joins that should be aggregations.
  • Storage — abandoned databases, cloned databases nobody owns, time-travel windows nobody needs.
  • Ingestion frequency — Fivetran syncs running every hour for tables nobody queries.
  • Materialization strategy — incremental models running as full refresh because someone toggled it during debugging two months ago.
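
As a rough illustration of the warehouse-sizing point above, the sketch below estimates the monthly saving from moving a warehouse down in size. The credits-per-hour table reflects Snowflake's published rates for standard warehouses; the $3 credit price, the 200 active hours per month, and the helper names are assumptions for the example. It also assumes run-time does not grow after the downsize, which has to be checked against measured utilization.

```python
# Credits billed per hour for standard Snowflake warehouses, by size.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def monthly_cost(size, active_hours, price_per_credit):
    """Monthly compute cost for one warehouse (hypothetical helper)."""
    return CREDITS_PER_HOUR[size] * active_hours * price_per_credit

def downsize_savings(current, target, active_hours, price_per_credit):
    """Saving from resizing, assuming active hours stay the same."""
    return (
        monthly_cost(current, active_hours, price_per_credit)
        - monthly_cost(target, active_hours, price_per_credit)
    )

# Assumed inputs: 200 active hours per month at $3 per credit.
savings = downsize_savings("XL", "M", active_hours=200, price_per_credit=3.0)
print(f"${savings:,.0f}/mo")  # $7,200/mo under these assumptions
```

Because each size step doubles the credit rate, an XL that could run as an M costs four times what it needs to for every active hour.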

— Deliverable

  • Cost-by-warehouse / cost-by-query / cost-by-team analysis
  • Top 10 cost drivers ranked by savings vs. effort
  • Implementation — we ship the changes, not just the recommendations
  • Before/after cost comparison after 4 weeks of run-time
  • Monitoring setup so it doesn't drift back
— Featured outcome · CASE/02

−38% Snowflake spend in 6 weeks. Series B B2B SaaS, ~$15M ARR. 4 oversized warehouses, 12 inefficient queries, 3 dashboards refreshing every 5 minutes that nobody used. Annualized savings: $132K. Our fee: $24K. ROI on engagement: ~5.5× in year one. See full case study in section 12 →

— What we don't promise

— Honesty block

We won't quote a savings percentage before the audit. The honest range is 15–60% depending on how mature the stack is. If your stack is already lean (we'll know in week one), we'll tell you and refund the unspent portion. We've done that 3 times.

— Pricing notes

$12K for a focused single-warehouse engagement. $30K for multi-warehouse + ingestion + storage at scale. Most engagements land at $18K–$24K.

Section 08
— D4 · BI Implementation

Dashboards your CFO and VP of Eng both trust.

Trust · 10–14 wks
$20K–$50K ↘

— Who it's for

Teams where data exists, dashboards exist, and yet every leadership meeting starts with "is this number right?" The fix is rarely a new BI tool — it's a semantic layer, data contracts, and the discipline to make every metric defined exactly once.

— What we deliver

  • Semantic layer — dbt Semantic Layer, Cube, LookML, or metric tree. Every metric defined once, used everywhere.
  • Metric catalog with owner, definition, formula, refresh cadence
  • 12–25 production dashboards (depending on engagement size) covering exec, finance, sales, customer success, product
  • Alerting on anomalies — spend spikes, signup drops, etc.
  • Self-serve query environment for analyst-level users (Hex / Mode / Looker Explores)
  • Training: two 90-min sessions with your team
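
One way to picture "every metric defined once, used everywhere" is a single registry that every dashboard query is generated from. The sketch below is a toy stand-in, not the dbt Semantic Layer, Cube, or LookML API; the `Metric` class, the registry, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    """One catalog entry: owner, definition, formula, refresh cadence."""
    name: str
    owner: str
    definition: str
    formula: str        # SQL aggregate expression
    source_table: str
    refresh: str

REGISTRY = {}

def register(metric):
    """Add a metric; a second definition under the same name is rejected."""
    if metric.name in REGISTRY:
        raise ValueError(f"metric {metric.name!r} is already defined")
    REGISTRY[metric.name] = metric

def query_for(name, group_by):
    """Build a dashboard query from the single registered definition."""
    m = REGISTRY[name]
    return (
        f"SELECT {group_by}, {m.formula} AS {m.name} "
        f"FROM {m.source_table} GROUP BY {group_by}"
    )

register(Metric(
    name="arr",
    owner="finance",
    definition="Annualized recurring revenue from active subscriptions",
    formula="SUM(mrr) * 12",
    source_table="marts.subscriptions",
    refresh="daily",
))

exec_sql = query_for("arr", "plan_tier")   # exec dashboard
sales_sql = query_for("arr", "region")     # sales dashboard
print(exec_sql)
print(sales_sql)
```

Both dashboards slice by a different dimension, but the formula comes from one place, so they cannot drift apart on what "arr" means.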

— Stack we ship into

  • Semantic layer: dbt Semantic Layer (preferred), Cube, LookML, MetricFlow
  • BI tools: Looker, Metabase (excellent if budget-constrained), Mode, Hex, Tableau, Sigma
  • Alerting: native BI alerts → Slack; for advanced cases, Anomalo or custom
— BI tool we recommend, by team shape

We don't have a financial relationship with any BI vendor. Our default recommendation by team shape:

  • <50 employees · modest budget — Metabase Pro. Underrated, ships in 2 weeks.
  • Sales / CS-heavy · want self-serve — Looker. Worth the cost for governance.
  • Data-team-led · lots of analysts — Hex or Mode + dbt Semantic Layer.
  • CFO-heavy · board-facing — Sigma. The Excel-shaped audience loves it.

— Sample timelines & pricing

  • 10 weeks · $20K–$28K — 1 BI tool, semantic layer on existing dbt, 12 dashboards
  • 14 weeks · $36K–$50K — semantic layer rebuild, 2 BI tools (analyst + exec), 25 dashboards, training program

Most engagements land at $28K–$36K.

Section 09
— D5 · Data Retainer

We run it — or your team does.
Either is fine.

Operate · Monthly
$5K–$15K MRR ↘

— Who it's for

Teams who've shipped a system with us (D2 / D3 / D4) and want ongoing operation without hiring a full-time data engineer. Or teams with an existing data engineer who needs senior backup, design review, on-call coverage, or a sounding board.

— What's included (scaled by tier)

  • $5K/mo · Operate-only — monitoring, alerting on incidents, monthly stack-health report, async response within 1 business day
  • $8K/mo · Operate + small builds — above + 1 day/week of new build work (small features, new sources, dashboard additions)
  • $12K/mo · Operate + lead engineer — above + senior engineer in your Slack daily, design review, on-call coverage during business hours, 2 days/week build capacity
  • $15K/mo · Embedded team — above + dedicated 0.5 FTE coverage, architecture council, quarterly review with a partner

— What's NOT included

  • 24/7 on-call — we'll refer you to specialist on-call firms if you need this
  • Major migrations — those are D2 engagements, separately scoped
  • Pure body-shop hours — we don't do staff augmentation
— When retainer doesn't make sense

A retainer is the wrong shape if: (a) you're hiring a data engineer in the next 3 months — wait, save the budget; (b) your data work is genuinely one-off — use D1 → D2 instead; (c) your stack changes weekly because product is in heavy pivot — too chaotic for retainer rhythm. We've turned down retainer engagements for all three reasons.

— Pricing notes

6-month minimum on $5K/$8K tiers. 3-month minimum on $12K/$15K. 30-day notice to cancel after the minimum. No multi-year contracts.

We're stack-agnostic but opinionated. We don't take referral fees from any vendor. The recommendation is based on what's right for your workload.
— Stack philosophy · Section 10
Section 10
— Stack — tools we ship into

Tools we ship into.
And the ones we don't.

No referral fees ·
your stack overrides ↘

We're stack-agnostic but opinionated. We don't take referral fees from any vendor. Our defaults are below, but if your team has standardized on something, we adopt it.

— Warehouses & lakes
  • Snowflake — default for B2B SaaS. Best dev experience, costliest if not tuned.
  • BigQuery — preferred when GCP is the cloud of record.
  • Databricks — when you have ML / Spark workloads to colocate.
  • Redshift — supported, increasingly rare in new builds.
  • Postgres / Aurora — fine for <500GB analytics workloads.
— Modeling & transformation
  • dbt Core — default. Open source, opinionated, ships fast.
  • dbt Cloud — for teams who want managed orchestration in one tool.
  • SQLMesh — when virtual environments and column-level lineage matter.
— Ingestion
  • Fivetran — default for SaaS sources.
  • Airbyte — when self-host is mandated or Fivetran is too expensive.
  • Custom Python (Dagster assets / dbt Python) — for niche sources, real-time CDC.
  • Hightouch / Census — for reverse-ETL.
— Orchestration
  • Airflow — default. Boring, reliable, hireable.
  • Dagster — when asset-based reasoning matters.
  • Prefect — supported but not our default.
— BI & analytics
  • Looker — sales / CS / governance-heavy teams.
  • Metabase — modest budget, fast iteration, surprisingly capable.
  • Hex / Mode — analyst-heavy teams.
  • Tableau — supported, increasingly rare in new builds.
  • Sigma — CFO-heavy / Excel-shaped audiences.
— Observability & quality
  • dbt tests — table stakes.
  • Elementary — open-source dbt observability. Default.
  • Monte Carlo / Anomalo — when you've outgrown Elementary.
  • Datadog / Grafana — for warehouse and pipeline metrics.
— What we don't do
  • Real-time streaming (Kafka, Flink, Spark Streaming) — we'll refer you to specialists.
  • Master data management platforms (Reltio, Informatica MDM) — out of scope.
  • Hadoop / Hive — we'll politely decline.
Section 11
— Sample D1 deliverable

What a D1 audit
actually looks like.

Written · ~25 pages
Real engagement, redacted ↘

Every D1 audit produces a written, 20–30 page document. Below is the structure of a real audit (anonymized) we delivered to a Series B B2B SaaS team in early 2025. The full document was 27 pages.

— Document · Confidential · Prepared for client

DATA STACK AUDIT
[REDACTED] Inc.

Prepared by Gigaflop Techlab · 2025-01-22 · 27 pages
— Table of Contents
  • 01 · Executive Summary · p.02
  • 02 · Audit Scope & Methodology · p.04
  • 03 · Current State — Architecture Map · p.06
  • 04 · Current State — Cost Breakdown · p.10
  • 05 · Findings — Top 10 Issues · p.13
  • 06 · Findings — Detail (per issue) · p.16
  • 07 · Recommended Scope · p.22
  • 08 · Implementation Phasing · p.24
  • 09 · Hiring & Retainer Recommendations · p.26
  • 10 · Appendix · Query-Level Cost Analysis · p.27
— Executive Summary · excerpt

[REDACTED]'s data stack is functional but expensive and increasingly fragile. Three findings dominate:

Snowflake spend is 47% above expected baseline — driven by 3 oversized warehouses and 11 query families that account for 72% of total compute.

The dbt project (217 models) has accumulated 2.5 years of organic growth without refactor. Lineage is opaque; ~30% of models are unused in production but still scheduled.

Reverse-ETL to Salesforce is hand-rolled Python with no monitoring. Three silent failures in the last 60 days, all caught by sales reps complaining.

Recommended scope: D3 (Cost Optimization, $24K) + D2 (dbt refactor + reverse-ETL replacement, $32K). Total: $56K, 14 weeks, projected first-year ROI 3.4×.

→ Want a redacted full sample? Email hello@gigafloptechlab.com with subject "D1 sample" and we'll send a real audit.

Section 12
— Record · CASE/02 expanded

38% off the Snowflake bill
in 6 weeks.

D1 → D3 · 8 weeks total
Series B B2B SaaS ↘
Spend reduction
−38%
$11K/mo → $6.8K/mo, sustained
Annualized savings
$132K
First-year run rate
Engagement fee
$24K
D1 ($8K) + D3 ($16K)

— The setup

A Series B B2B SaaS company (~$15M ARR, 80 employees, GCP-hybrid stack) noticed Snowflake spend had doubled in three quarters with no corresponding doubling of users or data volume. The CFO had stopped approving new BI tool requests. The VP of Engineering was being asked to "look into it" — but couldn't spare 3 engineering weeks. They engaged us for a D1 audit.

— What we found in week 2 (the audit)

  • 3 warehouses sized XL when M was sufficient. Spun up during a Black Friday surge in 2023, never resized down. ~$3,200/mo waste.
  • A "real-time" dashboard refreshing every 5 minutes — viewed an average of 4 times per week. ~$1,800/mo.
  • A dbt model running incrementally — except the incremental key was wrong, so it was full-refreshing 12M rows every hour. ~$2,400/mo.
  • Snowpipe streaming for a table queried once per day. ~$700/mo.
  • Three abandoned databases from a 2024 reorg still incurring storage cost. ~$400/mo.
  • 9 more findings at the $50–$300/mo range, totaling another ~$1,800/mo.

— What we did in weeks 3–8 (the build)

  • Right-sized warehouses with auto-suspend tuned to actual usage patterns
  • Replaced the "real-time" dashboard with hourly refresh + alerting on actual change
  • Fixed the incremental key on the dbt model — 1 line of YAML; saved 2 hours of compute per day
  • Rewrote the Snowpipe stream as a daily batch
  • Retired the abandoned databases (with 30-day archive)
  • Set up cost monitoring in Snowflake's cost insights + a weekly Slack summary
  • Wrote a "before you spin up a warehouse" playbook for their data team

— The handoff

Two-hour walkthrough with their data engineer. Runbook for cost monitoring. We checked in at 30 days (savings holding) and at 90 days (savings holding, no regressions). Engagement closed.

The audit paid for itself in week 6. The build paid for itself in month 4. We're still on the savings rate 9 months later.
— VP of Engineering · [Redacted Series B SaaS]
Section 13
— Build vs Hire vs Us

When Gigaflop is right —
and when it isn't.

Honest read below ↘

Here's the honest comparison. We've turned down engagements when one of the other paths was clearly better. If you're not sure which column you're in, the discovery call is free.

Approach | Cost | Time-to-value | Lock-in | Best fit
In-house data team | $250K–$500K / yr (loaded) | 6–12 months to first ship | None | Recurring scale, $10M+ analytics budget
Freelance contractor | $80–$200 / hr | 4–8 weeks | High — single-person dependency | One-off projects, no production criticality
Big 4 / SI firm | $300K–$1M+ | 6–9 months | High — vendor methodology | Massive enterprise, regulated industries
Boutique (us) | $5K–$50K per engagement | 2–4 months | None — you own everything | Series A–C SaaS, $5M–$50M ARR

— Honest read

  • If you've got the budget and the timeline to hire 2 senior data engineers and wait 9 months for them to ship — do that. It's the right long-term play.
  • If you need a small, well-defined thing done by next quarter — freelance. Lots of excellent people on Toptal, A.Team, Continuum.
  • If you're a regulated enterprise needing 200+ consultants on a 2-year program — Big 4. We'd refer you to Slalom, Credera, or West Monroe.
  • If you're a Series A–C SaaS team with a real problem and a 60–90 day window — that's us.
Section 14
— Glossary · data terms

Terms we'll use in the audit.

14 terms · for stakeholders
who don't live in this world ↘

Buyers usually have a head of data who knows these. Some buyers (CTOs, CFOs, founders) don't. This is for them. Skip if it's beneath you.

cdc · change data capture
Replicating only what changed in a source database, instead of re-pulling the whole table. Cheaper, faster, harder to set up correctly.
data contract
A schema-level agreement between a service that emits data and the team that consumes it. Prevents silent breakage.
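
For readers who want to see the idea concretely: a data contract can be as small as an agreed list of fields and types that the consuming side checks before loading. The sketch below is illustrative only; the `OPPORTUNITY_CONTRACT` fields and the `violations` helper are hypothetical.

```python
# Hypothetical contract: the fields the consumer relies on, and their types.
OPPORTUNITY_CONTRACT = {"opportunity_id": str, "amount": float, "stage": str}

def violations(record, contract):
    """Return a list of human-readable contract breaches for one record."""
    problems = []
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems

good = {"opportunity_id": "006A1", "amount": 1200.0, "stage": "Closed Won"}
# The producer quietly changed amount from a number to text.
drifted = {"opportunity_id": "006A2", "amount": "1200", "stage": "Closed Won"}

print(violations(good, OPPORTUNITY_CONTRACT))     # []
print(violations(drifted, OPPORTUNITY_CONTRACT))  # ['amount: expected float, got str']
```

Without the check, the drifted record loads silently and the breakage shows up later in a dashboard; with it, the pipeline fails loudly at the boundary.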
data lake / lakehouse
Storage of raw, often unstructured data, with compute layered on top. Databricks is the canonical lakehouse.
data mart
A focused subset of a warehouse designed for one team or use case. (Sales mart, Finance mart, etc.)
dbt
The de facto standard for SQL-based data transformation. Lets analytics engineers write modular, tested SQL.
elt vs etl
Order of operations. ETL transforms before loading; ELT loads raw, transforms in the warehouse. ELT has won for analytics.
incremental model
A dbt model that only processes new rows since last run. Cheaper than full refresh, easy to break.
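
The "easy to break" part is worth a small example. The sketch below is plain Python rather than dbt, and shows how picking the wrong column as the incremental key quietly turns an incremental run into a full refresh. The `incremental_rows` helper and the `updated_at` / `extracted_at` field names are hypothetical.

```python
def incremental_rows(source_rows, target_rows, key):
    """Return only the source rows newer than the latest `key` already loaded."""
    if not target_rows:
        return list(source_rows)  # first run: load everything
    high_water = max(row[key] for row in target_rows)
    return [row for row in source_rows if row[key] > high_water]

# extracted_at is stamped on every row at extraction time, so it is always "new".
source = [
    {"id": 1, "updated_at": "2026-05-01", "extracted_at": "2026-05-04"},
    {"id": 2, "updated_at": "2026-05-02", "extracted_at": "2026-05-04"},
    {"id": 3, "updated_at": "2026-05-04", "extracted_at": "2026-05-04"},
]
target = [
    {"id": 1, "updated_at": "2026-05-01", "extracted_at": "2026-05-03"},
    {"id": 2, "updated_at": "2026-05-02", "extracted_at": "2026-05-03"},
]

correct = incremental_rows(source, target, "updated_at")
wrong = incremental_rows(source, target, "extracted_at")

print([row["id"] for row in correct])  # [3]  only the changed row
print(len(wrong))                      # 3    every row reprocessed
```

Nothing errors in the second case and the output table is still correct, which is why this class of bug tends to be found on the bill rather than in a test.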
lineage
The map of "this column came from that column came from this raw source." Critical for trust and debugging.
olap / oltp
OLAP = analytical (warehouses, dashboards). OLTP = transactional (your app database). Different shapes, different tools.
reverse-etl
Pushing warehouse data back into operational tools (Salesforce, HubSpot, Marketo). Hightouch and Census are the leaders.
semantic layer
A central definition of what each business metric means, so two dashboards can't disagree on "revenue".
snowflake credit
The unit of compute billing in Snowflake. ~$2–$4 per credit depending on contract.
star schema
The classic warehouse modeling pattern: fact tables in the center, dimension tables around them.
warehouse
The compute + storage system that holds your analytics data. Snowflake, BigQuery, Redshift, Databricks.
Section 15
— Common questions · data-specific

Things data buyers usually ask.

10 replies ↘
Q.01 We already use Fivetran and dbt. Do you replace them or work with them?
Almost always, work with them. We're stack-agnostic and opinionated, not stack-religious. If you've standardized on Fivetran + dbt + Snowflake + Looker, we ship into that. If your stack is genuinely the wrong shape (rare — usually only true for pre-Series-A teams), the audit will say so.
Q.02 How quickly will I see the savings on a D3 cost optimization?
The first changes ship within 2 weeks of audit completion. Most teams see 50–70% of the projected savings inside the first month, with the rest landing in months 2–3 as warehouse auto-suspend behavior settles into the new pattern.
Q.03 Will you sign an NDA, MSA, and DPA?
Yes to all three. We have templates that have been reviewed by our clients' counsel; we're also fine working off your paper. SOC 2 Type 2 in process (target Q3 2026). Ask for our security overview if you need to brief your security team early.
Q.04 Who actually does the data engineering work?
Two co-founders + a senior engineering bench from DiscoverWebTech (the 14-year parent company). The architects you meet on the discovery call are in the working sessions every week. We don't subcontract to anonymous offshore teams.
Q.05 We've already had a Big 4 firm audit our stack. Do we need yours?
Probably not, if their audit was actionable. The honest test: did they ship a written document that names specific issues with specific dollar values and specific fixes? If yes, hire them to implement. If their audit was a 50-slide deck of generic recommendations, our audit is different in shape and we'll happily show you a sample.
Q.06 Do you do data science / ML / AI?
Data science: not really — we'll refer you to specialists. ML feature engineering and pipelines: yes (often part of D2). AI applications (LLMs, agents, chatbots): yes — see the AI practice page. The line between "data engineering" and "AI engineering" is blurrier than people pretend.
Q.07 Can you help us hire instead of engage you?
Yes — and we have. Sometimes the audit ends with "you don't need us, you need 1 senior data engineer in-house." We'll write the job spec, review candidates if you want, and have referred several finalists into client roles.
Q.08 What happens if the engagement runs over?
We don't bill overruns to you. Fixed-price means fixed-price. Overruns happen — they're rare (about 1 in 12 engagements) and they're our problem. We've eaten 3–4 weeks of internal cost rather than ask for change orders.
Q.09 Do you offer a satisfaction guarantee?
Yes for D1 audits — if the audit document doesn't surface at least 3 specific, dollar-quantified findings, we refund half. We've never had to. For D2/D3/D4 builds, we don't offer a refund (the work ships either way) but we do offer a 30-day post-launch warranty: anything that breaks in the first 30 days, we fix on our dime.
Q.10 Do you take equity?
No. We're a services firm, not a shadow-VC.
Section 16
— Start · next step
/services/data · END ↘

Most engagements start
with the audit.

Two weeks. Written deliverable. Either becomes the scope for a build, or it's the only thing we do — your call. About 70% of audits convert to a build engagement.

Book a 30-min discovery call → hello@gigafloptechlab.com
P.S. We'd rather lose a large engagement than push a project that shouldn't ship.