CLIENT CASE STUDY — SPRNV — Transaction & Commission Analytics
Data Engineering · D6 · ETL Pipeline + BigQuery + Power BI
Rev. 2026.05 · Live client

Multi-network affiliate data.
One reliable source of truth.

→ LIVE CLIENT SPRNV — multi-network affiliate transaction & commission reporting Python · BigQuery · Power BI

SPRNV was pulling transaction and commission data from multiple affiliate networks, reconciling it by hand, and producing reports with no single trusted number. We built an automated Python ETL on Google Cloud Functions, loaded everything into BigQuery, and delivered five Power BI dashboards — gross vs net, publisher performance, decline reasons, geo & device, and a finance reconciliation view.

Multi-network APIs unified BigQuery warehouse 5 dashboards Power BI Live client in production
Section01
— The challenge

Multiple affiliate networks. No single trusted number.

Manual exports
→ automated pipeline

"Every affiliate network has its own API, its own schema, and its own notion of what a transaction is. The job is to make them all speak the same language."

→ Problem 01

Data scattered across multiple network APIs

Transaction and commission data lived in separate affiliate networks — each with its own API format, field naming conventions, and export schedule. There was no automated way to pull it all together, so the team exported files manually on an ad-hoc basis.

→ Problem 02

No reliable gross vs net reconciliation

Gross revenue, net revenue, advertiser commissions, and network fees were tracked separately and inconsistently. Finance had no single view of what was earned, what was owed, and what the network took — creating reconciliation delays and disputes.

→ Problem 03

Amended transactions and status changes breaking history

Affiliate transactions move between pending, approved, and declined states — and amounts get amended. Without upsert logic, reprocessing updated records would either duplicate history or silently overwrite the wrong version.

→ Problem 04

Publisher and campaign performance invisible

Publisher IDs, site names, coupon codes, device signals, and geo data were available in the raw API responses but never surfaced in reporting. The team had no reliable view of which publishers were performing, or why decline rates varied.

Section02
— The build

Python ETL on Google Cloud Functions — incremental, upsert-safe, multi-network.

Cloud Functions
Cloud Scheduler
BigQuery
→ Step 01

Fetch from API

Incremental pull from each network API using date windows. Pagination and cursor tokens handled per-network.

→ Step 02

Validate & normalise

Schema enforcement, type casting, timestamp normalisation. Anomaly flags on nulls, negatives, and unexpected statuses.

→ Step 03

Load to staging

Raw payloads land in raw_<network>_transactions with ingestion metadata preserved for auditability.

→ Step 04

Transform to analytics

Standardised stg_transactionsfct_transactions with stable upsert keys and exploded sub-tables.

→ Step 05

Power BI connects

Dashboards query curated BigQuery views directly. Partitioned and clustered for fast filter performance.

  • 01
    → Incremental loading

    Pull only what changed — handle amendments and status flips.

    Data was fetched in rolling date windows using TransactionDate / ProcessDate. Upsert logic was built around stable composite keys:

    • Primary key: network + commissionId (unique per transaction)
    • Fallback: OrderId decomposition where multiple IDs were packed into one field
    • Amended transactions (amended=true) update amounts and commission in-place
    • Status transitions (pending → approved / declined) tracked without duplicating history
    • Multi-currency fields retained in original + USD for finance reconciliation
  • 02
    → Data quality controls

    Validate at ingestion — reject bad rows, log everything.

    The ETL layer enforced quality before any record reached the analytics layer:

    • Deduplication by canonical transaction key before upsert
    • Type casting: numeric strings → numeric, with null-safe defaults
    • Timestamp normalisation to consistent UTC timezone
    • Anomaly flags: negative/zero amounts, missing commissionId / orderId, unexpected status values
    • Per-run logging: records fetched, records loaded, API failures, rejected rows with reasons
  • 03
    → Trigger & deployment

    Scheduled via Cloud Scheduler — HTTP trigger for ad-hoc runs.

    The Cloud Function runs on a defined schedule for routine incremental loads, with an HTTP trigger available for manual or ad-hoc backfills. Service account authentication handles BigQuery writes. The architecture is stateless — adding a new network means adding a new fetch module, nothing else changes.

Section03
— The warehouse

BigQuery modelled for fast analytics — raw → staging → facts.

Partitioned
Clustered
Upsert-safe
→ Table architecture

Three layers — raw, standardised, analytics-ready.

The warehouse separates concerns cleanly. Raw tables preserve the original API payload and ingestion metadata for auditability. Staging normalises field names, types, and timestamps across all networks into a single consistent schema. Fact tables are what Power BI actually reads.

fct_transactions is the core table — one row per transaction or commission event. Two optional drilldown tables handle nested data: fct_transaction_items explodes basket products and line items, and fct_transaction_parts breaks out commission groups from transactionParts.*.

→ Table inventory
  • raw_<network>_transactions — original payload + ingestion timestamp
  • stg_transactions — standardised fields, clean types, all networks merged
  • fct_transactions — one row per transaction · primary analytics table
  • fct_transaction_items — exploded basket/line items
  • fct_transaction_parts — exploded commission groups
→ Performance optimisations
  • Partitioned by TransactionDate / ProcessDate — prunes full table scans
  • Clustered by network, publisherId, commissionStatus, commissionId
  • Clustering aligns with Power BI's most common filter patterns
  • Upsert via MERGE — no duplicates, no history loss on status changes
  • Views expose only curated fields to Power BI — raw tables stay internal
→ Performance design

Partitioned and clustered to match Power BI's query patterns.

Partitioning by date eliminates full table scans on every dashboard refresh. Clustering by the four fields most used in Power BI filters — network, publisherId, commissionStatus, and commissionId — means BigQuery only reads the blocks it needs.

The result is dashboard load times that stay fast as the transaction history grows, without any additional infrastructure cost.

Section04
— The reporting layer

Five Power BI dashboards — executive through to transaction drilldown.

5 dashboards
Finance · Publisher
Geo · Drilldown
→ Dashboard 01

Executive Performance

The single-screen KPI view for leadership — the numbers that matter without the noise.

  • Total transactions and gross vs net revenue
  • Total commissions at a glance
  • Approved vs declined vs pending split
  • Net performance after advertiser and network fees
→ Dashboard 02

Publisher & Site Performance

Which publishers are driving value — and which are costing more than they return.

  • Top publishers by revenue and commission
  • Approval rate by publisher and siteName
  • Coupon and voucher usage impact on margins
  • Region-wise publisher performance breakdown
→ Dashboard 03

Finance & Reconciliation

The dashboard the finance team actually uses — built for month-end close and payout queries.

  • Commission vs network fee vs advertiser cost
  • Amended transactions tracked separately
  • Decline reasons breakdown for dispute resolution
  • Processing lag: TransactionDate → ProcessDate
→ Dashboard 04

Geo & Device

Where performance comes from — and on what device.

  • Consumer country and ship-to country performance
  • Device trends: clickDevice vs transactionDevice
  • New-to-file customer acquisition signals (newToFile)
  • Regional revenue and approval rate heatmap
→ Dashboard 05

Transaction Drilldown Explorer

A fully filterable table for ops and finance — everything in one exportable view.

  • Drill to individual order IDs and commission IDs
  • Publisher and site detail at row level
  • Filter by network, date, status, country, publisher
  • Export to Excel / CSV for offline reconciliation
→ Architecture note

Scalable to new networks.

The pipeline was designed so that adding a new affiliate network requires only a new fetch module. The staging schema, BigQuery model, and Power BI dashboards all inherit the new data automatically — no structural changes needed.

Section05
— The stack

Python on Cloud Functions, BigQuery, Power BI — nothing you don't need.

GCP-native
Zero managed
infrastructure
→ ETL & orchestration

Python + Google Cloud Functions

Stateless serverless functions triggered by Cloud Scheduler. No servers to manage, no idle cost. Each network gets its own fetch module; the core pipeline stays shared.

→ Data warehouse

Google BigQuery

Serverless analytics warehouse. Partitioned by date, clustered by the fields Power BI filters on most. Handles multi-year transaction history without performance degradation.

→ Scheduling & auth

Cloud Scheduler + Service Account

Cron-style scheduling via Cloud Scheduler. Service account handles BigQuery write permissions — no long-lived credentials, no manual key rotation.

→ Reporting

Power BI (DirectQuery / Import)

Five dashboards connecting directly to curated BigQuery views. Finance exports, publisher performance, geo and device breakdowns — all live from one warehouse.

001 / Networks unified
Multi
Multiple affiliate APIs → one BigQuery schema
002 / Dashboards delivered
5
Executive · Publisher · Finance · Geo · Drilldown
003 / Pipeline trigger
Auto
Cloud Scheduler + HTTP — no manual exports ever
004 / Table layers
3
Raw → staging → analytics-ready facts
005 / Manual exports replaced
Zero
Fully automated end-to-end — finance never exports CSVs
006 / New network effort
1 module
Add a network with one new fetch module — nothing else changes
Section07

Need to unify multiple data sources into one reliable reporting layer?

We build automated data pipelines that pull from multiple APIs, clean and standardise the data, load it into a warehouse, and deliver dashboards your finance and ops teams can actually trust — without a single manual export.

Book a 30-min call →
Fixed-price engagements only. We scope it, price it, ship it.
US · UK · EU · APAC · Billing in USD.