Multi-network affiliate data.
One reliable source of truth.
SPRNV was pulling transaction and commission data from multiple affiliate networks, reconciling it by hand, and producing reports with no single trusted number. We built an automated Python ETL on Google Cloud Functions, loaded everything into BigQuery, and delivered five Power BI dashboards — gross vs net, publisher performance, decline reasons, geo & device, and a finance reconciliation view.
Multiple affiliate networks. No single trusted number.
"Every affiliate network has its own API, its own schema, and its own notion of what a transaction is. The job is to make them all speak the same language."
Data scattered across multiple network APIs
Transaction and commission data lived in separate affiliate networks — each with its own API format, field naming conventions, and export schedule. There was no automated way to pull it all together, so the team exported files manually on an ad-hoc basis.
No reliable gross vs net reconciliation
Gross revenue, net revenue, advertiser commissions, and network fees were tracked separately and inconsistently. Finance had no single view of what was earned, what was owed, and what the network took — creating reconciliation delays and disputes.
Amended transactions and status changes breaking history
Affiliate transactions move between pending, approved, and declined states — and amounts get amended. Without upsert logic, reprocessing updated records would either duplicate history or silently overwrite the wrong version.
Publisher and campaign performance invisible
Publisher IDs, site names, coupon codes, device signals, and geo data were available in the raw API responses but never surfaced in reporting. The team had no reliable view of which publishers were performing, or why decline rates varied.
Python ETL on Google Cloud Functions — incremental, upsert-safe, multi-network.
Fetch from API
Incremental pull from each network API using date windows. Pagination and cursor tokens handled per-network.
Validate & normalise
Schema enforcement, type casting, timestamp normalisation. Anomaly flags on nulls, negatives, and unexpected statuses.
Load to staging
Raw payloads land in raw_<network>_transactions with ingestion metadata preserved for auditability.
Transform to analytics
Standardised stg_transactions → fct_transactions with stable upsert keys and exploded sub-tables.
Power BI connects
Dashboards query curated BigQuery views directly. Partitioned and clustered for fast filter performance.
-
01
→ Incremental loading
Pull only what changed — handle amendments and status flips.
Data was fetched in rolling date windows using
TransactionDate/ProcessDate. Upsert logic was built around stable composite keys:- Primary key: network + commissionId (unique per transaction)
- Fallback: OrderId decomposition where multiple IDs were packed into one field
- Amended transactions (amended=true) update amounts and commission in-place
- Status transitions (pending → approved / declined) tracked without duplicating history
- Multi-currency fields retained in original + USD for finance reconciliation
-
02
→ Data quality controls
Validate at ingestion — reject bad rows, log everything.
The ETL layer enforced quality before any record reached the analytics layer:
- Deduplication by canonical transaction key before upsert
- Type casting: numeric strings → numeric, with null-safe defaults
- Timestamp normalisation to consistent UTC timezone
- Anomaly flags: negative/zero amounts, missing commissionId / orderId, unexpected status values
- Per-run logging: records fetched, records loaded, API failures, rejected rows with reasons
-
03
→ Trigger & deployment
Scheduled via Cloud Scheduler — HTTP trigger for ad-hoc runs.
The Cloud Function runs on a defined schedule for routine incremental loads, with an HTTP trigger available for manual or ad-hoc backfills. Service account authentication handles BigQuery writes. The architecture is stateless — adding a new network means adding a new fetch module, nothing else changes.
BigQuery modelled for fast analytics — raw → staging → facts.
Three layers — raw, standardised, analytics-ready.
The warehouse separates concerns cleanly. Raw tables preserve the original API payload and ingestion metadata for auditability. Staging normalises field names, types, and timestamps across all networks into a single consistent schema. Fact tables are what Power BI actually reads.
fct_transactions is the core table — one row per transaction or commission event. Two optional drilldown tables handle nested data: fct_transaction_items explodes basket products and line items, and fct_transaction_parts breaks out commission groups from transactionParts.*.
- raw_<network>_transactions — original payload + ingestion timestamp
- stg_transactions — standardised fields, clean types, all networks merged
- fct_transactions — one row per transaction · primary analytics table
- fct_transaction_items — exploded basket/line items
- fct_transaction_parts — exploded commission groups
- Partitioned by TransactionDate / ProcessDate — prunes full table scans
- Clustered by network, publisherId, commissionStatus, commissionId
- Clustering aligns with Power BI's most common filter patterns
- Upsert via MERGE — no duplicates, no history loss on status changes
- Views expose only curated fields to Power BI — raw tables stay internal
Partitioned and clustered to match Power BI's query patterns.
Partitioning by date eliminates full table scans on every dashboard refresh. Clustering by the four fields most used in Power BI filters — network, publisherId, commissionStatus, and commissionId — means BigQuery only reads the blocks it needs.
The result is dashboard load times that stay fast as the transaction history grows, without any additional infrastructure cost.
Five Power BI dashboards — executive through to transaction drilldown.
Executive Performance
The single-screen KPI view for leadership — the numbers that matter without the noise.
- Total transactions and gross vs net revenue
- Total commissions at a glance
- Approved vs declined vs pending split
- Net performance after advertiser and network fees
Publisher & Site Performance
Which publishers are driving value — and which are costing more than they return.
- Top publishers by revenue and commission
- Approval rate by publisher and siteName
- Coupon and voucher usage impact on margins
- Region-wise publisher performance breakdown
Finance & Reconciliation
The dashboard the finance team actually uses — built for month-end close and payout queries.
- Commission vs network fee vs advertiser cost
- Amended transactions tracked separately
- Decline reasons breakdown for dispute resolution
- Processing lag: TransactionDate → ProcessDate
Geo & Device
Where performance comes from — and on what device.
- Consumer country and ship-to country performance
- Device trends: clickDevice vs transactionDevice
- New-to-file customer acquisition signals (newToFile)
- Regional revenue and approval rate heatmap
Transaction Drilldown Explorer
A fully filterable table for ops and finance — everything in one exportable view.
- Drill to individual order IDs and commission IDs
- Publisher and site detail at row level
- Filter by network, date, status, country, publisher
- Export to Excel / CSV for offline reconciliation
Scalable to new networks.
The pipeline was designed so that adding a new affiliate network requires only a new fetch module. The staging schema, BigQuery model, and Power BI dashboards all inherit the new data automatically — no structural changes needed.
Python on Cloud Functions, BigQuery, Power BI — nothing you don't need.
Python + Google Cloud Functions
Stateless serverless functions triggered by Cloud Scheduler. No servers to manage, no idle cost. Each network gets its own fetch module; the core pipeline stays shared.
Google BigQuery
Serverless analytics warehouse. Partitioned by date, clustered by the fields Power BI filters on most. Handles multi-year transaction history without performance degradation.
Cloud Scheduler + Service Account
Cron-style scheduling via Cloud Scheduler. Service account handles BigQuery write permissions — no long-lived credentials, no manual key rotation.
Power BI (DirectQuery / Import)
Five dashboards connecting directly to curated BigQuery views. Finance exports, publisher performance, geo and device breakdowns — all live from one warehouse.
Need to unify multiple data sources into one reliable reporting layer?
We build automated data pipelines that pull from multiple APIs, clean and standardise the data, load it into a warehouse, and deliver dashboards your finance and ops teams can actually trust — without a single manual export.
Book a 30-min call →US · UK · EU · APAC · Billing in USD.