
USE CASE
Real-Time Lakehouse on Confluent Cloud
NorthRiver Bank — Fraud Detection & Customer 360
NorthRiver Bank needed to spot fraud in seconds while giving teams fresh, trustworthy data for analytics and ML. Batch ETL and nightly refreshes left analysts and models a day behind. The bank standardized on Confluent Cloud to stream operational events from core systems, enrich them in flight, and publish analytics-ready tables—without building and maintaining fragile pipelines.
NorthRiver Bank struggled with several pressing data challenges:
Slow fraud detection — Batch ETL meant fraudulent activity could go undetected for hours, leaving the bank exposed to significant risk.
Stale analytics and ML data — Nightly refreshes left customer insights and models a full day behind, reducing their accuracy and effectiveness.
Siloed data products — Different teams built separate pipelines for operations, analytics, and machine learning, creating inconsistencies across use cases.
Fragile and costly pipelines — Hand-coded ETL jobs were error-prone, hard to scale, and expensive to maintain under regulatory pressure.
Compliance and governance gaps — Without standardized schema enforcement and lineage, data quality and auditability were inconsistent.
Proposed Architecture
Sources → Confluent Cloud (Kafka)
Core banking apps, mobile/online banking, ATMs, card networks, and security logs stream events into Confluent Cloud.
Kafka Connect (with Debezium CDC) captures database changes; Confluent Schema Registry enforces schemas; Stream Governance provides lineage, quality, and policy controls.
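For illustration, the sketch below shows schema-enforced ingestion into Confluent Cloud from a custom producer; the topic name, event schema, endpoints, and credentials are hypothetical, and in practice most of these sources would arrive via fully managed connectors and Debezium CDC rather than hand-written producers.

```python
# Minimal sketch: produce a card transaction with Avro serialization so that
# Schema Registry validates the payload at produce time. All endpoints, keys,
# and field names below are placeholders.
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

TXN_SCHEMA = """
{
  "type": "record",
  "name": "CardTransaction",
  "fields": [
    {"name": "txn_id",   "type": "string"},
    {"name": "card_id",  "type": "string"},
    {"name": "amount",   "type": "double"},
    {"name": "merchant", "type": "string"},
    {"name": "ts",       "type": "long"}
  ]
}
"""

# Hypothetical Confluent Cloud Schema Registry endpoint and API key.
schema_registry = SchemaRegistryClient({
    "url": "https://psrc-xxxxx.us-east-1.aws.confluent.cloud",
    "basic.auth.user.info": "SR_API_KEY:SR_API_SECRET",
})
value_serializer = AvroSerializer(schema_registry, TXN_SCHEMA)

# Hypothetical Confluent Cloud bootstrap server and cluster API key.
producer = SerializingProducer({
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "CLUSTER_API_KEY",
    "sasl.password": "CLUSTER_API_SECRET",
    "value.serializer": value_serializer,
})

event = {"txn_id": "t-1001", "card_id": "c-42", "amount": 129.95,
         "merchant": "acme-online", "ts": 1718000000000}

# The serializer registers/validates the schema before the record is sent,
# so incompatible payloads fail at the edge instead of breaking consumers.
producer.produce(topic="card_transactions", key="c-42", value=event)
producer.flush()
```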
Streaming transforms → Confluent Cloud for Apache Flink
Stateful Flink jobs consume Kafka topics from Confluent Cloud to enrich transactions (device fingerprint, geo/IP, customer risk tier), run anomaly rules/CEP, and compute rolling KPIs.
Results are written back to Confluent Cloud as curated topics (e.g., fraud_alerts, customer_insights) with exactly-once guarantees.
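The sketch below illustrates the enrichment-and-rule pattern using the open-source PyFlink Table API with in-memory sample data so it runs on its own; on Confluent Cloud for Apache Flink the same logic would typically be expressed as Flink SQL against the real Kafka topics, and the table names, fields, and 5,000 threshold here are assumptions.

```python
# Illustrative stand-in for a Flink enrichment job: join transactions with a
# customer risk tier and flag a simple rule. Names and values are hypothetical.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Sample transactions standing in for the card_transactions topic.
txns = t_env.from_elements(
    [("t-1001", "c-42", 129.95), ("t-1002", "c-77", 8200.00)],
    ["txn_id", "card_id", "amount"],
)
t_env.create_temporary_view("txns", txns)

# Sample customer risk tiers standing in for an enrichment table.
risk = t_env.from_elements(
    [("c-42", "LOW"), ("c-77", "HIGH")],
    ["card_id", "risk_tier"],
)
t_env.create_temporary_view("customer_risk", risk)

# Enrich each transaction and apply a simple rule: large amount from a
# high-risk customer. In production this result would be written to the
# curated fraud_alerts topic instead of being collected locally.
alerts = t_env.sql_query("""
    SELECT t.txn_id, t.card_id, t.amount, r.risk_tier
    FROM txns AS t
    JOIN customer_risk AS r ON t.card_id = r.card_id
    WHERE t.amount > 5000 AND r.risk_tier = 'HIGH'
""")

with alerts.execute().collect() as results:
    for row in results:
        print(row)  # expect only the high-risk t-1002 transaction
```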
Publish to lakehouse → Confluent Tableflow
Confluent Tableflow continuously materializes curated Kafka topics into open table formats (Apache Iceberg or Delta Lake) in cloud object storage (e.g., Amazon S3).
Tableflow automates file sizing, partitioning, compaction, and schema evolution, and registers tables in the bank’s Glue (or other) catalog so engines can query them immediately.
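As an example of how downstream consumers see the result, the sketch below reads a Tableflow-materialized Iceberg table directly from the Glue catalog with PyIceberg; the catalog, database, table, and column names are hypothetical, and AWS credentials and region are taken from the standard environment.

```python
# Minimal sketch: load a Tableflow-managed Iceberg table from the Glue catalog
# and scan it with a pushed-down filter. Names below are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("lakehouse", type="glue")

# Tableflow keeps this table compacted, partitioned, and schema-evolved;
# consumers read it like any other Iceberg table.
alerts = catalog.load_table("fraud.fraud_alerts")

recent_high_value = alerts.scan(
    row_filter="amount > 5000",
    selected_fields=("txn_id", "card_id", "amount", "risk_tier"),
    limit=100,
).to_arrow()

print(recent_high_value.num_rows)
```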
Analytics & AI
Snowflake (via Iceberg tables), Databricks, Trino/Starburst, and Athena query the same fresh, governed tables for fraud operations, customer 360 dashboards, and model training—no custom ETL.
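For example, a fraud-operations query against the same table could be submitted to Amazon Athena as sketched below; the database, table, and S3 results location are hypothetical, and any Iceberg-capable engine could run equivalent SQL.

```python
# Sketch: ad-hoc Athena query over the Tableflow-published fraud_alerts table.
# Database, table, and output bucket are placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="""
        SELECT card_id, COUNT(*) AS alert_count
        FROM fraud_alerts
        GROUP BY card_id
        ORDER BY alert_count DESC
        LIMIT 20
    """,
    QueryExecutionContext={"Database": "fraud"},
    ResultConfiguration={"OutputLocation": "s3://northriver-athena-results/"},
)
print(response["QueryExecutionId"])
```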
Why Confluent?
Managed, elastic Kafka with built-in Schema Registry, Kafka Connect, and Stream Governance simplifies operations and compliance.
First-class Flink on Confluent Cloud delivers low-latency enrichment and exactly-once processing without standing up clusters.
Tableflow turns Kafka topics into query-ready Iceberg/Delta tables—zero bespoke batch jobs.
Open & portable: data lands in open formats, queryable by your favorite engines and BI tools.
Outcomes for NorthRiver Bank
Fraud detection latency reduced from hours to seconds (stream scoring + alerting).
One data product for ops, analytics, and ML—stream and historical views stay in sync.
Lower TCO & risk: fewer moving parts, governed schemas, automated table optimization.
Faster delivery: new data products published via Tableflow in minutes, not weeks.