Building Real-Time Data Pipelines at Scale
How we helped a Fortune 500 healthcare company migrate from batch processing to real-time streaming — processing 2 billion events per day with sub-second latency.
The Challenge
Our client, a major healthcare provider, was running critical patient data through nightly batch jobs. By the time analytics were available each morning, they were already 12+ hours stale. For a company managing real-time patient monitoring across 200+ facilities, this was unacceptable.
Our Approach
We designed a hybrid streaming architecture using Apache Kafka for event ingestion and Apache Flink for stateful stream processing. The key insight was that not all data needs real-time treatment, so we tiered the pipeline: time-critical events flow through the sub-second streaming path, while less urgent data takes slower, cheaper routes.
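To make the tiering idea concrete, here is a minimal sketch of how events might be routed to per-tier Kafka topics. The tier names, topic names, and event-kind mappings are all illustrative, not the client's actual taxonomy:

```python
# Sketch: route each event to a processing tier based on latency requirements.
# Tier names, topics, and event-kind sets are hypothetical examples.
from dataclasses import dataclass

@dataclass
class Event:
    source: str
    kind: str

# Hypothetical tier-to-topic mapping; the real system derives tiers from
# clinical latency requirements.
TIER_TOPICS = {
    "hot":  "events.realtime",     # sub-second: vitals, device alarms
    "warm": "events.micro-batch",  # minutes: lab results, orders
    "cold": "events.batch",        # hours: billing, audit logs
}

HOT_KINDS = {"vital_sign", "device_alarm"}
WARM_KINDS = {"lab_result", "medication_order"}

def tier_for(event: Event) -> str:
    """Classify an event into a latency tier."""
    if event.kind in HOT_KINDS:
        return "hot"
    if event.kind in WARM_KINDS:
        return "warm"
    return "cold"

def route(event: Event) -> str:
    """Return the Kafka topic the event should be published to."""
    return TIER_TOPICS[tier_for(event)]
```

In practice the routing decision lives in the producer layer, so downstream consumers can subscribe only to the latency class they care about.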
The Architecture
The system ingests from 14 different source systems including HL7 FHIR feeds, IoT devices, and legacy EMR databases. We built custom Kafka Connect connectors for the legacy systems and used Debezium for CDC on the relational databases.
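For the CDC piece, a Debezium source connector registered with Kafka Connect might look roughly like the following. Hostnames, credentials, and table names are placeholders, and the exact properties depend on the Debezium version and source database:

```json
{
  "name": "emr-patients-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "emr-db.internal",
    "database.port": "5432",
    "database.user": "cdc_reader",
    "database.dbname": "emr",
    "topic.prefix": "emr",
    "table.include.list": "public.patients,public.encounters",
    "snapshot.mode": "initial"
  }
}
```

Each captured table becomes its own Kafka topic of ordered change events, which the streaming layer can then treat like any other event source.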
Flink handles the heavy lifting — windowed aggregations, complex event processing for anomaly detection, and real-time joins across patient records. All state is checkpointed to S3 for exactly-once guarantees.
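The windowed-aggregation logic can be illustrated with a small self-contained sketch. This mimics a tumbling-window average over heart-rate readings with a simple threshold-based anomaly flag; the window size and threshold are made-up values, and the real job runs in Flink rather than plain Python:

```python
# Illustrative tumbling-window aggregation: bucket heart-rate readings into
# fixed windows per patient, average them, and flag windows whose average
# crosses a threshold. Window size and threshold are example values only.
from collections import defaultdict

WINDOW_SECONDS = 10
ALERT_THRESHOLD = 120  # bpm, illustrative

def window_key(ts: float) -> int:
    """Assign a timestamp to the start of its tumbling window."""
    return int(ts // WINDOW_SECONDS) * WINDOW_SECONDS

def aggregate(readings):
    """readings: iterable of (timestamp, patient_id, bpm).
    Returns {(window_start, patient_id): (avg_bpm, is_anomaly)}."""
    buckets = defaultdict(list)
    for ts, patient, bpm in readings:
        buckets[(window_key(ts), patient)].append(bpm)
    return {
        key: (sum(vals) / len(vals), sum(vals) / len(vals) > ALERT_THRESHOLD)
        for key, vals in buckets.items()
    }
```

In the Flink version, the per-window state that this sketch keeps in memory is what gets checkpointed to S3, so a recovered job resumes from the last consistent snapshot instead of reprocessing or dropping events.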
Results
The pipeline now sustains 2 billion events per day with sub-second end-to-end latency. The real win? Clinicians see patient trends as they develop, not the next morning.