Uwiringiyedata, Germain (2025) Real-Time Streaming of Call Detail Records to HDFS: An End-to-End Big Data Pipeline Using Kafka Connect, Apache Airflow, and Apache Spark. International Journal of Innovative Science and Research Technology, 10 (9): 25sep1309. pp. 2064-2071. ISSN 2456-2165
The rapid expansion of telecommunications services produces enormous quantities of Call Detail Records (CDRs), requiring real-time ingestion, storage, and analysis to support billing operations, fraud detection systems, and network optimization. CDRs are generated as high-volume event streams that require low-latency ingestion, durable storage, and dependable analytics. This paper presents an end-to-end, containerized big data pipeline that integrates Apache Kafka, Kafka Connect, Hadoop Distributed File System (HDFS), PySpark, and Apache Airflow within a reproducible Docker environment. Unlike conventional batch-oriented approaches, the proposed architecture demonstrates low-latency ingestion, fault-tolerant storage, and scalable processing of high-throughput CDR streams. Experimental results show zero delivery loss at 25 records per second (RPS), balanced partition throughput, and immediate analytical readiness, with roaming traffic analysis and cell-level usage statistics produced in seconds. The work contributes a practical reference model for telecom streaming pipelines, highlighting the advantages of containerized deployment, automated orchestration, and reproducible analytics, and it outlines directions for scaling and production integration.
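As a rough illustration of the two analytics the abstract names (roaming traffic analysis and cell-level usage statistics), the sketch below computes both over a handful of in-memory records. It is written in plain Python rather than PySpark so that it runs without a cluster, and it is not the paper's implementation. The record fields (`caller_id`, `cell_id`, `duration_sec`, `is_roaming`), the sample data, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical CDR records; field names and values are illustrative only
# and are not taken from the paper's schema or results.
SAMPLE_CDRS = [
    {"caller_id": "A", "cell_id": "cell-01", "duration_sec": 60, "is_roaming": False},
    {"caller_id": "B", "cell_id": "cell-01", "duration_sec": 120, "is_roaming": True},
    {"caller_id": "C", "cell_id": "cell-02", "duration_sec": 30, "is_roaming": True},
    {"caller_id": "A", "cell_id": "cell-02", "duration_sec": 90, "is_roaming": False},
]


def cell_usage(records):
    """Return {cell_id: (call_count, total_duration_sec)} for the given records."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        counts[record["cell_id"]] += 1
        totals[record["cell_id"]] += record["duration_sec"]
    return {cell: (counts[cell], totals[cell]) for cell in counts}


def roaming_share(records):
    """Return the fraction of records flagged as roaming (0.0 for empty input)."""
    if not records:
        return 0.0
    roaming = sum(1 for record in records if record["is_roaming"])
    return roaming / len(records)


if __name__ == "__main__":
    print(cell_usage(SAMPLE_CDRS))
    print(roaming_share(SAMPLE_CDRS))
```

In a Spark-based pipeline the same logic would typically be expressed as a grouped aggregation over the files landed in HDFS; the plain-Python form is used here only to keep the example self-contained.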