Developing a configurable interface between Kafka and SQL Database
Project duration: 1 year, 3 months
Brief description
The goal of this project is to digitalize the back-office reconciliation process. Consolidating multiple data sources into a single database makes the data readily available to the reconciliation tool and streamlines operations. The software provides a configurable data pipeline from the Energy Trading and Risk Management (ETRM) system (Endur) to a standard reconciliation tool (Xceptor). It enables the reconciliation of business data with other systems to ensure data consistency. PTA was responsible for analyzing data and configuration requirements, designing and specifying the data streaming pipeline, and overseeing software development. The project delivers custom software built on standardized streaming platforms.
Supplement
This project aims to create a reliable and flexible interface that streams data from a Kafka topic to an SQL database according to specifications the user can adjust at runtime. The interface allows the user to filter messages based on configurable criteria and to select which data fields are written to the database. It is implemented in Java and uses Apache Flink as the framework for data processing and transformation. PTA's role is to provide business analysis and guidance for the project.
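As an illustration only, a minimal Flink job wiring a Kafka source through a configurable filter to a JDBC sink might look like the following sketch; the topic name, filter criterion, field names, table name, and connection details are assumptions for the example, not the project's actual configuration.

// Minimal sketch of a Kafka-to-SQL Flink job. Topic, field, table, and
// connection details are illustrative assumptions only.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaToSqlJob {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Consume raw JSON messages from the Kafka topic.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")               // assumed broker address
                .setTopics("endur-trades")                       // assumed topic name
                .setGroupId("recon-interface")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "endur-trades")
                // Parse each message into a JSON tree.
                .map(value -> MAPPER.readTree(value))
                .returns(JsonNode.class)                         // explicit type hint for the lambda result
                // Configurable filter criterion (here: only physical trades).
                .filter(node -> "PHYSICAL".equals(node.path("tradeType").asText()))
                // Write the selected fields to the staging table via JDBC.
                .addSink(JdbcSink.sink(
                        "INSERT INTO trade_staging (trade_id, counterparty, volume_mwh) VALUES (?, ?, ?)",
                        (stmt, node) -> {
                            stmt.setString(1, node.path("tradeId").asText());
                            stmt.setString(2, node.path("counterparty").asText());
                            stmt.setDouble(3, node.path("volume").asDouble());
                        },
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                                .withUrl("jdbc:postgresql://dbhost:5432/recon")  // assumed target database
                                .withDriverName("org.postgresql.Driver")
                                .build()));

        env.execute("Kafka to SQL reconciliation interface");
    }
}

In the real interface, the filter criterion and the field-to-column selection would come from user-editable configuration rather than being hard-coded as in this sketch.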
Subject description
The goal of the project is to automate and optimize the reconciliation process. The source data comes from the ETRM (Energy Trading and Risk Management) system Endur and is temporarily stored in an SQL database. The ultimate target system is the reconciliation software Xceptor. The interface processes complex JSON messages from the ETRM system, which contain structured data with hierarchical trees and arrays. It transforms this information into a relational schema and stores it in an SQL database. A key feature is its flexibility: changes to the data fields and the structure of the JSON messages are managed through configuration adjustments, without needing to rewrite code. The source data is delivered via Confluent Kafka. The interface operates in real time and uses Apache Flink to ensure high throughput, high parallelism, and high reliability.
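One way such a configuration-driven transformation can be sketched is as a mapping from SQL column names to JSON Pointer paths into the nested message. The example below uses hypothetical column names, paths, and a hypothetical helper class; it is not the project's actual mapping, but shows how a changed message layout can be absorbed by editing the configuration rather than the code.

// Sketch of a configuration-driven flattening step. Column names and JSON
// Pointer paths are hypothetical; in practice they would be loaded from an
// external configuration rather than hard-coded.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.LinkedHashMap;
import java.util.Map;

public class MessageFlattener {

    // Mapping of target SQL column -> JSON Pointer into the source message.
    // Adjusting these entries adapts the interface to a changed message layout.
    private static final Map<String, String> COLUMN_TO_POINTER = Map.of(
            "trade_id",       "/tradeHeader/tradeId",
            "counterparty",   "/parties/0/name",           // array element via pointer index
            "delivery_start", "/delivery/period/start",
            "volume_mwh",     "/delivery/volume/quantity");

    public static Map<String, Object> flatten(String rawJson) throws Exception {
        JsonNode message = new ObjectMapper().readTree(rawJson);
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : COLUMN_TO_POINTER.entrySet()) {
            JsonNode value = message.at(entry.getValue());   // resolves nested trees and arrays
            row.put(entry.getKey(), value.isMissingNode() ? null : value.asText());
        }
        return row;
    }

    public static void main(String[] args) throws Exception {
        String sample = "{\"tradeHeader\":{\"tradeId\":\"T-1001\"},"
                + "\"parties\":[{\"name\":\"ACME Energy\"}],"
                + "\"delivery\":{\"period\":{\"start\":\"2024-01-01\"},"
                + "\"volume\":{\"quantity\":150.0}}}";
        // Prints the flattened relational row (entry order may vary).
        System.out.println(flatten(sample));
    }
}

In the Flink pipeline, such a flattening step would sit between the Kafka source and the JDBC sink, so that only the configuration table defines which JSON fields end up in which SQL columns.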