As part of the project, a data warehouse (DWH) based on MS SQL Server is designed and built for the customer. An ETL process (extracting, transforming, loading) is implemented for the import and integration of the company data into the DWH. The customer data, which are available in various files and formats each month, are extracted by the ETL-Process from the corresponding files using the Python-Component provided by 'SQL Server Machine Learning Services' and transferred to a STAGING database. After the subsequent transformation, the data is stored in a DWH database. Based on this, the customer creates KPI reports for the detailed evaluation of his business activities.
Supplement
The source data is made available to the customer in individually designed formats. With the Python libraries 'Pandas' and 'Numpy', PTA develops flexible procedures for every source data system in order to identify the data relevant to the customer and to extract it into a STAGING database. The data are standardized and transformed using SQL control procedures, functions, views, etc. so that they can be stored in a suitable form in a star schema – DWH database consisting of fact and dimension tables. The complete configuration and control of the ETL-Process is carried out by a specially developed framework
Subject description
The managers and their internal systems provide the customer with monthly financial and investment figures for their logistics assets (freight wagons, containers, swap bodies) in variously designed source formats. The PTA supports the requirements analysis in order to identify the data to be extracted and to develop calculation algorithms. The aim of the DWH is to establish a uniform, transparent and thus analyzable data platform, which is suitable for the flexible creation of meaningful KPI reports with MS PowerBI. The customer is thus given the opportunity to evaluate his business activities easily and in detail.