Evaluation of Snowflake as an Enterprise Cloud Data Warehouse (DWH)
Project duration: 3 months
Brief description
The aim of the project is to evaluate Snowflake as a candidate to replace an existing Teradata-based Enterprise Data Warehouse (EDWH). In addition to performance analyses against the existing Teradata-based DWH, the integration into the customer's system landscape is of particular interest. The evaluation covers near-real-time data management according to the change data capture principle, a comprehensive data import from AWS S3 storage, data transformations including analytical functions and Python, and reporting at detail level on several billion data records with a front-end tool. PTA supports the definition of a catalogue of criteria, provides and imports representative test data, develops reporting scenarios and carries out a performance analysis.
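As an illustration of the bulk import from AWS S3, the following Python sketch uses the Snowflake Connector for Python to load several staged files into a target table with a single COPY INTO statement; the connection parameters, the stage name MY_S3_STAGE and the table SALES_DETAIL are assumed placeholders for illustration, not details taken from the project.

```python
# Minimal sketch: bulk-loading files from an AWS S3 stage into a Snowflake table.
# All identifiers (account, stage and table names) are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # assumed account identifier
    user="my_user",
    password="my_password",
    warehouse="LOAD_WH",
    database="EDWH_TEST",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # COPY INTO reads all files under the stage path; Snowflake distributes
    # the files across the warehouse's compute resources for parallel loading.
    cur.execute("""
        COPY INTO SALES_DETAIL
        FROM @MY_S3_STAGE/sales/
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """)
    for row in cur.fetchall():
        print(row)             # per-file load status returned by COPY INTO
finally:
    conn.close()
```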
Supplement
The source systems are connected to the Enterprise DWH primarily according to the Change Data Capture principle using the tool HVR (High Volume Replication). For source systems with large data volumes and frequent data changes, an efficient method (burst mode) that works in conjunction with AWS S3 storage is tested. In addition, an efficient direct data import from AWS S3 storage is evaluated, in which several files are loaded into tables simultaneously. On the test data, PTA performs a performance benchmark to compare the reporting response times with those of the existing DWH. Response times are measured based on user behavior while the compute power of the Snowflake warehouses is scaled both horizontally and vertically. In addition to tool-based access with Tableau, multi-user access is simulated using the Snowflake Connector for Python. PTA prepares the benchmark results as a basis for the decision.
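One possible way to simulate such multi-user access and measure response times with the Snowflake Connector for Python is sketched below; the report query, the warehouse and table names and the numbers of simulated users are assumptions for illustration, not the project's actual benchmark setup.

```python
# Minimal sketch: simulating concurrent reporting users against Snowflake and
# measuring response times. Identifiers and the query are illustrative placeholders.
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import snowflake.connector

CONN_PARAMS = dict(
    account="my_account", user="my_user", password="my_password",
    warehouse="REPORTING_WH", database="EDWH_TEST", schema="MART",
)

REPORT_QUERY = "SELECT region, SUM(amount) FROM SALES_DETAIL GROUP BY region"

def run_user_session(user_id: int) -> float:
    """Open a dedicated connection per simulated user and time one report query."""
    conn = snowflake.connector.connect(**CONN_PARAMS)
    try:
        start = time.perf_counter()
        conn.cursor().execute(REPORT_QUERY).fetchall()
        return time.perf_counter() - start
    finally:
        conn.close()

def benchmark(concurrent_users: int) -> None:
    # Run all simulated users in parallel and report simple response-time statistics.
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        timings = list(pool.map(run_user_session, range(concurrent_users)))
    print(f"{concurrent_users} users: median {statistics.median(timings):.2f}s, "
          f"max {max(timings):.2f}s")

if __name__ == "__main__":
    # Vertical scaling: resize the warehouse before a run (e.g. MEDIUM vs. LARGE);
    # horizontal scaling would instead raise MAX_CLUSTER_COUNT on a multi-cluster warehouse.
    admin = snowflake.connector.connect(**CONN_PARAMS)
    admin.cursor().execute("ALTER WAREHOUSE REPORTING_WH SET WAREHOUSE_SIZE = 'LARGE'")
    admin.close()

    for users in (1, 5, 20):
        benchmark(users)
```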