Active

Data Quality Assessment
We establish a systematic and comprehensive framework for the (numeric) assessment of data quality for a given dataset and its intended use. Such a framework must cover the various facets that influence data quality, as well as the many types of data quality dimensions.

Systems for AI Data Quality in Finance
This project aims to enable non-technical users to validate and increase the quality of their data. To that end, these users should be able to express data quality rules in natural language. We will design a data-driven approach that leverages such rules to assist a domain expert in fine-tuning data quality rules and in “stress testing” downstream AI models. This project favors a strong data engineering background combined with an interest in engaging with European regulation applicable to financial data.

Data quality and ML
We empirically explore the relationship between data quality dimensions and the performance of widely used machine learning (ML) algorithms covering the tasks of classification, regression, and clustering, with the goal of explaining their performance in terms of data quality.

Completed

KITQAR
An interdisciplinary project in collaboration with VDE and the universities of Tübingen and Köln. It shed light on the role of AI data from legal, ethical, informational, and practical perspectives. KITQAR was funded by the Federal Ministry of Labour and Social Affairs from Dec. 2021 to Jul. 2023.

Metanome
A project in cooperation with QCRI. It provides a fresh view on data profiling by developing and integrating efficient algorithms into a common tool, expanding on the functionality of data profiling, and addressing performance and scalability issues for Big Data.

Real estate Rating
Rating the value of real estate is a complex process relying on local and global properties.
The IREBS of the University of Regensburg and HPI's Information Systems and Algorithm Engineering groups collaborate to automate real-estate valuation by means of data engineering and artificial intelligence. The project was funded by Deutscher Sparkassenverlag from July 2020 to September 2021.

Relational Header Discovery
Column headers are among the most relevant types of metadata for relational tables, because they provide the meaning and context in which the data is to be interpreted. Unfortunately, in many cases column headers are missing. We introduce a fully automated, multi-phase system that discovers table column headers where headers are missing, meaningless, or not representative of the column values.

Single Column Profiler
A Metanome algorithm that collects statistics about each column of the input dataset (*.csv file), such as data type, minimum, maximum, standard deviation, average, the top 10 most frequent items and their frequencies, ...
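
To make the kind of per-column statistics listed above concrete, here is a minimal Python sketch of single-column profiling over CSV input. It is not the Metanome algorithm itself. The function names `profile_columns` and `infer_type`, the `top_k` parameter, the three-way type inference, and the choice of population standard deviation are all assumptions made for illustration. The sketch also assumes the first row is a header with unique column names and that empty cells are missing values.

```python
import csv
import io
import statistics
from collections import Counter


def infer_type(values):
    """Guess a column type from its non-empty string values (hypothetical helper)."""
    if not values:
        return "empty"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "string"


def profile_columns(csv_text, top_k=10):
    """Collect per-column statistics from CSV text whose first row is a header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            if cell != "":  # assumption: empty cells are missing values
                columns[name].append(cell)

    profile = {}
    for name, values in columns.items():
        dtype = infer_type(values)
        stats = {"type": dtype, "top": Counter(values).most_common(top_k)}
        if dtype in ("integer", "float"):
            numbers = [float(v) for v in values]
            stats.update(
                min=min(numbers),
                max=max(numbers),
                mean=statistics.fmean(numbers),
                stdev=statistics.pstdev(numbers),  # population std. deviation
            )
        profile[name] = stats
    return profile


sample = "city,price\nBerlin,10\nPotsdam,20\nBerlin,30\n"
result = profile_columns(sample)
print(result["price"]["type"], result["price"]["mean"])
print(result["city"]["top"])
```

Numeric statistics are only computed for columns in which every non-empty value parses as a number; string columns still receive their most frequent items. A production profiler would also need to handle large files in a streaming fashion, which this sketch does not attempt.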