Research Field

The 21st century has been called the age of data because of the massive amounts of data collected daily in almost all aspects of our lives. Data is everywhere and is generated almost every second: when we use the internet or a mobile phone, when medical data is acquired, or even while we shop. The ever-increasing diversity of such data, ranging from images and data on manifolds to immensely complex high-dimensional data sets, poses new challenges: sophisticated technology and methods are needed for acquisition, analysis, storage, and transmission. Mathematics is of key importance here, providing these methods with a rigorous foundation that can be analyzed and proven, and enabling the development of new approaches. At the same time, data-driven methods such as deep neural networks have lately shown tremendous success and often outperform methods based on traditional mathematical modeling; their mathematical foundations, however, are still not well understood.

The young field of mathematics of data science draws on various areas of traditional mathematics, such as applied harmonic analysis, functional analysis, numerical linear algebra, optimization, and statistics. It also intersects with machine learning, customarily assigned to computer science. Scientists working in this field not only develop mathematical theory but, in most cases, also collaborate with other disciplines in an interdisciplinary and even transdisciplinary way, depending on the application area.


Berlin Research Groups

Several branches of the area of mathematics of data science are prominently represented within Berlin mathematics, taking internationally leading positions.

Neural Networks: Multiple groups work in the area of (deep) neural networks, focusing both on applications and on the theoretical mathematical foundations; both offer a variety of exciting challenges. Current topics in this area include, for example, the expressivity of a network architecture, the performance of the learning algorithm, the analysis of the generalization error, the interpretability of a neural network, and applications either to specific areas such as the life sciences or to problem settings such as inverse problems. For more information see the homepages of: Friz, Noe, Conrad, Schütte, Pokutta, Steidl.
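
To make the notions of expressivity and generalization error concrete, here is a minimal, purely illustrative sketch (all names, architecture choices, and parameters below are our own assumptions, not taken from the work of any group named above): a small two-layer network is trained by gradient descent on a synthetic regression task, and the gap between training and test error serves as a simple estimate of the generalization error.

    # Illustrative sketch only: two-layer network, hand-written backprop.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic regression task: noisy samples of a smooth target function.
    X = rng.uniform(-1.0, 1.0, size=(200, 1))
    y = np.sin(3.0 * X) + 0.1 * rng.standard_normal(X.shape)
    X_train, y_train = X[:150], y[:150]
    X_test, y_test = X[150:], y[150:]

    # Two-layer network x -> W2 @ tanh(W1 x + b1) + b2; the width is a
    # crude knob for the expressivity of the architecture.
    width = 32
    W1 = rng.standard_normal((1, width)) * 0.5
    b1 = np.zeros(width)
    W2 = rng.standard_normal((width, 1)) * 0.5
    b2 = np.zeros(1)

    lr = 0.05
    for step in range(2000):
        # Forward pass.
        h = np.tanh(X_train @ W1 + b1)          # hidden activations
        pred = h @ W2 + b2
        resid = pred - y_train
        # Backward pass for the mean-squared error.
        grad_pred = 2.0 * resid / len(X_train)
        gW2 = h.T @ grad_pred
        gb2 = grad_pred.sum(axis=0)
        gh = grad_pred @ W2.T
        gz = gh * (1.0 - h**2)                  # derivative of tanh
        gW1 = X_train.T @ gz
        gb1 = gz.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    def mse(X, y):
        p = np.tanh(X @ W1 + b1) @ W2 + b2
        return float(np.mean((p - y) ** 2))

    print("train error:", mse(X_train, y_train))
    print("test  error:", mse(X_test, y_test))  # gap ~ generalization error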

Statistics: Statistical methods and concepts such as Markov processes, statistical learning theory, and Bayesian statistics are of key importance for a variety of data analysis tasks such as hypothesis testing, regression, and clustering. For more information see the homepages of: Schütte, Noe, Reiss, Spokoiny.
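
As a small illustration of the Bayesian viewpoint mentioned above (a hypothetical toy example under our own assumptions, not specific to any group's work): with a conjugate Beta prior on the bias of a coin, Bernoulli observations update the prior in closed form.

    # Toy example: conjugate Beta-Bernoulli posterior update.
    import numpy as np

    rng = np.random.default_rng(4)
    data = rng.binomial(1, 0.7, size=50)    # 50 coin flips, true bias 0.7

    a, b = 1.0, 1.0                         # uniform Beta(1, 1) prior
    a_post = a + data.sum()                 # heads update a
    b_post = b + len(data) - data.sum()     # tails update b
    print("posterior mean:", a_post / (a_post + b_post))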

Sparse approximation: A methodological approach to data analysis, used for dimension reduction and feature selection, based on the paradigm in mathematical data science that data typically admits a sparse approximation in a suitable basis; this is often cast as a variational problem with a sparsity-promoting prior. For more information see the homepages of: Conrad, Schütte.
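
A hedged sketch of this paradigm, assuming the standard l1-regularized least-squares formulation (the setup and parameters below are illustrative, not a specific method of the groups named): iterative soft-thresholding (ISTA) recovers a sparse vector from few linear measurements.

    # Illustrative sketch: ISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, k = 50, 200, 5                 # few measurements, k-sparse signal
    A = rng.standard_normal((n, d)) / np.sqrt(n)
    x_true = np.zeros(d)
    x_true[rng.choice(d, size=k, replace=False)] = rng.standard_normal(k)
    b = A @ x_true

    lam = 0.01
    t = 1.0 / np.linalg.norm(A, 2) ** 2  # step size 1/L, L = ||A||_2^2

    x = np.zeros(d)
    for _ in range(500):
        g = A.T @ (A @ x - b)            # gradient of the smooth part
        z = x - t * g
        x = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)  # soft threshold

    print("nonzeros recovered:", np.count_nonzero(np.abs(x) > 1e-3))
    print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))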

Applications in Life Sciences: One particularly intriguing and versatile application area of mathematical data science is the life sciences, with data ranging from EEG signals and MRI images to dynamic, multimodal, hierarchical patient data sets. The expertise of Berlin mathematics includes molecular dynamics (Noe, Schütte), analysis of -omics data and personalized medicine (Conrad, Schütte), and medical imaging sciences (Hintermüller, Steidl).

Applications in Finance: Another application area of data science methods represented in Berlin is finance, which typically requires the analysis of time series data. For more information see the homepages of: Friz, Reiss, Spokoiny, Stannat, or the IRTG "Stochastic Analysis in Interaction" (coordinator: Bank).


Basic Courses

Statistical Methods for Data Science

◦ Bayesian statistics

◦ Concentration inequalities, empirical risk minimization

◦ (Maximum likelihood) estimation

◦ Random matrices

◦ Regression and classification, regularization, and (un)supervised learning (see the sketch after this list)
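
As a small sketch connecting several of the topics above (an illustrative example under our own assumptions, not actual course material): ridge regression is a regularized empirical risk minimization problem with a closed-form solution w = (X^T X + alpha*I)^{-1} X^T y.

    # Illustrative sketch: ridge regression as regularized ERM.
    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 100, 10
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    alpha = 1.0                           # regularization strength
    w_hat = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

    # Empirical risk (mean squared error) of the regularized estimator.
    print("empirical risk:", np.mean((X @ w_hat - y) ** 2))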


Analysis of High-Dimensional Data

◦ Basics of compressed sensing and (sparse) approximation theory

◦ Complexity measures for data such as entropy

◦ Data representations such as structured representations and dictionary learning

◦ Methods for dimension reduction such as the Johnson-Lindenstrauss lemma or PCA (see the sketch after this list)
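
For illustration (a minimal sketch under standard assumptions, not actual course material): a random Gaussian projection in the spirit of the Johnson-Lindenstrauss lemma approximately preserves pairwise distances with high probability.

    # Illustrative sketch: random projection for dimension reduction.
    import numpy as np

    rng = np.random.default_rng(3)
    n, d, k = 100, 1000, 200              # n points in R^d, target dim k
    X = rng.standard_normal((n, d))
    P = rng.standard_normal((d, k)) / np.sqrt(k)   # random projection
    Y = X @ P

    # Compare a few pairwise distances before and after projection;
    # the ratios should be close to 1.
    for i, j in [(0, 1), (2, 3), (4, 5)]:
        orig = np.linalg.norm(X[i] - X[j])
        proj = np.linalg.norm(Y[i] - Y[j])
        print(f"pair ({i},{j}): distortion = {proj / orig:.3f}")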

Advanced Courses

Topics for advanced courses range from "Deep Learning" and "Rough Paths and the Signature Method in Machine Learning" through "Geometric Functional Analysis" and "(High-Dimensional) Convex Geometry" to "Nonlinear Optimization" and "Image Processing".