Startseite > Fachgruppen > Algorithmen und Komplexität > Lehre > DisDaS: Distributed Data Streams

DisDaS: Distributed Data Streams

In our project group "DisDaS: Distributed Data Streams", we plan to lay the foundations for the design and analysis of distributed algorithms that continuously compute functions of streams of data which are observed by a multitude of data sources. The major challenge is to cope with the huge amount of data generated. Typically, the data streams are too big and arrive too fast to be completely stored, or sent to a server through a network, or processed in real time. Thus we have to find ways to extract useful information from the streams using only very limited resources, like memory, communication volume and computation time.

 

The Scenario

The most commonly used model in distributed streaming algorithms is the continuous monitoring model due to Cormode & Muthukrishnan. In this model n nodes observe a (possibly infinite) data stream and are able to communicate to one central coordinator which computes a function based on the union of all data streams. The goal is to minimize the communication between the nodes and the coordinator. Various problems and results were investigated in this model during the past decades, e.g. the countdown or threshold monitoring problem or, in general, functional monitoring problems. In this project group, we want to generalize the model leading to new research questions.

While the continuous monitoring model focuses on the case with a single coordinator, this model connects n data sources to multiple coordinators by a potentially complex topology.

Also, new problems to investigate arise when the roles of nodes and the coordinator change in a natural way: the coordinator is no longer interested in data arriving at the nodes but instead, the nodes (clients) request data from the coordinator who is acting as a server.

Results of the Seminar

Interval Selection in the streaming modelPascal BemmannSeminarSlides
Multi-Dimensional Online TrackingSteffen KnorrSeminarSlides
Frequency Moments - Approximations and Space ComplexityFelix BiermeierSeminarSlides
Estimating Simple Functions on the Union of Data StreamsTill KnollmannSeminarSlides
On Graph Problems in a Semi-Streaming ModelArne KemperSeminarSlides
The Count-Min Sketch and its ApplicationsJannik SundermeierSeminarSlides
Palindrome Recognition in The Streaming ModelJan BürmannSeminarSlides
Randomized Algorithms for Tracking Distributed Count, Frequencies, and RanksNils KotheSeminarSlides

Seminar Documentation

Final Documentation