Indexing Open-Source C/C++ Repositories

Master's Thesis

Thesis description:

The Automated Benchmark Management (ABM) platform [1, 2] crawls open-source hosting platforms such as GitHub and Bitbucket for Java-based projects. It automatically builds these projects and extracts information such as the number of classes in a project, its cyclomatic complexity, or whether it uses specific APIs or libraries. In this way, ABM builds a knowledge base that can be queried immediately, much like a search engine for Java-based projects.

In this thesis, you will research how to apply this methodology to C/C++ projects through the following steps:

1. Adapt the crawlers to query C/C++ projects.

2. Determine which project information is relevant to extract from C/C++ projects.

3. Design a methodology and select the tools for extracting this information from C/C++ projects using static code analysis.

4. Build a prototype that demonstrates your research. If time allows, the prototype can be implemented directly in the ABM code base.

5. Evaluate your prototype in terms of performance and the quality of the extracted information.


    Requirements:

    • Good knowledge of Java and C/C++.
    • Prior knowledge of static analysis and Scala is not required but is appreciated.
    • Autonomy and self-organization skills.


    The thesis will be written in English. Informal communication can be in German or French.

    Learning outcomes:

    • Assimilate and apply knowledge from relevant literature.
    • Static analysis of C/C++ code.
    • Contribute to an international open-source project.
    • Plan, implement, test, and document an independent research project.


    Lisa Nguyen (

    Dr.-Ing. Ben Hermann (


    [1] Lisa Nguyen Quang Do et al. Toward an automated benchmark management system. In SOAP 2016. DOI: