We congratulate Ashwin Prasad Shivarpatna Venkatesh on passing his doctoral examination

Heinz Nixdorf Institut | Secure Software Engineering / Heinz Nixdorf Institut

Ashwin Prasad Shivarpatna Venkatesh successfully completed his doctorate on the topic “Advances in Python Call-Graph Construction and Type Inference: A Benchmark-Driven Approach” under the supervision of Prof. Dr. Eric Bodden. We warmly congratulate him!

Summary of the dissertation:

Software analysis is entering a transitional phase. The rapid growth of data science and machine learning has made Python a central language in modern research, especially in computational environments such as Jupyter Notebooks. Yet the flexibility that makes Python attractive also makes it difficult to analyze: notebooks are often unstructured and poorly documented, while Python’s dynamic features limit the precision of traditional static analysis. At the same time, the rise of Large Language Models (LLMs) has introduced new possibilities for program understanding, but also new questions about reliability and scope. This dissertation argues that the future of Python tooling lies not in replacing static analysis with learned models, but in combining their complementary strengths through a hybrid approach.

To build the static foundation for such a future, this dissertation first develops improved analysis techniques for Python notebooks. We introduce a heuristics-based call-graph analysis that models the behavior of machine-learning libraries and leverages type information from external dependencies. Building on this analysis, we present HeaderGen, a tool that automatically transforms undocumented Jupyter Notebooks into navigable, semantically labeled narratives. Evaluations on real-world notebooks show that the underlying analysis achieves high precision and recall for both call-graph construction and notebook-cell classification. A user study further demonstrates that these generated structures significantly improve comprehension and navigation for data-science practitioners.

A hybrid future also requires reliable ways to measure the capabilities and limits of different approaches. To address the fragmented state of evaluation in Python analysis, this dissertation introduces TypeEvalPy, a benchmarking framework for Python type inference with 154 manually curated test cases and verified ground truth. An empirical study of six representative tools reveals substantial variation in performance. While HeaderGen achieves the most balanced overall results, open-source tools such as Pyright perform well in practical ecosystem integration but remain inconsistent across scenarios. At the same time, learning-based and hybrid systems show that probabilistic predictions can effectively complement static reasoning, even if they remain constrained by training distributions and generalization limits.

Finally, this dissertation examines how generative AI fits into the broader landscape of software analysis. To rigorously evaluate LLMs, we extend TypeEvalPy with an auto-generation engine that scales the benchmark to 7,121 test cases and introduce SWARMCG, a multi-language benchmark suite for call-graph analysis. Our study of 24 LLMs reveals a clear divergence: LLMs perform strongly on type inference, but struggle with call-graph construction, where traditional analyzers remain superior. Taken together, these results suggest that the next generation of Python tooling should be hybrid by design: static analysis provides the structural reliability and global reasoning needed for sound program understanding, while learning-based methods contribute flexibility and predictive power where they are most effective.

From left to right: Dr. Arnab Sharma, Jun.-Prof. Dr. Mohamed Aboubakr Mohamed Soliman, Ashwin Prasad Shivarpatna Venkatesh, Prof. Dr. Eric Bodden (not pictured: Prof. Dr. Li Li and Dr. Yaroslav Kharkov)