Master-Vorlesung: Designing code analyses for large-scale software systems (DECA 1) 2024/25
Course number and language
L.079.05781
The teaching language will be English. Questions in German will be permitted.
Registering and communicating
To attend the course, you have to register in the PAUL system as a participant.
To ask questions, please use the discussion forum in PANDA, so that others can benefit from the answers as well.
If needed, we will also send updates through PANDA circulars.
Schedule
This schedule gives a rough indication of which topics will be covered when.
The following course schedules are non-binding and may change at any time and without prior announcement.
Lecture schedule
Date | Lecture topic | Lecturer |
11.10. | Introduction and course outline | Bodden |
18.10. | Type systems and flow-insensitive, constraint-based analysis | Palaniappan Muthuraman |
25.10. | Lattices and fixed points | Bodden |
08.11. | Intra-procedural flow-sensitive static code analysis | Bodden |
15.11. | Interval analysis, widening and narrowing | Bodden |
22.11. | Call-graph construction | Bodden |
29.11. | Pointer Analysis | Bodden |
06.12. | Inter-procedural program analysis | Bodden |
13.12. | Call-strings approach to context-sensitive analysis | Bodden |
20.12. | Functional approach to context-sensitive analysis | Bodden |
10.01. | Distributive analyses using IFDS | Bodden |
17.01. | Distributive analyses using IDE | Bodden |
24.01. | VASCO | Bodden |
31.01. | Recap & guest lecture regarding code analysis in practice | Bodden |
Exercise / Lab schedule
Date | Lecture topic | Teaching Assistants |
14.10. | Introduction to exercises and labs | Palaniappan |
21.10. | Jimple and control-flow graph | Palaniappan, Michael, Yong |
04.11. | Lattices and design decisions | Palaniappan |
11.11. | Intra-procedural analysis and monotone framework | Palaniappan |
18.11. | Constant Analysis | Yong |
25.11. | Call-graph construction | Yong |
02.12 | Points-to analysis | Yong |
09.12. | Field & access paths | Yong |
16.12. | Call-strings approach to context-sensitive analysis, Precision & scalability | Palaniappan |
13.01. | Distributive analyses using IFDS | Palaniappan |
20.01. | Distributive analyses using IDE | Yong |
27.01. | VASCO | Yong |
Lab assignments
Lab | Hand out date | Due date |
L0 Introduction to the labs | 11.11. | - |
L1 Intra-procedural typestate analysis | 25.11. | 09.12. |
L2 Call-graph algorithms and pointer analysis | 09.12. | 30.12. |
L3 Inter-procedural and field-sensitive taint analysis | 10.01. | 24.01. |
L4 Linear Constant Propagation Using IDE | 17.01. | 31.01. |
Abstract
Static code analysis is used to detect bugs and security breaches, and aids compiler optimization. By searching for suspicious/interesting patterns in a program’s code it allows to extract useful information that can be used in a variety of applications. This course will explain how to design static code analyses that are inter-procedural, i.e., consider the whole program, across procedure boundaries. Designing such analyses is challenging, as they need to handle millions of program statements efficiently and precisely. Example applications are drawn from the area of IT security.
DECA 1 and 2
Since 2020/21, we started offering DECA in two parts, DECA 1 and 2. This winter term, DECA will amount to "DECA 1", covering basic, introductory topics. At the end of this course you will be able to design and implement full-fledged static program analyses, albeit using rather state-of-the art concepts.
If you also attend "DECA 2" in the summer term, you will be taught quite novel, advanced concepts out of cutting-edge research such as weighted pushdown systems and demand-driven program analyses. After having attended this course, you will be one of a select few worldwide that master these exciting technologies. If you wish to attend "DECA 2", we strongly recommend to complete "DECA 1" first.
Course structure
Each week, two hours will be dedicated to the lecture, and two hours will be dedicated to concrete exercise classes and graded programming labs.
In the exercise sessions, you will be able to apply the notions seen during the lecture into more concrete topics, preparing you to present your knowledge (with respect to the final exam).
The goal of the programming labs is to allow you to create concrete program analyses, solving open problems on the current topic, and deepening your knowledge and understanding of the notions seen in the lecture and exercise sessions. The lab assignments will mostly be done at home, using the scheduled lab hours to answer questions on the ongoing lab.
If you have questions to the organisation of the course, the topic, the exercises, or the labs, or if you get stuck when solving the exercises or labs, please use the forum in PANDA. We try to answer on a regular basis and as soon as possible.
Evaluation
Graded labs:
- During the semester, you will have to hand in four graded labs.
- Each lab has to be handed in through PANDA on its due date by 08:00 am. The dates can be found here.
- The labs will be done in groups of four.
- Late labs will not be accepted.
- Plagiarism will result on a 0 grade for the lab and will be reported to the department. It can result in severe consequences such as financial fine and expulsion from the university.
Final exam:
At the end of the course, you will have the opportunity to register for the written exam based on your lab grade:
- If you scored below 50%, you cannot register to the exam.
- If you scored 50% or more, you can register to the exam.
- If you scored 70% or more, you will receive a bonus of 0.3 on your final grade.
- If you scored 90% or more, you will receive a bonus of 0.7 on your final grade.
The exam will be in a written format, except for students under the old Prüfungsordnung who will need to register for an oral exam.
Prerequisites
A mature understanding of the Java programming language and object-oriented programming will be helpful.
Syllabus
Topics covered include:
- Intra-procedural data-flow analysis
- Call-graph construction algorithms
- Context-insensitive inter-procedural data-flow analysis
- Context-sensitivity using the call-strings approach
- Value-based contexts
- Context-sensitivity using the functional approach
- Efficiently solving distributed problems in the IFDS and IDE frameworks
- Current challenges in inter-procedural static program analysis
Throughout the course and the exercice sessions, we will discuss applications to software security.
Learning outcomes
After having attended this course, students will have learned…
- how to make educated design decisions when designing automated code analysis for large-scale software systems,
- which algorithms have which properties when using them to implement static code-analyses,
- how to design real–world code analyses for practical problem cases from the area of IT security
- how to interpret important terminology such as context, flow, field and object sensitivity
- how to evaluate and explain the important limitations of static code analysis
- which typical security code analyses exist (OWASP Top 10 etc.) and how they relate to the analysis frameworks explained in the course.
Recommended reading material
We will not be able to provide a script for this course. We will provide powerpoint slides where available, but will develop some concepts also on the blackboard. Students are highly encouraged to take their own copies during their lecture.
A lot of the material is also covered in the following books and papers, however, those publications present the material in a more complex manner than in the lectures, which is why they should mostly be used for deeper personal study.
- Thomas Reps, Susan Horwitz, and Mooly Sagiv. 1995. Precise interprocedural dataflow analysis via graph reachability. POPL '95
- Shmuel Sagiv, Thomas W. Reps, and Susan Horwitz. 1995. Precise Interprocedural Dataflow Analysis with Applications to Constant Propagation. TAPSOFT '95
- Akash Lal, Thomas Reps, and Gogul Balakrishnan. 2005. Extended weighted pushdown systems. CAV 2005
- Nomair A. Naeem, Ondrej Lhoták, and Jonathan Rodriguez. 2010. Practical extensions to the IFDS algorithm. CC 2010
- Yannis Smaragdakis, Martin Bravenboer, and Ondrej Lhoták. 2011. Pick your contexts well: understanding object-sensitivity. POPL 2011
- Eric Bodden. 2012. Inter-procedural data-flow analysis with IFDS/IDE and Soot. SOAP 2012
- Rohan Padhye, Uday P. Khedker. Interprocedural Data Flow Analysis in Soot using Value Contexts. SOAP 2013
- FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps (Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, Patrick McDaniel), In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 259–269, PLDI ’14, ACM, 2014
- The Secret Sauce in Efficient and Precise Static Analysis: The Beauty of Distributive, Summary-based Static Analyses (and How to Master Them) (Eric Bodden), In ACM SIGPLAN International Workshop on the State Of the Art in Java Program Analysis (SOAP 2018), pages 85–93, ISSTA ’18, ACM, 2018.
- Context-, Flow-, and Field-sensitive Data-flow Analysis Using Synchronized Pushdown Systems (Johannes Späth, Karim Ali, Eric Bodden), In Proceedings of the ACM SIGPLAN Symposium on Principles of Programming Languages, pages 48:1–48:29, 3(POPL), 2019.
- Eric Bodden, Társis Tolêdo, Márcio Ribeiro, Claus Brabrand, Paulo Borba, and Mira Mezini. 2013. SPLLIFT: statically analyzing software product lines in minutes instead of years. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '13). Association for Computing Machinery, New York, NY, USA, 355–364.
- Eric Bodden, Andreas Sewe, Jan Sinschek, Hela Oueslati, and Mira Mezini. 2011. Taming reflection: Aiding static analysis in the presence of reflection and custom class loaders. In Proceedings of the 33rd International Conference on Software Engineering (ICSE '11). Association for Computing Machinery, New York, NY, USA, 241–250.