Module Number

INFO-4xxx
Module Title

AI Safety
Lecture Type(s)

Lecture, Tutorial
ECTS 6
Workload 180 h
- Contact time: 60 h / 4 SWS
- Self study: 120 h
Duration 1 Semester
Frequency Irregular
Language of instruction English
Type of Exam

Project and closed-book exam

Content

This course provides a comprehensive introduction to safety and reliability in modern AI systems, with a focus on large language models and AI agents. Students will explore technical vulnerabilities including adversarial robustness, jailbreaks, prompt injections, and hallucinations, while examining approaches to detect and prevent these failures. The curriculum covers alignment challenges such as emergent misalignment, scalable oversight, and AI control methods for managing increasingly capable systems.

Objectives

Students will gain hands-on experience with interpretability techniques, evaluation methods, and practical tools for watermarking, detecting AI-generated content, and understanding copyright implications in LLMs. By the end of the course, students will understand both the theoretical foundations and practical aspects of building safer AI systems, including methods for predicting AI capabilities.

Allocation of credits / grading
Type of Class | Status | SWS | Credits | Type of Exam | Exam duration | Evaluation | Calculation of Module (%)
Prerequisite for participation There are no formal prerequisites; prior coursework in deep learning, statistical machine learning, or LLMs is recommended.
Lecturer / Other Varying lecturers
Literature

Last offered unknown
Planned for currently not planned
Assigned Study Areas