Module Number

INFO-4xxx
Module Title

AI Safety
Lecture Type(s)

Lecture, Tutorial
ECTS 6
Workload 180 h
- Contact time: 60 h / 4 SWS
- Self study: 120 h
Duration 1 Semester
Frequency Irregular
Language of instruction English
Type of Exam

Project and closed-book exam

Content

This course provides a comprehensive introduction to safety and reliability in modern AI systems, with a focus on large language models and AI agents. Students will explore technical vulnerabilities including adversarial robustness, jailbreaks, prompt injections, and hallucinations, while examining approaches to detect and prevent these failures. The curriculum covers alignment challenges such as emergent misalignment, scalable oversight, and AI control methods for managing increasingly capable systems.

Objectives

Students will gain hands-on experience with interpretability techniques, evaluation methods, and practical tools for watermarking, detecting AI-generated content, and understanding copyright implications in LLMs. By the end of the course, students will understand both the theoretical foundations and practical aspects of building safer AI systems, including methods for predicting AI capabilities.

Allocation of credits / grading
Type of Class | Status | SWS | Credits | Type of Exam | Exam duration | Evaluation | Calculation of Module (%)
Prerequisite for participation There are no formal prerequisites; prior coursework in deep learning, statistical machine learning, or LLMs is recommended.
Lecturer / Other Varying lecturers
Literature

Last offered unknown
Planned for currently not planned
Assigned Study Areas