Prof. Wen-Hsiao Peng
This course explores the design principles and applications of prominent generative models, including autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and normalizing flows. These models are built upon foundational deep learning architectures such as CNNs, RNNs, Mamba, and Transformers. Basic knowledge of these backbone architectures is highly recommended, as the lectures may not cover their design details.
Generative models have played a fundamental role in recent breakthroughs in Artificial Intelligence (AI). They serve as enabling technologies for large language models (LLMs) and vision-language models (VLMs), as well as large-scale world models that support physical/embodied AI. Since the resurgence of deep learning, multiple generations of generative models have been developed, each with its own strengths and limitations. This course provides an introduction to and overview of these models, emphasizing their design principles, training objectives, and real-world applications. Upon successful completion of this course, students will be able to:
Adapting large-scale visual world foundation models (e.g., Cosmos) for image/video compression: https://arxiv.org/abs/2501.03575