Prof. Wen-Hsiao Peng
This course explores the design principles and applications of prominent generative models, including autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and normalizing flows. These models are built upon foundational deep learning architectures such as CNNs, RNNs, Mamba, and Transformers. Basic knowledge of these backbone architectures is highly recommended, as the lectures may not cover their design details.
Generative models have played a fundamental role in recent breakthroughs in Artificial Intelligence (AI). They serve as enabling technologies for large language models (LLMs) and vision-language models (VLMs), as well as large-scale world models that support physical/embodied AI. Since the resurgence of deep learning, multiple generations of generative models have been developed, each with its own strengths and limitations. This course provides an introduction to and overview of these models, emphasizing their design principles, training objectives, and real-world applications. Upon successful completion of this course, students will be able to:
Adapting large-scale visual world foundation models (e.g., Cosmos) for image/video compression: https://arxiv.org/abs/2501.03575