Semantic Scene Analysis

Scene understanding is a challenging topic in computer vision, robotics and artificial intelligence. Given one or more images, we want to infer what type of scene is shown, which objects are visible, and which physical or contextual relations hold between the observed objects. This information is important in many applications, such as robot navigation, image search, or surveillance.

Relations between objects can be physical, such as "in front of" or "above". Humans, however, usually also consider more abstract, implicit relations between objects. For instance, both a table and the chairs around it are "above" the floor, yet a human observer would rather consider them to be a single group of objects. In other words, the table and chairs define a relation that is more than just "in front of" or "next to". This kind of implicitly defined additional information is what we consider semantic or contextual information.

Approach

We estimate the semantic information defined between objects in the scene and construct a so-called scene graph. Scene graphs neatly represent all the objects within a scene. They make it possible to analyze the content of an image, or even to compare two images semantically, i.e. with respect to their contents and the relations between their objects.

Figure 1: Example of an observed scene (left) and the scene graph constructed from it (right).
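
To make the idea concrete, here is a minimal Python sketch of a scene graph stored as a set of (subject, predicate, object) triples, together with a simple set-overlap comparison of two graphs. The names `SceneGraph`, `Relation` and `semantic_similarity`, the "grouped with" predicate, and the Jaccard-based score are illustrative assumptions. They are not the representation or the similarity measure used in the publications listed below.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One directed edge of the graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Objects are nodes; relations are labeled, directed edges."""
    objects: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def add_relation(self, subject, predicate, obj):
        # Adding an edge also registers both endpoints as nodes.
        self.objects.update((subject, obj))
        self.relations.add(Relation(subject, predicate, obj))


def jaccard(a, b):
    """Overlap of two sets in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def semantic_similarity(g1, g2):
    """Average of object overlap and relation overlap (illustrative choice)."""
    return 0.5 * (jaccard(g1.objects, g2.objects)
                  + jaccard(g1.relations, g2.relations))


# Two toy scenes, labeled by object category rather than by instance.
dining_room = SceneGraph()
dining_room.add_relation("table", "above", "floor")
dining_room.add_relation("chair", "above", "floor")
dining_room.add_relation("chair", "grouped with", "table")  # implicit, semantic relation

office = SceneGraph()
office.add_relation("table", "above", "floor")
office.add_relation("chair", "above", "floor")
office.add_relation("monitor", "above", "table")

print(semantic_similarity(dining_room, office))
```

In this sketch the two scenes share most objects but differ in one relation each, so the score lies strictly between 0 and 1. A real system would have to match object instances and weight relations, which this set-based comparison deliberately ignores.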

If you are looking for an interesting topic for your bachelor or master thesis, please contact Wentong Liao or Hanno Ackermann.

Master / Bachelor Theses

If you are looking for a topic for your Master or Bachelor thesis and are interested in analyzing and modelling abstract problems, please do not hesitate to contact Wentong Liao or Hanno Ackermann. You are required to have good programming skills (MATLAB, Python, Java or C++) and a good understanding of, for instance, linear algebra or statistics.

GUI-Tool for generating ground truth scene graphs and visualisation

We provide a GUI implemented in MATLAB for generating ground truth scene graphs and visualising the generated graphs.

It contains the manually labeled scene graph data for the NYU_V2 dataset. For more details, please refer to the readme in the file.

Download Link

Publications

Conference Contributions
- Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He
  FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
  International Conference on Learning Representations (ICLR), 2024
  (arXiv, Webpage) BibTeX
- Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua
  GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
  Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  (arXiv, Webpage) BibTeX
- Yuren Cong, Jinhui Yi, Bodo Rosenhahn, Michael Yang
  SSGVS: Semantic Scene Graph-to-Video Synthesis
  Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023
  (arXiv.org) BibTeX
- Yuren Cong, Wentong Liao, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn
  Spatial-Temporal Transformer for Dynamic Scene Graph Generation
  International Conference on Computer Vision (ICCV), July 2021
  (arXiv.org) BibTeX
- Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song, Bodo Rosenhahn, Tao Xiang
  Context-Aware Layout to Image Generation with Enhanced Object Appearance
  IEEE Conference on Computer Vision and Pattern Recognition, June 2021
  (pdf) BibTeX
- Wentong Liao, Cuiling Lan, Michael Ying Yang, Wenjun Zeng, Bodo Rosenhahn
  Target-Tailored Source-Transformation for Scene Graph Generation
  In CVPR Workshop on Multi-Sensor Fusion for Dynamic Scene Understanding, June 2021
  BibTeX
- Cheng Hao, Wentong Liao, Xuejiao Tang, Michael Ying Yang, Monika Sester, Bodo Rosenhahn
  Exploring Dynamic Context for Multi-path Trajectory Prediction
  International Conference on Robotics and Automation, May 2021
  (pdf) BibTeX
- Sen He, Wentong Liao, Hamed Rezazadegan Tavakoli, Michael Ying Yang, Bodo Rosenhahn, Nicolas Pugeault
  Image Captioning through Image Transformer
  Asian Conference on Computer Vision (ACCV), IEEE, Kyoto, November 2020
  (pdf) BibTeX
- Yuren Cong, Hanno Ackermann, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn
  NODIS: Neural Ordinary Differential Scene Understanding
  European Conference on Computer Vision (ECCV), August 2020
  (arXiv.org) BibTeX
- Christoph Reinders, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn
  Object Recognition from very few Training Examples for Enhancing Bicycle Maps
  2018 IEEE Intelligent Vehicles Symposium (IV), June 2018
  (pdf, arXiv.org) BibTeX
- Florian Kluger, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn
  Deep Learning for Vanishing Point Detection Using an Inverse Gnomonic Projection
  39th German Conference on Pattern Recognition, Springer Lecture Notes in Computer Science (LNCS), Basel, Switzerland, September 2017
  (pdf, DOI) BibTeX
- Wentong Liao, Chun Yang, Michael Ying Yang, Bodo Rosenhahn
  Security Event Recognition for Visual Surveillance
  ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, Vol. 4, June 2017
  (pdf, DOI) BibTeX

Journals
- Yuren Cong, Michael Yang, Bodo Rosenhahn
  RelTR: Relation Transformer for Scene Graph Generation
  IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
  (arXiv.org, GitHub, DOI, IEEEexplore) BibTeX
- Cheng Hao, Wentong Liao, Xuejiao Tang, Michael Ying Yang, Monika Sester, Bodo Rosenhahn
  AMENet: Attentive Maps Encoder Network for Trajectory Prediction
  ISPRS Journal of Photogrammetry and Remote Sensing, Elsevier, Vol. 172, pp. 253-266, 2021
  (DOI) BibTeX
- Michael Ying Yang, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn
  On support relations and semantic scene graphs
  ISPRS Journal of Photogrammetry and Remote Sensing, Elsevier, Vol. 131, pp. 15-25, July 2017
  (Link) BibTeX