Welcome

Welcome to the Computer Vision Group at RWTH Aachen University!

The Computer Vision Group was established at RWTH Aachen University in the context of the Cluster of Excellence "UMIC - Ultra High-Speed Mobile Information and Communication" and is associated with the Chair of Computer Science 8 - Computer Graphics, Computer Vision, and Multimedia. The group focuses on computer vision applications for mobile devices and for robotic and automotive platforms. Our main research areas are visual object recognition, tracking, self-localization, 3D reconstruction, and in particular combinations of these topics.

We offer lectures and seminars about computer vision and machine learning.

You can browse through all our publications and the projects we are working on.

News

We have one paper accepted at the ECCV'18 GMDL Workshop.

Oct. 4, 2018

We won the 1st Large-scale Video Object Segmentation Challenge at ECCV 2018.

You can find details on our approach in our short paper.

Sept. 1, 2018

We won the 2018 ECCV PoseTrack Challenge on 3D human pose estimation.

You can find details on our approach in our short paper.

Sept. 1, 2018

We won the DAVIS Challenge on Video Object Segmentation at CVPR 2018.

You can find details on our approach in our short paper.

June 18, 2018

We have one paper accepted at the 3DRMS Workshop at the IEEE International Conference on Computer Vision (ICCV) 2017.

Sept. 18, 2017

We have two papers accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017.

June 15, 2017

Recent Publications

Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking

Accepted for the IEEE International Conference on Robotics and Automation (ICRA'18), to appear

The most common paradigm for vision-based multi-object tracking is tracking-by-detection, due to the availability of reliable detectors for several important object categories such as cars and pedestrians. However, future mobile systems will need a capability to cope with rich human-made environments, in which obtaining detectors for every possible object category would be infeasible. In this paper, we propose a model-free multi-object tracking approach that uses a category-agnostic image segmentation method to track objects. We present an efficient segmentation mask-based tracker which associates pixel-precise masks reported by the segmentation. Our approach can utilize semantic information whenever it is available for classifying objects at the track level, while retaining the capability to track generic unknown objects in the absence of such information. We demonstrate experimentally that our approach achieves performance comparable to state-of-the-art tracking-by-detection methods for popular object categories such as cars and pedestrians. Additionally, we show that the proposed method can discover and robustly track a large variety of other objects.
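
As a rough illustration of the mask-based association idea from this abstract, the sketch below greedily matches the previous frame's track masks to newly segmented masks by mask IoU. This is a minimal sketch under assumed thresholds and function names, not the paper's actual tracker, which also exploits motion and semantic cues.

    # Minimal sketch: greedy mask-IoU association between existing
    # tracks and new category-agnostic segment masks (illustrative only).
    import numpy as np

    def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
        """Intersection-over-union of two boolean segmentation masks."""
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return float(inter) / union if union > 0 else 0.0

    def associate(track_masks, new_masks, iou_thresh=0.5):
        """Greedily match track masks to new masks above an IoU threshold.

        Returns (track_idx, mask_idx) pairs; unmatched new masks would
        start new tracks, unmatched tracks are extrapolated or ended.
        """
        matches, used = [], set()
        for t, tm in enumerate(track_masks):
            best, best_iou = None, iou_thresh
            for m, nm in enumerate(new_masks):
                if m in used:
                    continue
                iou = mask_iou(tm, nm)
                if iou > best_iou:
                    best, best_iou = m, iou
            if best is not None:
                used.add(best)
                matches.append((t, best))
        return matches

Because the association operates on pixel-precise masks rather than category-specific detections, the same loop works for objects of any class, which is the point of the category-agnostic formulation.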

 

Know What Your Neighbors Do: 3D Semantic Segmentation of Point Clouds

European Conference on Computer Vision (ECCV'18), GMDL Workshop

In this paper, we present a deep learning architecture which addresses the problem of 3D semantic segmentation of unstructured point clouds. Compared to previous work, we introduce grouping techniques which define point neighborhoods in the initial world space and the learned feature space. Neighborhoods are important as they allow us to compute local or global point features depending on the spatial extent of the neighborhood. Additionally, we incorporate dedicated loss functions to further structure the learned point feature space: the pairwise distance loss and the centroid loss. We show how to apply these mechanisms to the task of 3D semantic segmentation of point clouds and report state-of-the-art performance on indoor and outdoor datasets.
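
The two losses named above lend themselves to a short sketch: an attract/repel term over pairwise feature distances, and a term pulling each point's feature toward the centroid of its label group. This is a hedged PyTorch sketch with illustrative margins and names; the paper's exact formulations may differ.

    # Illustrative feature-space losses for point features (N, D) with
    # per-point labels (N,). The margin is an assumed value.
    import torch

    def pairwise_distance_loss(feats, labels, margin_diff=2.0):
        """Pull same-label features together; push different-label
        features at least margin_diff apart."""
        d = torch.cdist(feats, feats)                  # (N, N) pairwise distances
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        attract = d[same].mean()                       # shrink same-label distances
        repel = torch.relu(margin_diff - d)[~same].mean()
        return attract + repel

    def centroid_loss(feats, labels):
        """Pull each point's feature toward its label group's centroid."""
        uniq = labels.unique()
        loss = feats.new_zeros(())
        for lab in uniq:
            group = feats[labels == lab]
            loss = loss + (group - group.mean(dim=0)).pow(2).sum(dim=1).mean()
        return loss / uniq.numel()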

 

MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features

Conference on Computer Vision and Pattern Recognition (CVPR'18)

In this work, we tackle the problem of instance segmentation, the task of simultaneously solving object detection and semantic segmentation. Towards this goal, we present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction. Building on top of the Faster-RCNN object detector, the predicted boxes provide accurate localization of object instances. Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction. Semantic segmentation assists the model in distinguishing between objects of different semantic classes including background, while the direction prediction, estimating each pixel's direction towards its corresponding center, allows separating instances of the same semantic class. Moreover, we explore the effect of incorporating recent successful methods from both segmentation and detection (i.e., atrous convolution and hypercolumn). Our proposed model is evaluated on the COCO instance segmentation benchmark and shows comparable performance to other state-of-the-art models.
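
The direction branch described above predicts, for each pixel, the direction toward its instance center. Below is a hedged NumPy sketch of how such per-pixel direction targets could be built from an instance label map; the bin quantization and all names are illustrative assumptions, not MaskLab's exact parameterization.

    # Illustrative construction of per-pixel direction targets: the angle
    # from each pixel toward its instance's center of mass, quantized
    # into direction bins (background pixels get -1).
    import numpy as np

    def direction_targets(instance_map: np.ndarray, num_bins: int = 8):
        """instance_map: (H, W) ints, 0 = background, >0 = instance id.
        Returns an (H, W) array of direction-bin indices."""
        h, w = instance_map.shape
        ys, xs = np.mgrid[0:h, 0:w]
        bins = np.full((h, w), -1, dtype=np.int64)
        for inst_id in np.unique(instance_map):
            if inst_id == 0:          # skip background
                continue
            mask = instance_map == inst_id
            cy, cx = ys[mask].mean(), xs[mask].mean()
            # angle of the vector pointing from each pixel to the center
            angle = np.arctan2(cy - ys[mask], cx - xs[mask])
            idx = ((angle + np.pi) / (2 * np.pi) * num_bins).astype(np.int64)
            bins[mask] = idx % num_bins
        return bins

Quantizing the angle turns direction prediction into per-pixel classification, so two touching instances of the same class can be split where the predicted directions point toward different centers.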
