M.Sc. István Sárándi
Room 127
Phone: +49 241 80 20 769
Fax: +49 241 80 22 731
Office hours: email me


Hi, I'm a PhD student here at the Computer Vision Group of RWTH Aachen University.

My research interests lie in automated visual analysis of humans, especially for applications such as human-robot interaction and collaborative robotics. I am currently focusing on estimating articulated 3D body pose using deep learning methods. My position is primarily funded by a scholarship from the Bosch Research Foundation.

Supervised Students

  • Markus Knoche
  • Yinglun Liu



Kilian Pfeiffer, Alexander Hermans, István Sárándi, Mark Weber, Bastian Leibe

We address the problem of learning a single model for person re-identification, attribute classification, body part segmentation, and pose estimation. With predictions for these tasks we gain a more holistic understanding of persons, which is valuable for many applications. This is a classical multi-task learning problem. However, no dataset exists that these tasks could be jointly learned from. Hence several datasets need to be combined during training, which in other contexts has often led to reduced performance in the past. We extensively evaluate how the different tasks and datasets influence each other and how different degrees of parameter sharing between the tasks affect performance. Our final model matches or outperforms its single-task counterparts without creating significant computational overhead, rendering it highly interesting for resource-constrained scenarios such as mobile robotics.

BibTeX:

author = {Kilian Pfeiffer and Alexander Hermans and Istv\'{a}n S\'{a}r\'{a}ndi and Mark Weber and Bastian Leibe},
title = {Visual Person Understanding through Multi-Task and Multi-Dataset Learning},
journal = {arXiv:1906.03019},
year = {2019}

István Sárándi, Timm Linder, Kai Oliver Arras, Bastian Leibe
IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS'18) Workshops

Occlusion is commonplace in realistic human-robot shared environments, yet its effects are not considered in standard 3D human pose estimation benchmarks. This leaves the question open: how robust are state-of-the-art 3D pose estimation methods against partial occlusions? We study several types of synthetic occlusions over the Human3.6M dataset and find a method with state-of-the-art benchmark performance to be sensitive even to low amounts of occlusion. Addressing this issue is key to progress in applications such as collaborative and service robotics. We take a first step in this direction by improving occlusion-robustness through training data augmentation with synthetic occlusions. This also turns out to be an effective regularizer that is beneficial even for non-occluded test cases.

BibTeX:

title={How Robust is 3D Human Pose Estimation to Occlusion?},
author={S{\'a}r{\'a}ndi, Istv{\'a}n and Linder, Timm and Arras, Kai O and Leibe, Bastian},
booktitle={IROS Workshop - Robotic Co-workers 4.0},
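
The occlusion augmentation idea from the abstract above can be illustrated with a minimal sketch. It assumes a single solid rectangle as the occluder, which is only one conceivable type of synthetic occlusion and not necessarily one used in the paper; the function name and parameters are hypothetical.

```python
import random


def occlude_with_rectangle(image, rng, max_fraction=0.5, fill=0):
    """Return a copy of `image` with one random rectangle overwritten by `fill`.

    `image` is a list of rows (a 2D grid of pixel values). The rectangle's
    side lengths are at most `max_fraction` of the image sides. This is an
    illustrative assumption, not the occluder model from the paper.
    """
    height, width = len(image), len(image[0])
    occ_h = rng.randint(1, max(1, int(height * max_fraction)))
    occ_w = rng.randint(1, max(1, int(width * max_fraction)))
    top = rng.randint(0, height - occ_h)
    left = rng.randint(0, width - occ_w)

    occluded = [row[:] for row in image]  # copy, so the input stays untouched
    for y in range(top, top + occ_h):
        for x in range(left, left + occ_w):
            occluded[y][x] = fill
    return occluded


# Usage: occlude an 8x8 dummy image with a reproducible random generator.
example = occlude_with_rectangle([[1] * 8 for _ in range(8)], random.Random(0))
```

Applying such a function to each training image with some probability is one simple way to expose a model to partial occlusion during training.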

István Sárándi, Timm Linder, Kai Oliver Arras, Bastian Leibe
Extended abstract for the ECCV PoseTrack Workshop 2018

In this paper we present our winning entry at the 2018 ECCV PoseTrack Challenge on 3D human pose estimation. Using a fully-convolutional backbone architecture, we obtain volumetric heatmaps per body joint, which we convert to coordinates using soft-argmax. Absolute person center depth is estimated by a 1D heatmap prediction head. The coordinates are back-projected to 3D camera space, where we minimize the L1 loss. Key to our good results is the training data augmentation with randomly placed occluders from the Pascal VOC dataset. In addition to reaching first place in the Challenge, our method also surpasses the state of the art on the full Human3.6M benchmark when considering methods that use no extra pose datasets in training. Code for applying synthetic occlusions is available at

BibTeX:

author = {S{\'a}r{\'a}ndi, Istv{\'a}n and Linder, Timm and Arras, Kai O and Leibe, Bastian},
title = {Synthetic Occlusion Augmentation with Volumetric Heatmaps for the 2018 {E}{C}{C}{V} {P}ose{T}rack Challenge on 3{D} Human Pose Estimation},
year = {2018}
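
The soft-argmax step mentioned in the abstract above can be sketched as follows. This is a generic, plain-Python illustration for a single joint, assuming the heatmap holds raw scores indexed as [depth][height][width]; it is not the code of the challenge entry, and the function name is hypothetical.

```python
import math


def soft_argmax_3d(heatmap):
    """Return the softmax-weighted mean (x, y, z) index of a volumetric heatmap.

    `heatmap` is a nested list indexed as heatmap[z][y][x] and contains
    unnormalized scores. Unlike a hard argmax, the result varies smoothly
    with the scores and can fall between grid cells.
    """
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(score for plane in heatmap for row in plane for score in row)
    total = 0.0
    expected_x = expected_y = expected_z = 0.0
    for z, plane in enumerate(heatmap):
        for y, row in enumerate(plane):
            for x, score in enumerate(row):
                weight = math.exp(score - peak)
                total += weight
                expected_x += weight * x
                expected_y += weight * y
                expected_z += weight * z
    return expected_x / total, expected_y / total, expected_z / total


# Usage: a flat 3x3x3 heatmap has its expected position at the grid center.
center = soft_argmax_3d([[[0.0] * 3 for _ in range(3)] for _ in range(3)])
```

Because the output is a weighted average rather than a discrete index, a loss on the resulting coordinates can be backpropagated through the heatmap.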

István Sárándi
Master Thesis

In this thesis we examine the task of estimating how many pedestrians cross a given line in a surveillance video, in the presence of high occlusion and dense crowds. We show that a prior, blob-based pedestrian line counting method fails on our newly annotated private dataset, which is more challenging than those used in the literature.

We propose a new spatiotemporal slice-based method that works with simple low-level features based on optical flow, background subtraction and edge detection and show that it produces good results on the new dataset. Furthermore, presumably due to the very simple and general nature of the features we use, the method also performs well on the popular UCSD vidd dataset without additional hyperparameter tuning, showing the robustness of our approach.

We design new evaluation measures that generalize the precision and recall used in information retrieval and binary classification to continuous, instantaneous pedestrian flow estimations and we argue that they are better suited to this task than currently used measures.

We also consider the relations between pedestrian region counting and line counting by comparing the output of a region counting method with the counts that we derive from line counting. Finally, we show a negative result, where a probabilistic method for combining line and region counter outputs does not lead to the hoped-for result of mutually improved counters.

István Sárándi, Dan Philipp Claßen, Anatoli Astvatsatourov, Oliver Pfaar, Ludger Klimek, Ralph Mösges, Thomas M. Deserno
Methods of Information in Medicine

The conjunctival provocation test (CPT) is a diagnostic procedure for the assessment of allergic diseases. Photographs are taken before and after provocation, which increases the redness of the conjunctiva due to hyperemia. We propose and evaluate an automatic image processing pipeline for objective and quantitative CPT. After scale normalization based on intrinsic image features, the conjunctiva region of interest (ROI) is segmented combining thresholding, edge detection and Hough transform. Redness of the ROI is measured from 0 to 1 by the average pixel redness, which is defined by truncated projection in HSV space. In total, 92 images from an observational diagnostic study are processed for evaluation. The database contains images from two visits for assessment of the test-retest reliability (46 images per visit). All images were successfully processed by the algorithm. The relative redness increment correlates between the two visits with Pearson's r=0.672 (p<.001). Linear correlation of the automatic measure is larger than that of the manual measure (r=0.59). This indicates a higher reproducibility and stability of the automatic method. We presented a robust and effective way to objectify CPT. The algorithm operates on low resolution, is fast and requires no manual input. Quantitative CPT measures can now be established as a surrogate endpoint in controlled clinical trials.

BibTeX:

title={Quantitative conjunctival provocation test for controlled clinical trials},
author={S{\'a}r{\'a}ndi, I and Cla{\ss}en, DP and Astvatsatourov, A and Pfaar, O and Klimek, L and M{\"o}sges, R and Deserno, TM},
journal={Methods of information in medicine},
publisher={Schattauer GmbH}

Thomas M. Deserno, István Sárándi, Abin Jose, Daniel Haak, Stephan Jonas, Paula Specht, Vincent Brandenburg
SPIE Medical Imaging 2014

Calciphylaxis is a rare, devastating disease associated with high morbidity and mortality. It is characterized by systemic medial calcification of the arteries yielding necrotic skin ulcerations. In this paper, we aim at supporting the installation of multi-center registries for calciphylaxis, which includes a photographic documentation of skin necrosis. However, photographs acquired in different centers under different conditions using different equipment and photographers cannot be compared quantitatively. For normalization, we use a simple color pad that is placed into the field of view, segmented from the image, and its color fields are analyzed. In total, 24 colors are printed on that scale. A least-squares approach is used to determine the affine color transform. Furthermore, the card allows scale normalization. We provide a case study for qualitative assessment. In addition, the method is evaluated quantitatively using 10 images of two sets of different captures of the same necrosis. The variability of quantitative measurements based on free hand photography is assessed regarding geometric and color distortions before and after our simple calibration procedure. Using automated image processing, the standard deviation of measurements is significantly reduced. The coefficients of variation are 5-20% and 2-10% for geometry and color, respectively. Hence, quantitative assessment of calciphylaxis becomes practicable and will contribute to a better understanding of this rare but fatal disease.
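
The least-squares fit of an affine color transform mentioned in the abstract above can be sketched in plain Python. The sketch assumes RGB triples for the measured and reference patch colors and solves the normal equations per output channel; the function names and this particular solver are illustrative assumptions, not the published implementation.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: color patches are degenerate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_affine_color_transform(measured, reference):
    """Fit a 3x4 matrix T minimizing sum ||T @ [r, g, b, 1] - reference||^2."""
    design = [list(rgb) + [1.0] for rgb in measured]
    # Normal equations: (X^T X) beta = X^T y, solved once per output channel.
    gram = [[sum(row[i] * row[j] for row in design) for j in range(4)]
            for i in range(4)]
    transform = []
    for channel in range(3):
        moment = [sum(row[i] * ref[channel] for row, ref in zip(design, reference))
                  for i in range(4)]
        transform.append(solve_linear_system(gram, moment))
    return transform


def apply_affine_color_transform(transform, rgb):
    """Map one measured RGB triple into the reference color space."""
    vector = list(rgb) + [1.0]
    return [sum(c * v for c, v in zip(row, vector)) for row in transform]
```

With the 24 patches of a color card as `measured` and their known nominal values as `reference`, the fitted transform could then be applied to every pixel of the photograph.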
