RESEARCH THEMES
The research of my team and collaborators is in the area of Computer Vision and Pattern Recognition and revolves around the following themes.

Robust Visual Tracking
This line of work focuses on methods for robust tracking of objects in image sequences, addressing (partial) occlusions, changes in the object's appearance (e.g. due to illumination changes) and structure (e.g. deformations), background clutter, and the tracking of multiple interacting targets. We have developed methods for general object tracking, where learning needs to be performed on the fly, as well as methods for domain-specific tracking, such as facial feature tracking, where the appearance, structure and dynamics of the target(s) can be learned offline.
Members: Ioannis Patras

Coupled Regression and Classification for Robust Visual Tracking
Researchers: Ioannis Patras
This paper addresses the problem of robust template tracking in image sequences. Our work falls within the discriminative framework, in which the observations at each frame yield direct probabilistic predictions of the state of the target. Our primary contribution is that we explicitly address the problem that the prediction accuracy varies across observations and in some cases can be very low. To this end, we couple the predictor to a probabilistic classifier which, once trained, can determine the probability that a new observation can accurately predict the state of the target (that is, determine the relevance or reliability of the observation in question). In the particle filtering framework, we derive a recursive scheme for maintaining an approximation of the posterior probability of the state, in which multiple observations can be used and their predictions are moderated by their corresponding relevance. In this way the predictions of the relevant observations are emphasized, while the predictions of the irrelevant observations are suppressed.
We apply the algorithm to the problem of 2D template tracking and demonstrate that the proposed scheme outperforms classical methods for discriminative tracking, both for motions that are large in magnitude and under partial occlusions.
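The idea of moderating predictions by their relevance can be illustrated with a minimal sketch. It is not the recursive scheme derived in the paper: it assumes a 1D state, a fixed grid of particles, Gaussian predictions with a common spread, and relevance values that are simply given rather than produced by a trained classifier. All names (`reweight_particles`, `predictions`, `relevances`, `sigma`) are hypothetical.

```python
import math

def gaussian(x, mean, sigma):
    """Density of a 1D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def reweight_particles(particles, predictions, relevances, sigma=1.0):
    """Weight each particle by a mixture of the per-observation predictions,
    where each mixture component is scaled by the relevance of its observation."""
    weights = []
    for x in particles:
        w = sum(r * gaussian(x, mu, sigma) for mu, r in zip(predictions, relevances))
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        # No observation supports any particle: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]

def estimate(particles, weights):
    """Weighted mean of the particles (the state estimate)."""
    return sum(x * w for x, w in zip(particles, weights))

# Toy example (assumed values): two reliable observations predict a position
# near 10, one unreliable observation (e.g. on an occluded patch) predicts 3.
particles = [float(i) for i in range(0, 21)]
predictions = [10.0, 10.5, 3.0]
relevances = [0.9, 0.8, 0.05]

weights = reweight_particles(particles, predictions, relevances)
state = estimate(particles, weights)
```

Because the third prediction carries a low relevance, it contributes little to the weights and the estimate stays close to the two reliable predictions.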

Tracking Multiple Interacting Targets
Researchers: Ioannis Patras
The work focuses on methods for tracking multiple targets whose states (e.g. relative positions) are correlated. It is applied to the problem of tracking facial features, in which anatomical constraints are learned from annotated data. The method has been used extensively to track the facial features used in facial expression analysis. It is also applied to the problem of stereo tracking, where stereoscopic constraints are used to robustly track facial features and the iris in a stereoscopic image sequence. The latter is used for gaze tracking with a pair of webcams.

Analysis of (Human) Motion
The research aims at tracking and recognition of facial expressions, body poses and gestures, and human actions in image sequences. The research is driven by applications in Multimodal Human Computer Interaction, Body games and Multimedia Indexing and Retrieval.
Members: Ioannis Patras, Irene Kotsia, Weiwei Guo, Sander Koelstra, Vijay Kumar, Antonis Oikonomopoulos (Imperial College), Ognjen Rudinovic (Imperial College), Stefanos Vrochidis (partially in ITI-CERTH)

Human action recognition and localisation in image sequences
Researchers: Ioannis Patras, Irene Kotsia, Vijay Kumar, Antonis Oikonomopoulos
This work aims at developing methods for the recognition and localisation of human and animal action categories in image sequences. Once trained, the methods should be able to detect and localise, in an unseen image sequence, all the actions that belong to one of the known categories. The methodologies will allow the models to be trained on image sequences with significant background clutter, that is, in the presence of multiple objects/actions in the scene and of moving cameras. No prior knowledge of the anatomy of the human body is assumed, and therefore the models will be able to identify a large class of action categories, including facial/hand/body actions, animal motion, and interactions between humans and objects in their environment (such as drinking a glass of water).

Facial (expression) analysis
Researchers: Ioannis Patras, Irene Kotsia, Sander Koelstra, Ognjen Rudinovic (Imperial College)
This work applies computational intelligence and image processing techniques to analyze facial information in images and video. This includes tracking facial features, gaze tracking, recognition of the activation of facial muscles (Facial Action Units) and recognition of facial expressions such as the ones associated with the six basic emotions (anger, disgust, fear, happiness, sadness and surprise). Research is conducted not only in controlled environments, but also under challenging conditions, such as occlusion, varying lighting and pose, and spontaneous facial expressions.

Human pose estimation
Researchers: Ioannis Patras, Weiwei Guo
This work focuses on the recovery of the configuration of body parts from a single 2D image. The research focuses on learning direct mappings from image observations to the parameters that describe the 3D body pose.

Motion Analysis
Researchers: Ioannis Patras
This work focuses on motion estimation and analysis. This includes the estimation of dense motion fields using the optical flow equation and regularisation constraints on patches that result from an initial intensity segmentation, as well as work on the reliability of a block-based estimated motion field.
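As a rough illustration of using the optical flow equation on a patch, the sketch below solves Ix*u + Iy*v + It = 0 in the least-squares sense over all pixels of one patch. It assumes a single constant motion (u, v) per patch, which is a simplification and not necessarily the regularisation used in this work; the function name `patch_flow` and the sample gradients are hypothetical.

```python
def patch_flow(gradients):
    """Least-squares motion (u, v) for one patch.

    `gradients` is a list of (Ix, Iy, It) samples, one per pixel of the patch.
    Minimises the sum of (Ix*u + Iy*v + It)^2 under the assumption that the
    motion is constant within the patch. Returns None when the 2x2 system is
    singular (the aperture problem: gradients all point in one direction).
    """
    sxx = sum(ix * ix for ix, iy, it in gradients)
    sxy = sum(ix * iy for ix, iy, it in gradients)
    syy = sum(iy * iy for ix, iy, it in gradients)
    sxt = sum(ix * it for ix, iy, it in gradients)
    syt = sum(iy * it for ix, iy, it in gradients)

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None

    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt] by Cramer's rule.
    u = (-sxt * syy + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# Synthetic check: build temporal derivatives that are exactly consistent
# with a known motion, then recover that motion from the gradients.
true_u, true_v = 1.5, -0.5
spatial = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
gradients = [(ix, iy, -(ix * true_u + iy * true_v)) for ix, iy in spatial]
flow = patch_flow(gradients)
```

Patches produced by an intensity segmentation would each supply their own list of gradient samples; a patch whose gradients all share one direction returns `None`, which is one simple signal that its estimate is unreliable.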

Human Sensing for Human-Media Interaction
The research aims at the multimodal analysis of user behaviour when interacting with multimedia content. This includes the analysis of traditional modes of interaction (e.g. mouse and keyboard input) but mainly of novel means of interaction such as EEG (electroencephalography) signals, facial expressions and gaze patterns. The research is driven by applications in Multimedia Indexing and Retrieval as well as in Multimodal Human Computer Interaction.
Members: Ioannis Patras, Sander Koelstra, Stefanos Vrochidis (partially in ITI-CERTH)

EEG analysis for implicit tagging
Researchers: Sander Koelstra, Ioannis Patras
In this work, we aim to analyze neuro-physiological user reactions to the presentation of multimedia, for indexing and retrieval. An advantage of using the EEG modality is that it can facilitate implicit tagging, that is, tagging can occur while the user passively watches multimedia content. We first analyze EEG signals in order to validate tags attached to video content. Subjects are shown a video and a tag, and we aim to determine whether the shown tag was congruent with the presented video by detecting the occurrence of an N400 event-related potential. Tag validation could be used in conjunction with a vision-based recognition system as a feedback mechanism to improve the classification accuracy for multimedia indexing and retrieval. Independent Component Analysis and repeated measures ANOVA are used for the analysis. Our experimental results show a clear occurrence of the N400 and a significant difference in N400 activation between matching and non-matching tags. The dataset we collected is now available.
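The comparison between matching and non-matching tags can be sketched in a much simplified form: average the signal in a window around 400 ms after the tag is shown, per condition, and compare the two means. This is not the Independent Component Analysis and repeated measures ANOVA pipeline described above; the 300-500 ms window, the sampling rate and the toy epochs are all assumptions made for illustration.

```python
def mean_window_amplitude(epochs, sample_rate_hz, start_ms=300, end_ms=500):
    """Mean amplitude over all epochs within [start_ms, end_ms) after onset.

    Each epoch is a list of samples from one trial, with sample 0 at the
    moment the tag is presented.
    """
    first = start_ms * sample_rate_hz // 1000
    last = end_ms * sample_rate_hz // 1000
    values = [v for epoch in epochs for v in epoch[first:last]]
    return sum(values) / len(values)

# Toy single-channel epochs sampled at 10 Hz (made-up numbers, not real EEG).
# The N400 is a negative deflection, expected to be stronger for tags that
# do not match the video.
sample_rate_hz = 10
matching_epochs = [
    [0.0, 0.0, 0.0, -1.0, -1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -1.0, -1.0, 0.0, 0.0, 0.0],
]
mismatching_epochs = [
    [0.0, 0.0, 0.0, -4.0, -6.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -5.0, -5.0, 0.0, 0.0, 0.0],
]

matching_amp = mean_window_amplitude(matching_epochs, sample_rate_hz)
mismatching_amp = mean_window_amplitude(mismatching_epochs, sample_rate_hz)
n400_effect = mismatching_amp - matching_amp  # negative when the N400 is stronger
```

In a real analysis the per-condition means would feed a statistical test across subjects rather than a direct subtraction.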

Interactive video retrieval based on implicit user feedback
Researchers: Stefanos Vrochidis, Ioannis Patras
This line of research focuses on utilising implicit indicators of user interactions with multimedia content via a user-computer interface. As such indicators we consider the user actions during a video retrieval task, including gaze, mouse movements and clicks, and keyboard input. The objectives of this work are:
In this context, an interactive video retrieval engine has been implemented, which is capable of retrieving video in different modalities (i.e. textual, visual and temporal search) as well as capturing user interaction. Video analysis was performed by employing state-of-the-art techniques, while implicit feedback analysis was conducted by introducing new implicit indicators for video and subsequently constructing an action graph that describes the user's navigation during the search process. To validate the approach, the system was tested in experiments with real users and its performance was evaluated with the widely used metrics of precision and recall. The evaluation shows a significant improvement in recall and precision once past user-computer interaction is exploited.
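For reference, precision and recall for a single query can be computed as in the sketch below. The shot identifiers are hypothetical and do not come from the experiments described above.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set.

    precision = |retrieved and relevant| / |retrieved|
    recall    = |retrieved and relevant| / |relevant|
    Either value is reported as 0.0 when its denominator is empty.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical query result: four shots returned, two of the three
# relevant shots are among them.
retrieved = ["shot1", "shot2", "shot3", "shot4"]
relevant = ["shot2", "shot4", "shot7"]
precision, recall = precision_recall(retrieved, relevant)
```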

Facial expression recognition
Researchers: Sander Koelstra, Maja Pantic (Imperial College), Ioannis Patras
In this work we propose a dynamic-texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. We introduce a novel approach to modeling the dynamics and the appearance in the face region of an input video based on Non-rigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and the temporal domain, which in turn form the input to a set of AU classifiers. Per AU, a combination of ensemble learners and Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2% for the MHI (Motion History Image) method and of 94.3% for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener dataset.
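A motion orientation histogram can be sketched as follows: each motion vector votes for the bin of its direction, weighted by its magnitude, and the histogram is normalised to sum to one. The number of bins, the magnitude weighting and the normalisation are assumptions made for illustration, not the exact descriptor of this work; `orientation_histogram` is a hypothetical name.

```python
import math

def orientation_histogram(vectors, bins=4):
    """Magnitude-weighted histogram of motion directions.

    `vectors` is a list of (dx, dy) motion vectors from one image region.
    Directions are binned over [0, 2*pi); zero-length vectors are skipped.
    Returns a list of `bins` values summing to 1 (or all zeros if there is
    no motion at all).
    """
    hist = [0.0] * bins
    bin_width = 2.0 * math.pi / bins
    for dx, dy in vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        # Clamp guards against an angle that rounds up to exactly 2*pi.
        index = min(int(angle / bin_width), bins - 1)
        hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0.0 else hist

# Toy region: two vectors pointing up-right, one pointing up-left.
vectors = [(1.0, 1.0), (2.0, 2.0), (-1.0, 1.0)]
descriptor = orientation_histogram(vectors)
```

Computing one such histogram per spatial cell, and per temporal window, would give descriptors in the spatial and the temporal domain that can be concatenated into a classifier input.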

(Semantic) Segmentation
The research aims at the localisation, in images and image sequences, of instances of objects belonging to certain semantic categories. We model object structure and appearance, as well as the context in which the objects appear, using graphical probabilistic models and/or classification schemes. We utilize strongly annotated datasets (i.e. data for which the ground truth segmentation is available), weakly annotated datasets (e.g. the presence but not the location of an object in an image is given), and datasets from social sites, where ambiguities in the labeling of the training data are typical.
Members: Ioannis Patras, Giuseppe Passino, Spiros Nikolopoulos (partially in ITI-CERTH)

Patch-based semantic labelling of images
Researchers: Giuseppe Passino, Ioannis Patras, Ebroul Izquierdo
This work studies models capable of analysing the structure of an image in terms of relationships among its building parts, or patches. The aim is to identify the relevant clues that make it possible to pair specific low-level patch appearances with high-level semantic "concepts". In this context, the reasoning can benefit dramatically from the availability of structural data, i.e., information about the co-presence and relative location of patches. The main challenge is how to take this information into account while avoiding the explosion in complexity associated with the intrinsically high dimensionality of the problem. Graphical models can provide a theoretical framework for building a learning paradigm that efficiently infers relevant clues and uses them to ultimately derive the class of the objects depicted in a collection of images.

Learning object models from social media
Researchers: Ioannis Patras, Spiros Nikolopoulos (also in ITI-CERTH), Yiannis Kompatsiaris (ITI-CERTH)
This work aims at combining the benefits of supervised and unsupervised learning by allowing the supervised methods to learn from training samples that are found in collaborative tagging environments, after some preprocessing. Specifically, drawing from a large pool of weakly annotated samples, our goal is to collect a set of strongly annotated samples suitable for training an object classifier in a supervised manner. We do this by associating the most populated tag-word with the most populated visual-word in a set of weakly annotated images. Tag-words correspond to clusters of terms that are provided by social users to describe an image and are grouped based on their semantic relatedness. Visual-words correspond to clusters of image regions that are identified by an automatic segmentation algorithm and are grouped based on the visual similarity between them. The most populated tag-word provides information about the object that the classifier is trained to identify, while the most populated visual-word provides the set of strongly annotated samples for training the classifier in a supervised manner. Our method relies on the fact that, due to the common background that most users share, the majority of them tend to contribute similar tags when faced with similar types of visual content. Given this assumption, it is expected that as the pool of weakly annotated images grows, the most populated tag-word and the most populated visual-word will converge to the same object.
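The pairing of the most populated tag-word with the most populated visual-word can be sketched as below. The sketch assumes that both clusterings have already been computed, so each image comes with its tag-word labels and each segmented region with its visual-word label; the function name, the identifiers and the toy data are hypothetical.

```python
from collections import Counter

def select_training_regions(image_tag_words, region_visual_words):
    """Pair the most populated tag-word with the most populated visual-word.

    image_tag_words:     list of tag-word lists, one list per image.
    region_visual_words: dict mapping a region id to its visual-word label.
    Returns (tag_word, region_ids): the tag-word that names the object and
    the regions to be used as positive training samples for its classifier.
    """
    # Count each tag-word once per image, however often it is repeated there.
    tag_counts = Counter(t for tags in image_tag_words for t in set(tags))
    visual_counts = Counter(region_visual_words.values())

    top_tag = tag_counts.most_common(1)[0][0]
    top_visual = visual_counts.most_common(1)[0][0]

    samples = sorted(r for r, w in region_visual_words.items() if w == top_visual)
    return top_tag, samples

# Toy pool of three weakly annotated images and their segmented regions.
image_tag_words = [["sky", "sea"], ["sea", "sand"], ["sea"]]
region_visual_words = {
    "img1_r1": "vw_blue",
    "img1_r2": "vw_grey",
    "img2_r1": "vw_blue",
    "img2_r2": "vw_yellow",
    "img3_r1": "vw_blue",
}
label, training_regions = select_training_regions(image_tag_words, region_visual_words)
```

With a small pool the two most populated clusters may not depict the same object; the approach relies on this agreement improving as the pool grows.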