Human Sensing for Human-Media Interaction
Academic contacts: Prof Ebroul Izquierdo, Dr Ioannis Patras
Involved people:
Summary: The research aims at the multimodal analysis of user behaviour during interaction with multimedia content. This covers traditional modes of interaction (e.g. mouse and keyboard input) but focuses mainly on novel ones such as EEG (electroencephalography) signals, facial expressions and gaze patterns. The research is driven by applications in Multimedia Indexing and Retrieval as well as in Multimodal Human Computer Interaction. See below for examples of some of the research carried out within this cluster.
Sub-topics:
EEG analysis for implicit tagging
Academic contacts: Dr Ioannis Patras
Involved people:
Figure 1: A researcher sacrificing himself for SCIENCE!
Figure 2: Difference in ERP activation for matching/non-matching tags
In this work, we aim to analyze neuro-physiological user reactions to the presentation of multimedia, for indexing and retrieval. An advantage of the EEG modality is that it facilitates implicit tagging, that is, tagging that occurs while the user passively watches multimedia content. We first analyze EEG signals in order to validate tags attached to video content. Subjects are shown a video and a tag, and we aim to determine whether the tag was congruent with the video by detecting the occurrence of an N400 event-related potential (ERP). Tag validation could be used in conjunction with a vision-based recognition system as a feedback mechanism to improve classification accuracy for multimedia indexing and retrieval. Independent Component Analysis and repeated measures ANOVA are used for the analysis. Our experimental results show a clear occurrence of the N400 and a significant difference in N400 activation between matching and non-matching tags. The dataset we collected is now available, see here for details.
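As a rough illustration of the tag-validation comparison, the sketch below averages the amplitude inside an N400 window for matching and non-matching trials of each subject and summarises the difference with a paired t statistic. With only two conditions, a paired t-test is equivalent to a one-way repeated measures ANOVA (F = t²). The 300 to 500 ms window, the 100 Hz sampling rate, the function names and the toy data are all assumptions made for illustration, not the parameters used in the study, and the ICA artefact-removal step is omitted.

```python
import math
from statistics import mean, stdev

SFREQ = 100               # assumed sampling rate in Hz
N400_WINDOW = (0.3, 0.5)  # assumed window in seconds after tag onset


def window_mean(epoch, sfreq=SFREQ, window=N400_WINDOW):
    """Mean amplitude of one epoch (samples starting at tag onset) in the window."""
    start = int(round(window[0] * sfreq))
    stop = int(round(window[1] * sfreq))
    return mean(epoch[start:stop])


def condition_mean(epochs):
    """Average the window amplitude over all epochs of one subject and condition."""
    return mean(window_mean(e) for e in epochs)


def paired_t(a, b):
    """Paired t statistic for per-subject values a and b (same subject order)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def toy_epoch(amplitude, n_samples=80):
    """Hypothetical epoch: flat signal with a deflection at samples 30..49."""
    return [amplitude if 30 <= i < 50 else 0.0 for i in range(n_samples)]


# Toy data for three hypothetical subjects, two epochs per condition.
match = [condition_mean([toy_epoch(a)] * 2) for a in (-1.0, -1.5, -1.0)]
mismatch = [condition_mean([toy_epoch(a)] * 2) for a in (-4.0, -5.0, -6.0)]

t_stat = paired_t(mismatch, match)
print(f"paired t over {len(match)} toy subjects: {t_stat:.2f}")
```

A strongly negative statistic here reflects the more negative deflection for non-matching tags in the toy data; real recordings would need per-channel handling and proper significance testing.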
Interactive video retrieval based on implicit user feedback
Academic contacts: Dr Ioannis Patras
Involved people:
Figure 3: Screenshot of the video retrieval interface
This line of research focuses on exploiting implicit indicators of user interaction with multimedia content through a user-computer interface. To this end, we consider the user's actions during a video retrieval task, including gaze, mouse movements and clicks, and keystrokes and other keyboard input. The objectives of this work are:
In this context, an interactive video retrieval engine has been implemented. It retrieves video through different search modalities (i.e. textual, visual and temporal search) and captures user interaction. Video analysis was performed with state-of-the-art techniques, while implicit feedback analysis was conducted by introducing new implicit indicators for video and then constructing an action graph that describes the user's navigation during the search process. To validate the approach, the system was tested in experiments with real users and its performance was evaluated with the widely used metrics of precision and recall. The evaluation shows a significant improvement in both recall and precision once past user-computer interaction is exploited.
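The two ingredients mentioned above, an action graph over the user's navigation and precision/recall evaluation, can be sketched as follows. This is a minimal illustration under assumptions: here the graph nodes are the queries and shots a user visits and each edge is weighted by how often that transition was observed, which may differ from the graph definition used in the actual system. The session logs and identifiers such as `query:car` and `shot:12` are hypothetical.

```python
from collections import Counter


def build_action_graph(sessions):
    """Count transitions between consecutively visited items across sessions.

    Returns a Counter keyed by (from_item, to_item); the count is the edge weight.
    """
    edges = Counter()
    for visited in sessions:
        for src, dst in zip(visited, visited[1:]):
            edges[(src, dst)] += 1
    return edges


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a ground-truth set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical navigation logs from two past search sessions.
sessions = [
    ["query:car", "shot:12", "shot:13"],
    ["query:car", "shot:12", "shot:40"],
]
graph = build_action_graph(sessions)

# Hypothetical result list and ground truth for one query.
precision, recall = precision_recall(
    retrieved=["shot:12", "shot:13", "shot:99"],
    relevant=["shot:12", "shot:13", "shot:40", "shot:41"],
)
print(graph.most_common(1), precision, recall)
```

Heavily weighted edges leaving a query node are one plausible signal for re-ranking results for later users who issue the same query.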
Facial expression recognition
Academic contacts: Dr Ioannis Patras, Dr Maja Pantic (Imperial College)
Involved people:
Figure 4: Visualization of detected motion between two frames
Figure 5: Examples of Action Units (AUs)
In this work we propose a dynamic-texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. We introduce a novel approach to modeling the dynamics and the appearance of the face region in an input video, based on non-rigid registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and the temporal domain, which in turn form the input to a set of AU classifiers. For each AU, a combination of ensemble learners and Hidden Markov Models detects the presence of the AU and its temporal segments in an input image sequence. When tested on recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the approach achieved an average event recognition accuracy of 89.2% with the Motion History Image (MHI) method and 94.3% with the FFD method. The generalization performance of the FFD method has been tested on the Cohn-Kanade database. Finally, we also explored performance on spontaneous expressions in the Sensitive Artificial Listener dataset.
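To make the descriptor step more concrete, the sketch below builds a magnitude-weighted motion orientation histogram from a set of motion vectors in one face region. It assumes the motion field has already been estimated (for example by the FFD registration, which is not shown) and covers only a single spatial region; the bin count, the weighting and the normalisation are illustrative assumptions rather than the published descriptor.

```python
import math


def orientation_histogram(flow, n_bins=8, min_mag=1e-6):
    """Magnitude-weighted histogram of motion directions, normalised to sum to 1.

    flow: iterable of (dx, dy) motion vectors for one region.
    Vectors shorter than min_mag are ignored as "no motion".
    """
    hist = [0.0] * n_bins
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag < min_mag:
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi]
        bin_index = int(angle / (2 * math.pi) * n_bins) % n_bins
        hist[bin_index] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Hypothetical region: mostly up-right motion, some up-left motion.
region_flow = [(1.0, 1.0), (2.0, 2.0), (-1.0, 1.0), (0.0, 0.0)]
descriptor = orientation_histogram(region_flow, n_bins=4)
print(descriptor)
```

Concatenating such histograms over a grid of regions, and over consecutive frames for the temporal domain, would give a feature vector that per-AU classifiers could consume.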
A game for semi-automatic image annotation
Academic contacts: Prof Ebroul Izquierdo
Involved people:
Figure 6: Screenshot of annotation game
We introduce a new technique for image annotation that exploits the social aspects of human-based computation. The approach builds on what millions of single-player, online and cooperative gamers are keen to do: enjoy themselves in a social manner. It focuses on the social aspects of gaming and on using humans in a widely distributed fashion through a process of human-based computation, motivating people to tag images while they entertain themselves. This deviates from the conventional "content-based image retrieval (CBIR)" paradigm favoured by the research community for tackling semantic annotation and tagging of multimedia content. The main objective of this project is to present an interactive framework that annotates images by taking human-based computation into account. The approach aims to tackle the challenging image annotation task by harvesting human intelligence for manual annotation. The task is to combine user perceptions and game objectives so as to achieve a high success rate in image labelling and to further reduce the cost of manual annotation.
Implicit image annotation using gaze tracking
Academic contacts: Prof Ebroul Izquierdo
Involved people:
This research tries to combine the analytical abilities of the human brain with the processing strength of machines in order to narrow the semantic gap between the human mind and machines. The research is performed by monitoring users while they explore the visual content of a database. Two stages are planned: implicit image annotation and implicit video annotation. The monitoring devices include an eye-tracker system that finds the coordinates of the user's gaze point on the screen and the time spent at that point. For annotating images, it is presumed that the user has a concept in mind, which he/she has either been told or been induced to think of. Based on the user's line-of-sight patterns, the system tags the relevant images with the concept that is in the user's mind. Next, for annotating videos, a face recognition system will be added to the eye tracker. Both devices will feed a system that performs basic emotion recognition. Based on the users' monitored emotions, the relevant parts of the video will be annotated. Finally, because there are degrees of uncertainty in the results, fuzzy logic is the chosen analysis framework for the research.
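The image-annotation stage can be sketched as follows: gaze samples are accumulated into a dwell time per on-screen image, a fuzzy membership function turns dwell time into a degree of relevance, and sufficiently relevant images receive the concept the user has in mind. Everything specific here is an assumption for illustration, including the fixed sampling period, the rectangular image regions, the piecewise-linear membership function with its 0.5 s and 2.0 s breakpoints, and the 0.5 tagging threshold.

```python
def dwell_times(gaze_samples, image_regions, sample_period):
    """Total gaze time in seconds per image.

    gaze_samples: list of (x, y) screen coordinates, one per tracker sample.
    image_regions: dict mapping image id -> (x0, y0, x1, y1) screen rectangle.
    sample_period: seconds between consecutive samples (assumed constant).
    """
    totals = {image_id: 0.0 for image_id in image_regions}
    for x, y in gaze_samples:
        for image_id, (x0, y0, x1, y1) in image_regions.items():
            if x0 <= x < x1 and y0 <= y < y1:
                totals[image_id] += sample_period
                break
    return totals


def fuzzy_relevance(dwell, low=0.5, high=2.0):
    """Piecewise-linear membership: 0 at or below `low` seconds, 1 at or above `high`."""
    if dwell <= low:
        return 0.0
    if dwell >= high:
        return 1.0
    return (dwell - low) / (high - low)


def tag_images(totals, concept, threshold=0.5):
    """Attach the concept to every image whose relevance reaches the threshold."""
    return {img: concept for img, d in totals.items() if fuzzy_relevance(d) >= threshold}


# Hypothetical screen layout and gaze recording (4 samples per second).
regions = {"img_a": (0, 0, 100, 100), "img_b": (100, 0, 200, 100)}
samples = [(50, 50)] * 8 + [(150, 50)] * 2
totals = dwell_times(samples, regions, sample_period=0.25)
tags = tag_images(totals, concept="dog")
print(totals, tags)
```

Keeping the membership value rather than a hard threshold would let later stages weigh uncertain annotations, which is the usual motivation for a fuzzy formulation.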
© Queen Mary, University of London 2008
Electronic Engineering, Queen Mary University of London, Mile End Road, London E1 4NS, UK
Tel: +44 (0)20 7882 5346, Fax: +44 (0)20 7882 7997