ETRA '14: Proceedings of the Symposium on Eye Tracking Research and Applications

SESSION: Gaze-mediated input

Haptic feedback to gaze events

Eye tracking input often relies on visual and auditory feedback. Haptic feedback offers a previously unused alternative to these established methods. We describe a study to determine the natural time limits for haptic feedback to gaze events. The goal is to determine how much time is available to evaluate the object the user is gazing at and decide whether to give the user a haptic notification about that object. The results indicate that feedback is best delivered within 250 milliseconds of the start of a fixation on an object. Longer delays lead to an increase in incorrect associations between objects and the feedback. Delays longer than 500 milliseconds were confusing for the user.

Cross-device gaze-supported point-to-point content transfer

Within a pervasive computing environment, we see content on shared displays that we wish to acquire and use in a specific way, i.e., with an application on a personal device, transferring from point to point. The eyes as input can indicate intention to interact with a service, providing implicit pointing as a result. In this paper we investigate the use of gaze and manual input for the positioning of gaze-acquired content on personal devices. We evaluate two main techniques: (1) Gaze Positioning, in which content is transferred using gaze with manual input to confirm actions, and (2) Manual Positioning, in which content is selected with gaze but final positioning is performed by manual input, involving a switch of modalities from gaze to manual input. A first user study compares these techniques applied to direct and indirect manual input configurations: a tablet with touch input and a laptop with mouse input. A second study evaluated our techniques in an application scenario involving distractor targets. Our overall results showed general acceptance and understanding of all conditions, although there were clear individual user preferences dependent on familiarity and preference toward gaze, touch, or mouse input.

The use of gaze to control drones

This paper presents an experimental investigation of gaze-based control modes for unmanned aerial vehicles (UAVs or "drones"). Ten participants performed a simple flying task. We gathered empirical measures, including task completion time, and examined the user experience for difficulty, reliability, and fun. Four control modes were tested, with each mode applying a combination of x-y gaze movement and manual (keyboard) input to control speed (pitch), altitude, rotation (yaw), and drafting (roll). Participants had similar task completion times for all four control modes, but one combination was considered significantly more reliable than the others. We discuss design and performance issues for the gaze-plus-manual split of controls when drones are operated using gaze in conjunction with tablets, near-eye displays (glasses), or monitors.

Look and lean: accurate head-assisted eye pointing

Compared to the mouse, eye pointing is inaccurate. As a consequence, small objects are difficult to point at by gaze alone. We suggest using a combination of eye pointing and subtle head movements to achieve accurate hands-free pointing in a conventional desktop computing environment. For tracking the head movements, we exploited information about the eye position in the eye tracker's camera view. We conducted a series of three experiments to study the potential caveats and benefits of using head movements to adjust gaze cursor position. Results showed that head-assisted eye pointing significantly improves the pointing accuracy without a negative impact on the pointing time. In some cases participants were able to point almost 3 times closer to the target's center, compared to eye pointing alone (7 vs. 19 pixels). We conclude that head-assisted eye pointing is a comfortable and potentially very efficient alternative to other assistive methods in eye pointing, such as zooming.

SESSION: Analysis I: eye tracking data analysis methods

ISeeCube: visual analysis of gaze data for video

We introduce a new design for the visual analysis of eye tracking data recorded from dynamic stimuli such as video. ISeeCube includes multiple coordinated views to support different aspects of various analysis tasks. It combines methods for the spatiotemporal analysis of gaze data recorded from unlabeled videos as well as the possibility to annotate and investigate dynamic Areas of Interest (AOIs). A static overview of the complete data set is provided by a space-time cube visualization that shows gaze points with density-based color mapping and spatiotemporal clustering of the data. A timeline visualization supports the analysis of dynamic AOIs and the viewers' attention on them. AOI-based scanpaths of different viewers can be clustered by their Levenshtein distance, an attention map, or the transitions between AOIs. With the provided visual analytics techniques, the exploration of eye tracking data recorded from several viewers is supported for a wide range of analysis tasks.

Saliency-based Bayesian modeling of dynamic viewing of static scenes

Most analytic approaches for eye-tracking data focus either on identification of fixations and saccades, or on estimating saliency properties. Analyzing both aspects of visual attention simultaneously provides a more comprehensive view of strategies used to process information. This work presents a method that incorporates both aspects in a unified Bayesian model to jointly estimate dynamic properties of scanpaths and a saliency map. Performance of the model is assessed on simulated data and on eye-tracking data from 15 children with autism spectrum disorder and 13 control children. Saliency differences between ASD and TD groups were found for both social and non-social images, but differences in dynamic gaze features were evident in only a subset of social images. These results are consistent with previous region-based analyses as well as previous fixation parameter models, suggesting that the new approach may provide synthesizing and statistical perspectives on eye-tracking analyses.

Creating a new dynamic measure of the useful field of view using gaze-contingent displays

We have developed a measure of transient changes in the useful field of view (UFOV) in simulators using gaze-contingent displays (GCDs). It can be used to evaluate safety-critical tasks such as driving or flight, and in training to increase the UFOV under cognitive load, stress, and fatigue. Unlike the established UFOV© measure, our measure can be used in simulators. Furthermore, previous peripheral detection tasks used in simulators controlled neither the target's retinal eccentricity nor stimulus intensity. Our approach overcomes these limitations by using GCDs to present stimuli producing equal performance across eccentricities under single-task conditions for two dependent measures: blur detection and Gabor orientation discrimination. We then measure attention under dual task conditions by varying cognitive load via an N-back task. Our results showed blur sensitivity varied predictably with retinal eccentricity, but detection of blur did not vary with cognitive load. Conversely, peripheral Gabor orientation discrimination showed a significant cognitive load decrement. While this method is still in development, the results suggest that a GC UFOV method is promising.

On relationships between fixation identification algorithms and fractal box counting methods

Fixation identification algorithms facilitate data comprehension and provide analytical convenience in eye-tracking analysis. However, current fixation algorithms for eye-tracking analysis are heavily dependent on parameter choices, leading to instabilities in results and incompleteness in reporting.

This work examines the nature of human scanning patterns during complex scene viewing. We show that standard implementations of the commonly used distance-dispersion algorithm for fixation identification are functionally equivalent to greedy spatiotemporal tiling. We show that modeling the number of fixations as a function of tiling size leads to a measure of fractal dimensionality through box counting. We apply this technique to examine scale-free gaze behaviors in toddlers and adults looking at images of faces and blocks, as well as a large number of adults looking at movies or static images.

The distributional aspects of the number of fixations may suggest a fractal structure to gaze patterns in free scanning and imply that the incompleteness of standard algorithms may be due to the scale-free behaviors of the underlying scanning distributions. We discuss the nature of this hypothesis, its limitations, and offer directions for future work.
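
The link between tile size and fixation count can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a purely spatial greedy tiling (the temporal component is ignored), and the names `count_fixations` and `box_counting_dimension` as well as the synthetic straight-line "gaze" data are hypothetical. The dimension is taken as the negative slope of log(count) against log(radius).

```python
import math


def count_fixations(points, radius):
    """Greedy tiling: start a new fixation whenever a sample leaves the current tile."""
    count = 0
    anchor = None
    for x, y in points:
        if anchor is None or math.hypot(x - anchor[0], y - anchor[1]) > radius:
            anchor = (x, y)  # this sample opens a new tile
            count += 1
    return count


def box_counting_dimension(points, radii):
    """Negative least-squares slope of log(fixation count) against log(radius)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(count_fixations(points, r)) for r in radii]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


# Synthetic data (an assumption for illustration): samples along a straight line.
line = [(i, 0) for i in range(10001)]
radii = [100, 200, 400, 800, 1600]
dimension = box_counting_dimension(line, radii)
```

For samples on a straight line the estimate comes out close to 1, as expected for a one-dimensional path; real gaze data would need the temporal criterion and a justified range of radii.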

SESSION: Calibration & fixation analysis

Towards accurate and robust cross-ratio based gaze trackers through learning from simulation

Cross-ratio (CR) based methods offer many attractive properties for remote gaze estimation using a single camera in an uncalibrated setup by exploiting invariance of a plane projectivity. Unfortunately, due to several simplification assumptions, the performance of CR-based eye gaze trackers decays significantly as the subject moves away from the calibration position. In this paper, we introduce an adaptive homography mapping for achieving gaze prediction with higher accuracy at the calibration position and more robustness under head movements. This is achieved with a learning-based method for compensating both spatially-varying gaze errors and head pose dependent errors simultaneously in a unified framework. The model of adaptive homography is trained offline using simulated data, saving a tremendous amount of time in data collection. We validate the effectiveness of the proposed approach using both simulated and real data from a physical setup. We show that our method compares favorably against other state-of-the-art CR based methods.

Head mounted device for point-of-gaze estimation in three dimensions

This paper presents a fully calibrated extended geometric approach for gaze estimation in three dimensions (3D). The methodology is based on a geometric approach utilising a fully calibrated binocular setup constructed as a head-mounted system. The approach uses two ordinary web cameras for each eye and 6D magnetic sensors, allowing free head movements in 3D. Evaluation of initial experiments indicates results comparable to the current state of the art in estimating gaze in 3D. Initial results show an RMS error of 39-50 mm in the depth dimension and even smaller errors in the horizontal and vertical dimensions for fixations. Although the workspace is limited, the system is designed as a head-mounted device, so the workspace volume is positioned relative to the pose of the device. Hence gaze can be estimated in 3D with relatively free head movements and with external reference to a world coordinate system, which offers flexibility and mobility within certain constraints.

3D model-based gaze estimation in natural reading: a systematic error correction procedure based on annotated texts

Studying natural reading and its underlying attention processes requires devices that are able to provide precise measurements of gaze without rendering the reading activity unnatural. In this paper we propose an eye tracking system that can be used to conduct analyses of reading behavior in low constrained experimental settings. The system is designed for dual-camera-based head-mounted eye trackers and allows free head movements and note taking. The system is composed of three different modules. First, a 3D model-based gaze estimation method computes the reader's gaze trajectory. Second, a document image retrieval algorithm is used to recognize document pages and extract annotations. Third, a systematic error correction procedure is used to post-calibrate the system parameters and compensate for spatial drifts. The validation results show that the proposed method is capable of extracting reliable gaze data when reading in low constrained experimental conditions.

Robust glint detection through homography normalization

A novel normalization principle for robust glint detection is presented. The method is based on geometric properties of corneal reflections and allows for simple and effective detection of glints even in the presence of several spurious and identically appearing reflections. The method is tested on both simulated data and data obtained from web cameras. The proposed method is a possible direction towards making eye trackers more robust to challenging scenarios.

Easy post-hoc spatial recalibration of eye tracking data

The gaze locations reported by eye trackers often contain error resulting from a variety of sources. Such error is of increasing concern to eye tracking researchers, and several techniques have been introduced to clean up the error. These methods, however, either compensate only for error caused by a particular source (such as pupil dilation) or require the error to be somewhat constant across space and time. This paper introduces a method that is applicable to error generated from a variety of sources and that is resilient to the change in error across the display. A study shows that, at least in some cases, although the change in error across the display appears to be random it in fact follows a consistent pattern which can be modeled using quadratic equations. The parameters of these equations can be estimated using linear regression on the error vectors between recorded fixations and possible target locations. The resulting equations can then be used to clean up the error. This regression-based approach is much easier to apply than some of the previously published methods. The method is applied to the data of a visual search experiment, and the results show that the regression-based error correction works very well.
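
As a rough illustration of the regression idea, the sketch below fits a quadratic error model along a single axis and subtracts the predicted error. It is a simplification under stated assumptions, not the published method: the true target for each fixation is assumed to be known, only one coordinate is modeled, positions are normalized to [0, 1], and the helper names (`solve_linear`, `fit_quadratic`, `correct`) and the synthetic data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(positions, errors):
    """Least-squares fit of error = c0 + c1*p + c2*p**2 via the normal equations."""
    rows = [[1.0, p, p * p] for p in positions]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * e for r, e in zip(rows, errors)) for i in range(3)]
    return solve_linear(ata, atb)


def correct(position, coeffs):
    """Subtract the modeled error from a recorded gaze position."""
    c0, c1, c2 = coeffs
    return position - (c0 + c1 * position + c2 * position * position)


# Synthetic data (an assumption): error = recorded - target follows a known quadratic.
true_coeffs = (0.02, 0.05, -0.03)
recorded = [i / 10 for i in range(11)]
errors = [true_coeffs[0] + true_coeffs[1] * p + true_coeffs[2] * p * p for p in recorded]
targets = [p - e for p, e in zip(recorded, errors)]

coeffs = fit_quadratic(recorded, errors)
corrected = [correct(p, coeffs) for p in recorded]
```

With noise-free synthetic errors the fit recovers the generating coefficients; with real data the residual after correction would indicate how well a quadratic describes the error pattern.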

Towards fine-grained fixation analysis: distilling out context dependence

In this paper, we explore the problem of analyzing gaze patterns towards attributing greater meaning to observed fixations. In recent years, there have been a number of efforts that attempt to categorize fixations according to their properties. Given that there are a multitude of factors that may contribute to fixational behavior, including both bottom-up and top-down influences on neural mechanisms for visual representation and saccadic control, efforts to better understand factors that may contribute to any given fixation may play an important role in augmenting raw fixation data. A grand objective of this line of thinking is in explaining the reason for any observed fixation as a combination of various latent factors. In the current work, we do not seek to solve this problem in general, but rather to factor out the role of the holistic structure of a scene as one observable, and quantifiable factor that plays a role in determining fixational behavior. Statistical methods and approximations to achieve this are presented, and supported by experimental results demonstrating the efficacy of the proposed methods.

SESSION: 3D & gaming applications

Comparing estimated gaze depth in virtual and physical environments

We show that the error in 3D gaze depth (vergence) estimated from binocularly-tracked gaze disparity is related to the viewing distance of the screen calibration plane at which 2D gaze is recorded. In a stereoscopic (virtual) environment, this relationship is evident in gaze to target depth error: vergence error behind the screen is greater than in front of the screen and is lowest at the screen depth. In a physical environment, with no accommodation-vergence conflict, the magnitude of vergence error in front of the 2D calibration plane appears reversed, increasing with distance from the viewer.
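
For readers unfamiliar with how depth follows from gaze disparity, here is a minimal geometric sketch. It is not taken from the paper: it assumes both eyes lie on a line parallel to the screen, that the left and right gaze points are measured along the same horizontal screen axis, and that all lengths share one unit. Intersecting the two gaze rays by similar triangles gives depth = D * e / (e - d), where D is the screen distance, e the interocular distance, and d the horizontal disparity (right minus left gaze point). The function name `gaze_depth` is hypothetical.

```python
def gaze_depth(left_x, right_x, screen_distance, interocular):
    """Depth at which the two gaze rays intersect, from on-screen gaze disparity."""
    disparity = right_x - left_x  # positive: uncrossed, i.e. behind the screen
    if disparity >= interocular:
        raise ValueError("gaze rays are parallel or diverging; no intersection")
    return screen_distance * interocular / (interocular - disparity)


# Illustrative values in millimetres (assumptions, not data from the study).
on_screen = gaze_depth(0.0, 0.0, screen_distance=600, interocular=65)
behind = gaze_depth(-6.5, 6.5, screen_distance=600, interocular=65)
in_front = gaze_depth(32.5, -32.5, screen_distance=600, interocular=65)
```

Zero disparity places the gaze point on the screen plane, uncrossed disparity places it behind the screen, and crossed disparity places it in front. Because the interocular distance appears in the denominator alongside the disparity, small disparity errors grow into larger depth errors as the rays approach parallel.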

The effects of fast disparity adjustment in gaze-controlled stereoscopic applications

With the emergence of affordable 3D displays, stereoscopy is becoming a commodity. However, often users report discomfort even after brief exposures to stereo content. One of the main reasons is the conflict between vergence and accommodation that is caused by 3D displays. We investigate dynamic adjustment of stereo parameters in a scene using gaze data in order to reduce discomfort. In a user study, we measured stereo fusion times after abrupt manipulation of disparities using gaze data. We found that gaze-controlled manipulation of disparities can lower fusion times for large disparities. In addition we found that gaze-controlled disparity adjustment should be applied in a personalized manner and ideally performed only at the extremities or outside the comfort zone of subjects. These results provide important insight on the problems associated with fast disparity manipulation and are essential for developing appealing gaze-contingent and gaze-controlled applications.

Gaze-contingent depth of field in realistic scenes: the user experience

Computer-generated objects presented on a display typically have the same focal distance regardless of the monocular and binocular depth cues used to portray a 3D scene. This is because they are presented on a flat screen display that has a fixed physical location. In a stereoscopic 3D display, accommodation (focus) of the eyes should always be at the distance of the screen for clear vision regardless of the depth portrayed; this fixed accommodation conflicts with vergence eye movements that the user must make to fuse stimuli located off the screen. This is known as accommodation-vergence conflict and is detrimental for user experience of stereoscopic virtual environments (VE), as it can cause visual discomfort and diplopia during use of a stereoscopic display. It is believed that, by artificially simulating focal blur and natural accommodation, it is possible to compensate for the vergence-accommodation conflict and alleviate these symptoms. We hypothesized that it is possible to compensate for conflict with a fixed accommodation cue by adding simulated focal blur according to instantaneous fixation.

We examined gaze-contingent depth of field (DOF) when used in stereoscopic and non-stereoscopic 3D displays. We asked our participants to compare different conditions in terms of depth perception, image quality and viewing comfort. As expected, we found that monocular DOF gave a stronger impression of depth than no depth of field, and that stereoscopic cues were stronger than any kind of monocular cues, but adding depth of field to stereo displays did not enhance depth impressions. The opposite was true for image comfort. People thought that DOF impaired image quality in monocular viewing. We also observed that comfort was affected by DOF and display mode in a similar fashion to image quality. However, the magnitude of the effects of DOF simulation on image quality depended on whether people associated image quality with depth or not. These results suggest that studies evaluating DOF effectiveness need to consider the type of task, type of image and questions asked.

Characterizing visual attention during driving and non-driving hazard perception tasks in a simulated environment

Research into driving skill, particularly of hazard perception, often involves studies where participants either view pictures of driving scenarios or use movie viewing paradigms. However, oculomotor strategies tend to change between active and passive tasks, and attentional limitations are introduced during real driving. Here we present a study using eye tracking methods to contrast oculomotor behaviour across a passive video-based hazard perception task and an active hazard perception simulated driving task. The differences presented highlight a requirement to study driving skill under more active conditions, where the participant is engaged with a driving task. Our results suggest that more standard, passive tests may have limited utility when developing visual models of driving behaviour. The results presented here have implications for driver safety measures and provide further insights into how vision and action interact during natural activity.

Comparing mouse and MAGIC pointing for moving target acquisition

Moving target acquisition is a challenging and manually stressful task if performed using an all-manual, pointer-based interaction technique like mouse interaction, especially if targets are small, move fast, and are visible on screen only for a limited time. The MAGIC pointing interaction approach combines the precision of manual, pointer-based interaction with the speed and little manual stress of eye pointing. In this contribution, a pilot study with twelve participants on moving target acquisition is presented using an abstract experimental task derived from a video analysis scenario. Mouse input, conservative MAGIC pointing and MAGIC button are compared considering acquisition time, error rate, and user satisfaction. Although none of the participants had used MAGIC pointing before, eight participants voted for MAGIC button being their favorite technique; participants performed with only slightly higher mean acquisition time and error rate than with the familiar mouse input. Conservative MAGIC pointing was preferred by three participants; however, mean acquisition time and error rate were significantly worse than with mouse input.

SESSION: Analysis II: finding patterns in eye tracking data

A visual approach for scan path comparison

Several algorithms, approaches, and implementations have been developed to support comparison of scan paths and finding of interesting scan path structures. In this work we contribute a visual approach to support scan path comparison. A key feature of this approach is the combination of a clustering algorithm using Levenshtein distance with the parallel scan path visualization technique. The combination of computational methods with an interactive visualization allows us to use both the power of pattern finding algorithms and the human ability to visually recognize patterns. To use the concept in practice we implemented the approach in a prototype and show its application in two scan path analysis scenarios from automobile usability testing and visualization research.
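
The Levenshtein distance mentioned above can be sketched as follows, assuming each scan path has already been encoded as a string with one character per fixated area of interest. The encoding, the example paths, and the name `levenshtein` are illustrative assumptions; the clustering and visualization steps of the paper are not reproduced here.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


# Hypothetical scan paths over areas of interest A to D.
scan_paths = ["ABCD", "ABD", "DCBA"]
distances = [[levenshtein(p, q) for q in scan_paths] for p in scan_paths]
```

The pairwise `distances` matrix is the kind of input a clustering algorithm would consume; which clustering method to apply is left open here.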

Eye-movement sequence statistics and hypothesis-testing with classical recurrence analysis

Dynamical systems analysis tools, like Recurrence Plotting (RP), allow for concise mathematical representations of complex systems with relatively simple descriptive metrics. These methods are invariant for phase-space trajectories of a time series from a dynamical system, allowing analyses on simplified data sets which preserve the system model's dynamics. In the past decade, recurrence methods have been applied to eye tracking, but those analyses avoided Time-Delay Embedding (TDE). Without TDE, we lose the assumption that phase-space trajectories are being preserved in the recurrence plot. Thus, analysis has typically been limited to clustering fixation locations in the image space, instead of clustering data sequences in the phase space. We will show how classical recurrence analysis methods can be extended to allow for multi-modal data visualization and quantification, by presenting an open-source Python implementation for analyzing eye movements.
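
To make the role of time-delay embedding concrete, the following minimal sketch embeds a one-dimensional series, builds a thresholded recurrence matrix, and computes the recurrence rate. It is independent of the open-source implementation the abstract refers to; the function names, the Euclidean threshold `eps`, and the toy series are assumptions for illustration.

```python
import math


def embed(series, dim, delay):
    """Time-delay embedding: map each sample to a vector of dim delayed values."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]


def recurrence_matrix(vectors, eps):
    """1 where two embedded states lie within eps of each other, else 0."""
    return [[1 if math.dist(u, v) <= eps else 0 for v in vectors] for u in vectors]


def recurrence_rate(matrix):
    """Fraction of recurrent pairs, excluding the trivial main diagonal."""
    n = len(matrix)
    recurrent = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return recurrent / (n * (n - 1))


# Toy periodic signal (an assumption, standing in for one gaze coordinate).
series = [0, 1, 0, 1, 0, 1, 0, 1]
states = embed(series, dim=2, delay=1)
rate = recurrence_rate(recurrence_matrix(states, eps=0.1))
```

Comparing embedded vectors rather than raw samples means two time points count as recurrent only if the short sequences starting there match, which is the distinction the abstract draws between clustering locations and clustering sequences.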

A dynamic graph visualization perspective on eye movement data

During eye tracking studies, vast amounts of spatio-temporal data in the form of eye gaze trajectories are recorded. Finding insights into these time-varying data sets is a challenging task. Visualization techniques such as heat maps or gaze plots help find patterns in the data but highly aggregate the data (heat maps) or are difficult to read due to overplotting (gaze plots). In this paper, we propose transforming eye movement data into a dynamic graph data structure to explore the visualization problem from a new perspective. By aggregating gaze trajectories of participants over time periods or Areas of Interest (AOIs), a fair trade-off between aggregation and details is achieved. We show that existing dynamic graph visualizations can be used to display the transformed data and illustrate the approach by applying it to eye tracking data recorded for investigating the readability of tree diagrams.

Entropy-based statistical analysis of eye movement transitions

The paper introduces a two-step method of quantifying eye movement transitions between Areas of Interest (AOIs). First, individuals' gaze switching patterns, represented by fixated AOI sequences, are modeled as Markov chains. Second, Shannon's entropy coefficient of the fitted Markov model is computed to quantify the complexity of individual switching patterns. To determine the overall distribution of attention over AOIs, the entropy coefficient of individuals' stationary distribution of fixations is calculated.

The novelty of the method is that it captures the variability of individual differences in eye movement characteristics, which are then summarized statistically. The method is demonstrated on gaze data collected during free viewing of classical art paintings. Shannon's coefficient derived from individual transition matrices is significantly related to participants' individual differences as well as to their aesthetic experience of art pieces.
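
The two-step idea can be sketched as below. This is an illustration under assumptions rather than the authors' procedure: the chain is first order, entropies are reported in unnormalized bits, the stationary distribution is approximated by power iteration (which presumes every AOI has at least one outgoing transition), and all function names and the example sequences are hypothetical.

```python
import math
from collections import Counter


def transition_matrix(sequence, states):
    """Row-normalized first-order transition probabilities between AOIs."""
    counts = {s: Counter() for s in states}
    for src, dst in zip(sequence, sequence[1:]):
        counts[src][dst] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix


def stationary_distribution(matrix, states, iterations=1000):
    """Approximate the stationary distribution by repeated multiplication."""
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        pi = {t: sum(pi[s] * matrix[s][t] for s in states) for t in states}
    return pi


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def transition_entropy(matrix, pi, states):
    """Entropy of each row, weighted by how often that AOI is occupied."""
    return sum(pi[s] * entropy(matrix[s].values()) for s in states)


states = ["A", "B"]
# Strictly alternating viewer versus a less predictable one (made-up sequences).
regular = transition_matrix("ABABABAB", states)
mixed = transition_matrix("AABBAABBA", states)
h_regular = transition_entropy(regular, stationary_distribution(regular, states), states)
h_mixed = transition_entropy(mixed, stationary_distribution(mixed, states), states)
h_stationary = entropy(stationary_distribution(mixed, states).values())
```

The strictly alternating sequence yields zero transition entropy because the next AOI is fully predictable, while the second sequence yields one bit per transition; both spread attention evenly over the two AOIs.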

SESSION: Visual attention and eye movements

Detection of vigilance performance with pupillometry

Sustained attention (vigilance) is required for many professions such as air traffic controllers, imagery analysts, airport security screeners, and cyber operators. A lapse in attention in any of these environments can have deadly consequences. The purpose of this study was to determine the ability of pupillometry to detect changes in vigilance performance. Each participant performed a 40-minute vigilance task while wearing an eye-tracker on each of four separate days. Pupil diameter, pupil eccentricity, and pupil velocity all changed significantly over time (p<.05) during the task. Significant correlations indicate that all metrics increased as vigilance performance declined except for pupil diameter, which decreased as the pupil became miotic. These results are consistent with other research on attention, fatigue, and arousal levels. Using an eye-tracker to detect changes in pupillometry in an operational environment would allow interventions to be implemented.

Pupil dilations during target-pointing respect Fitts' law

Pupil size is known to correlate with changes of cognitive task workloads, but the pupillary response to requirements of basic goal-directed motor tasks is not yet clear, although pointing with tools is a ubiquitous human task. This work describes a user study to investigate the pupil dilations during aiming in two tele-operation tasks with different target settings, one aiming at targets with different sizes located a constant distance apart, and the other aiming at targets at varying distances. The task requirements in each task were defined by Fitts' index of difficulty (ID). The purpose of this work is to further explore how the changes in task requirements are reflected by the changes of pupil size, i.e., whether the pupil responds to either target size or target distance, or to both of them. Pupil responses to different task IDs were recorded in each task. The results showed that the pupil responds to the changes of ID, not just to the change of target size. This implies that pupil diameter can be employed as an indicator of task requirement in goal-directed movements, because higher task difficulty evoked higher peak pupil dilation which occurred with longer delay. These findings can be used for detailed understanding of eye-hand coordination mechanisms in interactive systems and contribute to the foundation for developing methods to objectively evaluate interactive task requirements using pupil parameters during goal-directed movements.
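
The abstract does not state which formulation of the index of difficulty was used, so the sketch below assumes Fitts' original one, ID = log2(2D / W), with D the distance to the target and W its width. The function name and the numeric values are illustrative only.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts' original formulation: ID = log2(2 * D / W), in bits."""
    return math.log2(2 * distance / width)


baseline = index_of_difficulty(distance=8, width=2)
smaller_target = index_of_difficulty(distance=8, width=1)
farther_target = index_of_difficulty(distance=16, width=2)
```

Halving the target width and doubling the target distance each raise the ID by the same amount (one bit), which is why a response that tracks ID reflects both size and distance rather than size alone.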

The relative contributions of internal motor cues and external semantic cues to anticipatory smooth pursuit

Smooth pursuit eye movements anticipate the future motion of targets when future motion is either signaled by visual cues or inferred from past history. To study the effect of anticipation derived from movement planning, the eye pursued a cursor whose horizontal motion was controlled by the hand via a mouse. The direction of a critical turn was specified by a cue or was freely chosen. Information from planning to move the hand (which itself showed anticipatory effects) elicited anticipatory smooth eye movements, allowing the eye to track self-generated target motion with virtually no lag. Lags were present only when either visual cues or motor cues were removed. The results show that information derived from the planning of movement is as effective as visual cues in generating anticipatory eye movements. Eye movements in dynamic environments will be facilitated by collaborative anticipatory movements of hand and eye. Cues derived from movement planning may be particularly valuable in fast-paced human-computer interactions.

Exploring the influence of audio in directing visual attention during dynamic content

The mechanisms underlying the allocation of visual attention toward dynamic content are still largely unexplored. Due to the number of variables present during dynamic content, it is often difficult to confidently determine what components direct visual attention. In this study, we manipulated the presence of audio in an attempt to explore the contribution of audio in driving visual attention during dynamic content. Participants viewed a reel of non-global commercials while their eye movements were recorded. Participants were either exposed to content containing the original audio track or content in which the audio track was edited out. Dynamic heat maps were created for each ad in order to identify areas of high visual attention between the conditions. Fixation durations and fixation counts for each area of interest were then computed. Analyses showed that the presence of audio has an influence on the allocation of visual attention during dynamic content, most notably in regard to on-screen text. Understanding the influence of audio in directing visual attention may help future researchers control for the extraneous influence of audio in eye-tracking methodologies.

Influence of visual cueing on students' eye movements while solving physics problems

Overlaying visual cues on diagrams and animations can help students attend to relevant areas and facilitate problem solving. In this study we investigated the effects of visual cues on students' eye movements as they solved conceptual physics problems. Students (N=80) enrolled in an introductory physics course individually worked through four sets of problems, each containing a diagram, while their eye movements were recorded. Each diagram contained regions that were alternatively relevant to solving the problem correctly or related to common incorrect responses. Each problem set contained an initial problem, six isomorphic training problems, and a transfer problem. Those in the cued condition saw visual cues overlaid on the training problems. Students provided verbal responses. The cued group more accurately answered the (uncued) transfer problems, and their eye movements showed they more efficiently extracted the necessary information from the relevant area than the uncued group.

SESSION: Mobile eye tracking & applications

EyeSee3D: a low-cost approach for analyzing mobile 3D eye tracking data using computer vision and augmented reality technology

For validly analyzing human visual attention, it is often necessary to proceed from computer-based desktop set-ups to more natural real-world settings. However, the resulting loss of control has to be counterbalanced by increasing participant and/or item count. Together with the effort required to manually annotate the gaze-cursor videos recorded with mobile eye trackers, this renders many studies unfeasible.

We tackle this issue by minimizing the need for manual annotation of mobile gaze data. Our approach combines geometric modelling with inexpensive 3D marker tracking to align virtual proxies with the real-world objects. This allows us to classify fixations on objects of interest automatically while supporting a completely free moving participant.

The paper presents the EyeSee3D method as well as a comparison of an expensive outside-in (external cameras) and a low-cost inside-out (scene camera) tracking of the eye-tracker's position. The EyeSee3D approach is evaluated comparing the results from automatic and manual classification of fixation targets, which raises old problems of annotation validity in a modern context.

An investigation into determining head pose for gaze estimation on unmodified mobile devices

Traditionally, devices which are able to determine a user's gaze are large, expensive and often restrictive. We investigate the prospect of using common webcams and mobile devices such as laptops, tablets and phones without modification as an alternative means for obtaining a user's gaze. A person's gaze can be fundamentally determined by the pose of the head as well as the orientation of the eyes. This initial work investigates the first of these factors - an estimate of the 3D head pose (and subsequently the positions of the eye centres) relative to a camera device. Specifically, we seek a low cost algorithm that requires only a one-time calibration for an individual user, that can run in real-time on the aforementioned mobile devices with noisy camera data. We use our head tracker to estimate the 4 eye corners of a user over a 10-second video. We present the results at several different frames per second (fps) to analyse the impact on the tracker with lower quality cameras. We show that our algorithm is efficient enough to run at 75fps on a common laptop, but struggles with tracking loss when the fps is lower than 10fps.

EyeTab: model-based gaze estimation on unmodified tablet computers

Despite the widespread use of mobile phones and tablets, hand-held portable devices have only recently been identified as a promising platform for gaze-aware applications. Estimating gaze on portable devices is challenging given their limited computational resources, low quality integrated front-facing RGB cameras, and small screens to which gaze is mapped. In this paper we present EyeTab, a model-based approach for binocular gaze estimation that runs entirely on an unmodified tablet. EyeTab builds on a set of established image processing and computer vision algorithms and adapts them for robust and near-realtime gaze estimation. A technical prototype evaluation with eight participants in a normal indoors office setting shows that EyeTab achieves an average gaze estimation accuracy of 6.88° of visual angle at 12 frames per second.

Analysis of gaze behavior while using a multi-viewpoint video viewer

Humans see things from various viewpoints but nobody attempts to see anything from every viewpoint owing to physical limitations and the great effort required. Intelligent interfaces for viewing multi-viewpoint videos may effectively remove these limitations and open up a new visual world to mankind. We have developed a multi-viewpoint video viewer that incorporates target-centered viewpoint switching. The viewer stabilizes an object at the center of the display field, which helps to focus the user's gaze on the target. We conducted a user study to analyze user behavior, especially eye movement, while watching a multi-viewpoint video on the viewer. Statistical analyses of the results indicated that the target-centered viewpoint switching encouraged the users to gaze at the center of the display where the target was located during the viewing. We believe that these are useful findings that pave the way for the design of even more intelligent viewers.

Heatmap rendering from large-scale distributed datasets using cloud computing

The heatmap is one of the most popular visualizations of gaze behavior; however, increasingly voluminous streams of eye-tracking data make processing of such visualizations computationally demanding. Because of high requirements on a single processing machine, real-time visualizations from multiple users are unfeasible if rendered locally. We designed a framework that collects data from multiple eye-trackers regardless of their physical location, analyses these streams, and renders heatmaps in real-time. We propose a cloud computing architecture (EyeCloud) consisting of master and slave nodes on a cloud cluster, and a web interface for fast computation and effective aggregation of the large volumes of eye-tracking data. In experimental studies of the feasibility and effectiveness, we built a cloud cluster on a well-known service, implemented the architecture and reported on a comparison between the proposed system and traditional local processing. The results showed the efficiency of EyeCloud when recordings vary in duration. To our knowledge, this is the first solution to implement cloud computing for gaze visualization.

Rendering synthetic ground truth images for eye tracker evaluation

When evaluating eye tracking algorithms, a recurring issue is what metric to use and what data to compare against. User studies are informative when considering the entire eye tracking system, however they are often unsatisfactory for evaluating the gaze estimation algorithm in isolation. This is particularly an issue when evaluating a system's component parts, such as pupil detection, pupil-to-gaze mapping or head pose estimation.

Instead of user studies, eye tracking algorithms can be evaluated using simulated input video. We describe a computer graphics approach to creating realistic synthetic eye images, using a 3D model of the eye and head and a physically correct rendering technique. By using rendering, we have full control over the parameters of the scene such as the gaze vector or camera position, which allows the calculation of ground truth data, while creating a realistic input for a video-based gaze estimator.

Experts vs. novices: applying eye-tracking methodologies in colonoscopy video screening for polyp search

We present in this paper a novel study aiming at identifying the differences in visual search patterns between physicians of diverse levels of expertise during the screening of colonoscopy videos. Physicians were clustered into two groups (experts and novices) according to the number of procedures performed, and fixations were captured by an eye-tracker device during the task of polyp search in different video sequences. These fixations were integrated into heat maps, one for each cluster. The obtained maps were validated over a ground truth consisting of a mask of the polyp, and the comparison between experts and novices was performed by using metrics such as reaction time, dwelling time and energy concentration ratio. Experimental results show a statistically significant difference between experts and novices, and the obtained maps prove to be a useful tool for the characterisation of the behaviour of each group.

POSTER SESSION: Poster abstracts

A mixture distribution for visual foraging

Visual foraging is investigated by examining the nature of statistical distributions underlying human search strategies. Eye movements uninfluenced by scene perception or higher level cognition tasks are used to generate a data set which can be analyzed to study 'pure' searches. Eye movements in the form of 'jump' length constituting the entire search process are studied to detect the presence of statistical distributions whose parameters can be estimated. Animal ecology studies have reported the presence of a Lévy flight/power law model, which explains animal foraging patterns in few species. We consider a Lévy flight model to explain visual foraging. Results from data analysis, while not ruling out the presence of a power law entirely, point strongly towards the presence of a mixture distribution which faithfully explains visual foraging. This mixture distribution is made up of gamma distributions.

An eye-tracking study assessing the comprehension of C++ and Python source code

A study to assess the effect of programming language on student comprehension of source code is presented, comparing the languages of C++ and Python in two task categories: overview and find bug tasks. Eye gazes are tracked while thirty-eight students complete tasks and answer questions. Results indicate no significant difference in accuracy or time; however, there is a significant difference reported on the rate at which students look at buggy lines of code. These results start to provide some direction as to the effect programming language might have in introductory programming classes.

Attentional processes in natural reading: the effect of margin annotations on reading behaviour and comprehension

We present an eye tracking study to investigate how natural reading behavior and reading comprehension are influenced by in-context annotations. In a lab experiment, three groups of participants were asked to read a text and answer comprehension questions: a control group without taking annotations, a second group reading and taking annotations, and a third group reading a peer-annotated version of the same text. A self-made head-mounted eye tracking system was specifically designed for this experiment, in order to study how learners read and quickly re-read annotated paper texts, in low constrained experimental conditions. In the analysis, we measured the phenomenon of annotation-induced overt attention shifts in reading, and found that: (1) the reader's attention shifts toward a margin annotation more often when the annotation lies in the early peripheral vision, and (2) the number of attention shifts, between two different types of information units, is positively related to comprehension performance in quick re-reading. These results can be translated into potential criteria for knowledge assessment systems.

Collaborative eye tracking for image analysis

We present a framework for collaborative image analysis where gaze information is shared across all users. A server gathers and broadcasts fixation data from/to all clients and the clients visualize this information. Several visualization options are provided. The system can run in real-time or gaze information can be recorded and shared the next time an image is accessed. Our framework is scalable to large numbers of clients with different eye tracking devices. To evaluate our system we used it within the context of a spot-the-differences game. Subjects were presented with 10 image pairs each containing 5 differences. They were given one minute to detect the differences in each image. Our study was divided into three sessions. In session 1, subjects completed the task individually, in session 2, pairs of subjects completed the task without gaze sharing, and in session 3, pairs of subjects completed the task with gaze sharing. We measured accuracy, time-to-completion and visual coverage over each image to evaluate the performance of subjects in each session. We found that visualizing shared gaze information by graying out previously scrutinized regions of an image significantly increases the dwell time in the areas of the images that are relevant to the task (i.e. the regions where differences actually occurred). Furthermore, accuracy and time-to-completion also improved over collaboration without gaze sharing though the effects were not significant. Our framework is useful for a wide range of image analysis applications which can benefit from a collaborative approach.

Design issues of remote eye tracking systems with large range of movement

One of the goals of the eye tracking community is to build systems that allow users to move freely. In general, there is a trade-off between the field of view of an eye tracking system and the gaze estimation accuracy. We aim to study how much the field of view of an eye tracking system can be increased, while maintaining acceptable accuracy. In this paper, we investigate all the issues concerning remote eye tracking systems with large range of movement in a simulated environment and we give some guidelines that can facilitate the process of designing an eye tracker. Given a desired range of movement and a working distance, we can calculate the camera focal length and sensor size or given a certain camera, we can determine the user's range of movement. The robustness against large head movement of two gaze estimation methods based on infrared light is analyzed: an interpolation and a geometrical method. We relate the accuracy of the gaze estimation methods with the image resolution around the eye area for a certain feature detector's accuracy and provide possible combinations of pixel size and focal length for different gaze estimation accuracies. Finally, we give the gaze estimation accuracy as a function of a new defined eye error, which is independent of any design parameters.
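
The relationship between range of movement, working distance, focal length and sensor size mentioned above can be illustrated with a simple pinhole-camera approximation. This is a minimal sketch rather than the authors' simulation: the function names and the example numbers are hypothetical, and lens distortion and depth of field are ignored.

```python
def focal_length_mm(sensor_width_mm: float, distance_mm: float, range_mm: float) -> float:
    """Focal length that makes a region `range_mm` wide, located
    `distance_mm` from the camera, fill a sensor `sensor_width_mm` wide.

    Pinhole approximation (assumes distance >> focal length):
        sensor_width / focal_length = range / distance
    """
    return sensor_width_mm * distance_mm / range_mm


def range_of_movement_mm(sensor_width_mm: float, distance_mm: float, focal_mm: float) -> float:
    """Inverse problem: horizontal range covered by a given camera."""
    return sensor_width_mm * distance_mm / focal_mm


if __name__ == "__main__":
    # Hypothetical numbers: 6.4 mm wide sensor, user 600 mm away,
    # desired horizontal range of head movement 400 mm.
    f = focal_length_mm(6.4, 600.0, 400.0)
    print(f"focal length: {f:.2f} mm")
    print(f"range check:  {range_of_movement_mm(6.4, 600.0, f):.1f} mm")
```

A wider range of movement at a fixed working distance calls for a shorter focal length or a larger sensor, which in turn lowers the image resolution around the eye; this is the trade-off the abstract refers to.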

Development of an untethered, mobile, low-cost head-mounted eye tracker

Head-mounted eye-tracking systems allow us to observe participants' gaze behaviors in largely unconstrained, real-world settings. We have developed novel, untethered, mobile, low-cost, lightweight, easily-assembled head-mounted eye-tracking devices, comprised entirely of off-the-shelf components, including untethered, point-of-view, sports cameras. In total, the parts we have used cost ~$153, and we suggest untested alternative components that reduce the cost of parts to ~$31. Our device can be easily assembled using hobbying skills and techniques. We have developed hardware, software, and methodological techniques to perform point-of-regard estimation, and to temporally align scene and eye videos in the face of variable frame rate, which plagues low-cost, lightweight, untethered cameras. We describe an innovative technique for synchronizing eye and scene videos using synchronized flashing lights. Our hardware, software, and calibration designs will be made publicly available, and we describe them in detail here, to facilitate replication of our system. We also describe novel smooth-pursuit-based calibration methodology, which affords rich sampling of calibration data while compensating for lack of information regarding the extent of visibility on participants' scene recordings. Validation experiments indicate accuracy within 0.752 degrees of visual angle on average.

Estimating point-of-regard using corneal surface image

Recently, the eye-tracker has been developed as a daily-use device. However, when an eye-tracker is used daily, the problem of calibration arises. Even when the calibration for computing the relationship between the scene and eye camera is conducted in advance, the relationship is not maintained in prolonged use. Therefore, we propose a method for conserving the relationship between the scene and eye camera during the execution of an eye-tracking program. The texture information of the corneal surface image is used to estimate the point-of-regard. We confirm the feasibility of the proposed method through preliminary experiments.

EYEDIAP: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras

The lack of a common benchmark for the evaluation of the gaze estimation task from RGB and RGB-D data is a serious limitation for distinguishing the advantages and disadvantages of the many proposed algorithms found in the literature. This paper intends to overcome this limitation by introducing a novel database along with a common framework for the training and evaluation of gaze estimation approaches. In particular, we have designed this database to enable the evaluation of the robustness of algorithms with respect to the main challenges associated with this task: i) Head pose variations; ii) Person variation; iii) Changes in ambient and sensing conditions and iv) Types of target: screen or 3D object.

Eye tracking gaze visualiser: eye tracker and experimental software independent visualisation of gaze data

Eye tracking research in disciplines such as cognitive psychology requires specific software packages designed for experiments supporting reaction time measurement, blocking and mixing of conditions and item randomisation. Although recording raw eye movement data is possible, its visualisation is difficult regarding the experimental design. The currently used eye tracking software is often built as an all-in-one program that can only visualise the eye tracking data recorded by itself. Therefore, in this paper a software tool is presented that visualises nearly any recorded eye tracking gaze data on the corresponding video independent of the specific software that runs the experiment. Summarised visualisations over randomised item presentations according to experimental conditions can be created. In addition to basic visualisation functionalities, further features such as simple object detection, repetitive pattern exploration and subset selection of subjects are provided.

Gaze behaviour and linguistic processing of dynamic text in print interpreting

Print interpreting is a form of communication that allows deaf and hard of hearing people to get access to speech. We carried out an eye tracking experiment where twenty participants read print interpreted text presented dynamically on a computer screen. We compared regression landing points on reread words between two dynamic text presentation formats: letter-by-letter and word-by-word. Then we investigated the gaze behaviour from a linguistic point of view in order to discover whether the dynamic presentation has an effect on linguistic factors. In particular, we have examined the parts of speech of the first and the second landing points of regressions. The findings suggest a significant difference between the presentation formats. There is also a relationship between the gaze behaviour and the linguistic processing of dynamic text. Being conscious of this lexical hierarchy may help to develop supporting print interpreting tools and consequently may also help print interpreters to improve the presentation of dynamic text to the user.

Improving cross-ratio-based eye tracking techniques by leveraging the binocular fixation constraint

The cross-ratio approach has recently attracted increasing attention in eye-gaze tracking due to its simplicity in setting up a tracking system. Its accuracy, however, is lower than that of the model-based approach, and substantial efforts have been devoted to improving its accuracy. Binocular fixation is essential for humans to have good depth perception, and this paper presents a technique leveraging this constraint. It is used in two ways: First, in estimating jointly the homography matrices for both eyes, and second, in estimating the eye gaze itself. Experimental results with both synthetic and real data show that the proposed approach produces significantly better results than using a single eye and also better than averaging the independent results from the two eyes.

Influence of stimulus and viewing task types on a learning-based visual saliency model

Learning-based approaches using actual human gaze data have been proven to be an efficient way to acquire accurate visual saliency models and have attracted much interest in recent years. However, it remains to be answered how different types of stimulus (e.g., fractal images, and natural images with or without human faces) and viewing tasks (e.g., free viewing or a preference rating task) affect learned visual saliency models. In this study, we quantitatively investigate how learned saliency models differ when using datasets collected in different settings (image contextual level and viewing task) and discuss the importance of choosing appropriate experimental settings.

Infusing perceptual expertise and domain knowledge into a human-centered image retrieval system: a prototype application

Traditional content-based image retrieval techniques, which primarily rely on image content at the pixel level, are not effective in accessing images at the semantic level. Defining approaches to incorporate experts' perceptual and conceptual capabilities of image understanding in their domain of expertise into the retrieval processes promises to help bridge this semantic gap. Towards accomplishing this, we design and implement a novel multimodal interactive system for image retrieval. To incorporate human expertise, the system stores expert-derived information extracted from two human sensor modalities that intuitively relate to image search, eye movements and verbal descriptions, both generated by medical experts. Experimental evaluation of the system shows that by transferring experts' perceptual expertise and domain knowledge into image-based computational procedures, our system can take advantage of the different human-centered modalities' respective strengths and improve the retrieval performance over just using image-based features.

Machine-extracted eye gaze features: how well do they correlate to sight-reading abilities of piano players?

Skilled piano players are able to decipher and play a musical piece they had never seen before (a skill known as sight-reading). For a sample of 23 piano players of various abilities we consider the correlation between machine-extracted gaze path features and the overall human rating. We find that correlation values (between machine-extracted gaze features and overall human ratings) are statistically similar to correlation values between human-extracted task-related ratings (e.g., note accuracy, error rate) and overall human ratings. These high correlation values suggest that an eye tracking-enabled computer could help students assess their sight-reading abilities, and could possibly advise students on how to improve. The approach could be extended to any musical instrument. For keyboard players, a MIDI keyboard with the appropriate software to provide information about note accuracy and timing could complement feedback from an eye tracker to enable more detailed analysis and advice.

News stories relevance effects on eye-movements

Relevance is a fundamental concept in information retrieval. We consider relevance from the user's perspective and ask if the degree of relevance can be inferred from eye-tracking data and if it is related to the cognitive effort involved in relevance judgments. To this end we conducted a study, in which participants were asked to find information in screen-long text documents containing news stories. Each participant responded to fourteen trials consisting of an information question followed by three documents each at a different level of relevance (irrelevant, partially relevant, and relevant). The results indicate that relevant documents tended to be continuously read, while irrelevant documents tended to be scanned. In most cases, cognitive effort inferred from eye-tracking data was highest for partially relevant documents and lowest for irrelevant documents.

Predicting an observer's task using multi-fixation pattern analysis

Since Yarbus's seminal work in 1965, vision scientists have argued that people's eye movement patterns differ depending upon their task. This suggests that we may be able to infer a person's task (or mental state) from their eye movements alone. Recently, this was attempted by Greene et al. [2012] in a Yarbus-like replication study; however, they were unable to successfully predict the task given to their observer. We reanalyze their data, and show that by using more powerful algorithms it is possible to predict the observer's task. We also used our algorithms to infer the image being viewed by an observer and their identity. More generally, we show how off-the-shelf algorithms from machine learning can be used to make inferences from an observer's eye movements, using an approach we call Multi-Fixation Pattern Analysis (MFPA).

Real-time hidden gaze point correction

The accuracy of gaze point estimation is one of the main limiting factors in developing applications that utilize gaze input. The existing gaze point correction methods either do not support real-time interaction or imply restrictions on gaze-controlled tasks and object screen locations. We hypothesize that when gaze points can be reliably correlated with object screen locations, it is possible to gather and leverage this information for improving the accuracy of gaze pointing. We propose an algorithm that uses a growing pool of such collected correlations between gaze points and objects for real-time hidden gaze point correction. We tested this algorithm assuming that any point inside of a rectangular object has equal probability to be hit by gaze. We collected real data in a user study to simulate pointing at targets of small (<30px), medium (~50px) and large (>80px) size. The results showed that our algorithm can significantly improve the hit rate, especially in pointing at middle-sized targets. The proposed method is real-time, person- and task-independent and is applicable to arbitrarily located objects.

Realistic heatmap visualization for interactive analysis of 3D gaze data

In this paper, a novel approach for real-time heatmap generation and visualization of 3D gaze data is presented. By projecting the gaze into the scene and considering occlusions from the observer's view, to our knowledge, for the first time a correct visualization of the actual scene perception in 3D environments is provided. Based on a graphics-centric approach utilizing the graphics pipeline, shaders and several optimization techniques, heatmap rendering is fast enough for an interactive online and offline gaze analysis of thousands of gaze samples.

Recognition of translator expertise using sequences of fixations and keystrokes

Professional human translation is necessary to meet high quality standards in industry and governmental agencies. Translators engage in multiple activities during their task, and there is a need to model their behavior, with the objective to understand and optimize the translation process. In recent years, user interfaces enabled us to record user events such as eye-movements or keystrokes. Although there have been insightful descriptive analyses of the translation process, there are multiple advantages in enabling quantitative inference. We present methods to classify sequences of fixations and keystrokes into activities and model translation sessions with the objective to recognize translator expertise. We show significant error reductions in the task of recognizing certified translators and their years of experience, and analyze the characterizing patterns.

Recurrence quantification analysis reveals eye-movement behavior differences between experts and novices

Understanding and characterizing perceptual expertise is a major bottleneck in developing intelligent systems. In knowledge-rich domains such as dermatology, perceptual expertise influences the diagnostic inferences made based on the visual input. This study uses eye movement data from 12 dermatology experts and 12 undergraduate novices while they inspected 34 dermatological images. This work investigates the differences in global and local temporal fixation patterns between the two groups using recurrence quantification analysis (RQA). The RQA measures reveal significant differences in both global and local temporal patterns between the two groups. Results show that experts tended to refixate previously inspected areas less often than did novices, and their refixations were more widely separated in time. Experts were also less likely to follow extended scan paths repeatedly than were novices. These results suggest the potential value of RQA measures in characterizing perceptual expertise. We also discuss potential use of the RQA method in understanding the interactions between experts' visual and linguistic behavior.
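
To make the idea of a refixation measure concrete, the sketch below computes a recurrence rate for a fixation sequence. It uses one common formulation from the RQA literature (two fixations are recurrent if they lie within a fixed radius of each other); the abstract does not state the exact definition or radius used in the study, so the function name, the radius and the sample data are assumptions.

```python
import math


def recurrence_rate(fixations, radius):
    """Percentage of fixation pairs that land within `radius` of each other.

    fixations: list of (x, y) positions in temporal order.
    Only pairs (i, j) with i < j are counted, so a fixation is never
    compared with itself.
    """
    n = len(fixations)
    if n < 2:
        return 0.0
    recurrent_pairs = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(fixations[i], fixations[j]) <= radius
    )
    # Normalise by the number of distinct pairs, n * (n - 1) / 2.
    return 100.0 * 2 * recurrent_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical scanpath: the third and fourth fixations revisit
    # the locations of the first and second.
    path = [(0, 0), (100, 0), (1, 1), (100, 2)]
    print(f"{recurrence_rate(path, radius=5):.1f}% of pairs are recurrent")
```

Under this definition, a lower recurrence rate corresponds to fewer refixations of previously inspected areas, which is the direction of the expert effect reported above.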

Saccade plots

Visualization by heat maps is a powerful technique for showing frequently visited areas in displayed stimuli. However, by aggregating the spatio-temporal data, heat maps lose the information about the transitions between fixations, i.e., the saccades. In gaze plots, instead, trajectories are shown as overplotted polylines, leading to much visual clutter, which makes those diagrams difficult to read. In this paper, we introduce Saccade Plots as a novel technique that combines the benefits of both approaches: it shows the gaze frequencies as a heat map and the saccades in the form of color-coded triangular matrices that surround the heat map. We illustrate the usefulness of our technique by applying it to a representative example from a previously conducted eye tracking study.

Simulating refraction and reflection of ocular surfaces for algorithm validation in outdoor mobile eye tracking videos

To create input videos for testing pupil detection algorithms for outdoor eye tracking, we develop a simulation of the eye with front-surface reflections of the cornea and the internal refractions of the cornea and refraction at the air/cornea and cornea/aqueous boundaries. The scene and iris are simulated using texture mapping and are alpha-blended to produce the final image of the eye with reflections and refractions. The simulation of refraction is important in order to observe the elliptical shape that the pupil takes on as it goes off axis, and to take into consideration the difference between true pupil position and apparent (entrance) pupil position. Sequences of images are combined to produce input videos for testing the next generation of pupil detection and tracking algorithms, which must sort the pupil out of distracting edges and reflected objects.

Starting to get bored: an outdoor eye tracking study of tourists exploring a city panorama

Predicting the moment when a visual explorer of a place loses interest and starts to get bored is of considerable importance to the design of touristic information services. This paper investigates factors affecting the duration of the visual exploration of a city panorama. We report on an empirical outdoor eye tracking study in the real world with tourists following a free exploration paradigm without a time limit. As the main result, the number of areas of interest revisited during a short period was found to be a good predictor for the total exploration duration.

SubsMatch: scanpath similarity in dynamic scenes based on subsequence frequencies

The analysis of visual scanpaths, i.e., series of fixations and saccades, in complex dynamic scenarios is highly challenging and usually performed manually. We propose SubsMatch, a scanpath comparison algorithm for dynamic, interactive scenarios based on the frequency of repeated gaze patterns. Instead of measuring the gaze duration towards a semantic target object (which would be hard to label in dynamic scenes), we examine the frequency of attention shifts and exploratory eye movements. SubsMatch was evaluated on highly dynamic data from a driving experiment to identify differences between scanpaths of subjects who failed a driving test and subjects who passed.
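
The general idea of comparing scanpaths by the frequency of repeated patterns can be sketched as follows. This is an illustration only, not the SubsMatch implementation: it assumes each scanpath has already been encoded as a string of symbols (for example, one letter per gaze region), and the function names, the subsequence length and the distance measure are assumptions.

```python
from collections import Counter


def subsequence_frequencies(symbols: str, n: int) -> dict:
    """Relative frequency of every length-n subsequence in a symbol string."""
    counts = Counter(symbols[i:i + n] for i in range(len(symbols) - n + 1))
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()} if total else {}


def frequency_distance(a: str, b: str, n: int = 2) -> float:
    """Sum of absolute differences between the subsequence frequencies
    of two encoded scanpaths (0 = identical pattern frequencies)."""
    fa = subsequence_frequencies(a, n)
    fb = subsequence_frequencies(b, n)
    return sum(abs(fa.get(s, 0.0) - fb.get(s, 0.0)) for s in set(fa) | set(fb))


if __name__ == "__main__":
    # Hypothetical encodings: letters stand for gaze regions.
    print(frequency_distance("ABABAB", "ABABAB"))  # same patterns
    print(frequency_distance("ABABAB", "AAACCC"))  # different patterns
```

Because only pattern frequencies are compared, no semantic labelling of target objects is needed, which matches the motivation given in the abstract.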

The applicability of probabilistic methods to the online recognition of fixations and saccades in dynamic scenes

In many applications involving scanpath analysis, especially when dynamic scenes are viewed, consecutive fixations and saccades have to be identified and extracted from raw eye-tracking data in an online fashion. Since probabilistic methods can adapt not only to the individual viewing behavior, but also to changes in the scene, they are best suited for such tasks.

In this paper we analyze the applicability of two types of main-stream probabilistic models to the identification of fixations and saccades in dynamic scenes: (1) Hidden Markov Models and (2) Bayesian Online Mixture Models. We analyze and compare the classification performance of the models on eye-tracking data collected during real-world driving experiments.

TraQuMe: a tool for measuring the gaze tracking quality

Consistent measuring and reporting of gaze data quality is important in research that involves eye trackers. We have developed TraQuMe: a generic system to evaluate the gaze data quality. The quality measurement is fast and the interpretation of the results is aided by graphical output. Numeric data is saved for reporting of aggregate metrics for the whole experiment. We tested TraQuMe in the context of a novel hidden calibration procedure that we developed to aid in experiments where participants should not know that their gaze is being tracked. The quality of tracking data after the hidden calibration procedure was very close to that obtained with the Tobii T60 tracker's built-in 2-point, 5-point and 9-point calibrations.

Verbal gaze instruction matches visual gaze guidance in laparoscopic skills training

Novices were trained to perform a unimanual peg transport task in a laparoscopic training box with an illuminated interior displayed on a monitor. Subjects were divided into two groups; one group was verbally instructed to direct their gaze at distant targets, while the other group had their gaze behaviour implicitly manipulated using distant target illumination. Both groups achieved similar task completion times post-training and developed peripheral vision strategies leading to delayed foveation on targets until the instrument was closer to its destination, although the ability to focus on targets earlier during manual movements as done by an expert surgeon was quickly regained by the verbal instruction group post-training. This suggests that care should be taken when employing visual attention cuing methods such as target highlighting for training eye-hand coordination skills, as simple verbal instruction may be sufficient to help trainees to adopt more expert-like gaze behaviours.

What influences dwell time during source code reading?: analysis of element type and frequency as factors

While knowledge about reading behavior in natural-language text is abundant, little is known about the visual attention distribution when reading source code of computer programs. Yet, this knowledge is important for teaching programming skills as well as designing IDEs and programming languages. We conducted a study in which 15 programmers with various levels of expertise read short pieces of source code while their eye movements were recorded. In order to study attention distribution on code elements, we introduced the following procedure: First, we preprocessed the eye movement data using a log-transformation. Taking into account the word lengths, we then analyzed the time spent on different lexical elements. The results show that most attention is oriented towards understanding identifiers, operators, keywords and literals, while relatively little reading time is spent on separators. We further inspected the attention on keywords and provide a description of the gaze on these primary building blocks for any formal language. The analysis indicates that approaches from research on natural-language text reading can be applied to source code as well, however not without review.


A visual approach for scan path comparison

Several algorithms, approaches, and implementations have been developed to support comparison of scan paths and finding of interesting scan path structures. In this work we contribute a visual approach to support scan path comparison. A key feature of this approach is the combination of a clustering algorithm using Levenshtein distance with the parallel scan path visualization technique. The combination of computational methods with an interactive visualization allows us to use both the power of pattern finding algorithms and the human ability to visually recognize patterns. To use the concept in practice we implemented the approach in a prototype and show its application in two scan path analysis scenarios from automobile usability testing and visualization research.
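The Levenshtein distance used for clustering can be sketched as follows, assuming each scan path is encoded as a string with one character per visited area of interest. That encoding and the function name are assumptions for illustration; the abstract does not describe how scan paths are represented.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]
```

For example, the hypothetical scan paths `"ABCA"` and `"ABDA"` differ by a single substitution, so their distance is 1. A matrix of such pairwise distances could then be passed to a clustering algorithm.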

Experts vs. novices: applying eye-tracking methodologies in colonoscopy video screening for polyp search

We present in this paper a novel study aiming at identifying the differences in visual search patterns between physicians of diverse levels of expertise during the screening of colonoscopy videos. Physicians were clustered into two groups (experts and novices) according to the number of procedures performed, and fixations were captured by an eye-tracker device during the task of polyp search in different video sequences. These fixations were integrated into heat maps, one for each cluster. The obtained maps were validated against a ground truth consisting of a mask of the polyp, and the comparison between experts and novices was performed using metrics such as reaction time, dwelling time and energy concentration ratio. Experimental results show a statistically significant difference between experts and novices, and the obtained maps prove to be a useful tool for characterising the behaviour of each group.

ISeeCube: visual analysis of gaze data for video

We introduce a new design for the visual analysis of eye tracking data recorded from dynamic stimuli such as video. ISeeCube includes multiple coordinated views to support different aspects of various analysis tasks. It combines methods for the spatiotemporal analysis of gaze data recorded from unlabeled videos as well as the possibility to annotate and investigate dynamic Areas of Interest (AOIs). A static overview of the complete data set is provided by a space-time cube visualization that shows gaze points with density-based color mapping and spatiotemporal clustering of the data. A timeline visualization supports the analysis of dynamic AOIs and the viewers' attention on them. AOI-based scanpaths of different viewers can be clustered by their Levenshtein distance, an attention map, or the transitions between AOIs. With the provided visual analytics techniques, the exploration of eye tracking data recorded from several viewers is supported for a wide range of analysis tasks.

iShadow: the computational eyeglass system

Continuous, real-time tracking of eye gaze is valuable in a variety of scenarios including hands-free interaction with the physical world, detection of unsafe behaviors, leveraging visual context for advertising, life logging, and others. While eye tracking is commonly used in clinical trials and user studies, it has not bridged the gap to everyday consumer use. The challenge is that a real-time eye tracker is a power-hungry and computation-intensive device which requires continuous sensing of the eye using an imager running at many tens of frames per second, and continuous processing of the image stream using sophisticated gaze estimation algorithms. Our key contribution is the design of an eye tracker that dramatically reduces the sensing and computation needs for eye tracking, thereby achieving orders of magnitude reductions in power consumption and form-factor. The key idea is that eye images are extremely redundant, therefore we can estimate gaze by using a small subset of carefully chosen pixels per frame. We use a sparse pixel-based gaze estimation algorithm that is a multi-layer neural network learned using a state-of-the-art sparsity-inducing regularization function which minimizes the gaze prediction error while simultaneously minimizing the number of pixels used. Our results show that we can operate at roughly 70mW of power, while continuously estimating eye gaze at the rate of 30 Hz with errors of roughly 4 degrees.

Model-based acquisition and analysis of multimodal interactions for improving human-robot interaction

To solve complex tasks cooperatively in close interaction with humans, robots need to understand natural human communication. To achieve this, robots could benefit from a deeper understanding of the processes that humans use for successful communication. Such skills can be studied by investigating human face-to-face interactions in complex tasks. In our work the focus lies on shared-space interactions in a path planning task and thus 3D gaze directions and hand movements are of particular interest.

However, the analysis of gaze and gestures is a time-consuming task: Usually, manual annotation of the eye tracker's scene camera video is necessary in a frame-by-frame manner. To tackle this issue, based on the EyeSee3D method, an automatic approach for annotating interactions is presented: A combination of geometric modeling and 3D marker tracking serves to align real world stimuli with virtual proxies. This is done based on the scene camera images of the mobile eye tracker alone. In addition to the EyeSee3D approach, face detection is used to automatically detect fixations on the interlocutor. For the acquisition of the gestures, an optical marker tracking system is integrated and fused in the multimodal representation of the communicative situation.

Pupil detection in the presence of specular reflection

In this work we describe a method of pupil detection for subsequent gaze tracking, when specular reflection is present in the image. Gaze tracking commonly uses the spatial relationship between the pupil and corneal reflection, but is not robust when the user is wearing eyeglasses, since light reflected from the surroundings changes the appearance of the pupil. In this research we propose and evaluate a pupil detection method that can perform robustly even in the presence of such reflection.

Software framework for an ocular biometric system

This document describes the software framework of an ocular biometric system. The framework encompasses several interconnected components that allow an end-user to perform biometric enrollment, verification, and identification with most common eye tracking devices. The framework, written in C#, includes multiple state-of-the-art biometric algorithms and information fusion techniques, and can be easily extended to utilize new biometric techniques and eye tracking devices.

Smartphone eye tracking toolbox: accurate gaze recovery on mobile displays

Human interaction with mobile devices has recently been established as an application field in eye tracking research. Current technologies for gaze recovery on mobile displays cannot enable fully natural interaction with the mobile device: users are conditioned to interact with tightly mounted displays or distracted by markers in their view. We propose a novel approach that captures points of regard (PORs) with eye tracking glasses (ETG) and then uses computer vision methodology for the robust localization of the smartphone in the head camera video. We present an integrated software package, the Smartphone Eye Tracking Toolbox (SMET), that enables accurate gaze recovery on mobile displays with heat mapping of recent attention. We report the performance of the computer vision approach and demonstrate it in various natural interaction scenarios using the SMET Toolbox, enable ROI settings on the mobile display, and show results from eye movement analysis, such as ROI dwell time and statistics on eye gaze events (saccades, fixations).

EyeSee3D: a low-cost approach for analyzing mobile 3D eye tracking data using computer vision and augmented reality technology

For validly analyzing human visual attention, it is often necessary to proceed from computer-based desktop set-ups to more natural real-world settings. However, the resulting loss of control has to be counterbalanced by increasing participant and/or item count. Together with the effort required to manually annotate the gaze-cursor videos recorded with mobile eye trackers, this renders many studies unfeasible.

We tackle this issue by minimizing the need for manual annotation of mobile gaze data. Our approach combines geometric modelling with inexpensive 3D marker tracking to align virtual proxies with the real-world objects. This allows us to classify fixations on objects of interest automatically while supporting a completely free moving participant.

The paper presents the EyeSee3D method as well as a comparison of an expensive outside-in (external cameras) and a low-cost inside-out (scene camera) tracking of the eye-tracker's position. The EyeSee3D approach is evaluated comparing the results from automatic and manual classification of fixation targets, which raises old problems of annotation validity in a modern context.

SESSION: Doctoral symposium extended abstracts

A smooth pursuit calibration technique

Many different eye-tracking calibration techniques have been developed [e.g. see Talmi and Liu 1999; Zhu and Ji 2007]. A community standard is a 9-point-sparse calibration that relies on sequential presentation of known scene targets. However, fixating different points has been described as tedious, dull and tiring for the eye [Bulling, Gellersen, Pfeuffer, Turner and Vidal 2013].

Assessment of the improvement of signal recorded in infant EEG by using eye tracking algorithms

Event-related potentials (ERPs) elicited by visual stimuli are obtained by showing the same stimuli to the subject dozens of times while recording the electrical brain activity, and afterwards averaging the EEG signal of the valid trials to remove the general brain activity and keep the response generated by the stimuli. ERPs are a common methodology used among cognitive developmental scientists to investigate how infants develop because responses to external events can be observed in ERP without specific behavioral requirements from the infants. However, applying this technique to infants has some disadvantages that are not found in adult participants. These are mainly the limited attention span and the difficulty of obtaining enough artifact-free trials due to movement artifacts and lack of attention to the stimuli. These limitations are the main reason for the current attrition rates in infant ERP studies, which are expected to be between 50% and 75% [DeBoer et al., 2007; Stets et al., 2012].
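The averaging step described above can be sketched in a few lines. The example assumes each trial is a list of EEG samples of equal length, time-locked to stimulus onset, and that a per-trial validity flag (for instance, derived from eye tracking data indicating that the infant attended to the stimulus) is already available. The function name and data layout are hypothetical.

```python
def average_erp(trials, valid):
    """Average the valid trials sample by sample to estimate the ERP waveform."""
    kept = [trial for trial, ok in zip(trials, valid) if ok]
    if not kept:
        raise ValueError("no valid trials to average")
    n = len(kept)
    # zip(*kept) groups the samples recorded at the same latency across trials.
    return [sum(samples) / n for samples in zip(*kept)]
```

Trials flagged as invalid are simply excluded, so the estimate depends on how many valid trials remain.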

Attentional retraining in depressive disorders

The project is rooted in the concepts of cognitive psychopathology, which state that clinical disorders stem from dysfunctional cognitive mechanisms. I hope that the project will help validate whether attentional bias towards negative stimuli is an underlying cause of depressive disorders.

EOG-based eye gesture input with audio staging

Eye tracking techniques have moved from the laboratory into everyday life; examples include input interfaces for the severely handicapped and object-of-interest selection in the camera finder. They will bring great benefits when they can be used easily in everyday life. However, the current major tracking devices are not accepted widely because they set cameras in front of the user's face which plays several extremely important roles in everyday life. Desktop devices or special personal devices can be used but they impose their own limitations.

Gaze guidance for the visually impaired

Visual perception is perhaps the most important sensory input. During driving, about 90% of the relevant information is related to the visual input [Taylor 1982]. However, the quality of visual perception decreases with age, mainly due to a reduction in visual acuity or as a consequence of diseases affecting the visual system. Amongst the most severe types of visual impairments are visual field defects (areas of reduced perception in the visual field), which occur as a consequence of diseases affecting the brain, e.g., stroke, brain injury, trauma, or diseases affecting the optic nerve, e.g., glaucoma. Due to demographic aging, the number of people with such visual impairments is expected to rise [Kasneci 2013]. Since persons suffering from visual impairments may overlook hazardous objects, they are prohibited from driving. This, however, leads to a decrease in quality of life, mobility, and participation in social life. Several studies have shown that some patients show safe driving behavior despite their visual impairment by performing effective visual exploration, i.e., adequate eye and head movements (e.g., towards their visual field defect [Kasneci et al. 2014b]). Thus, a better understanding of visual perception mechanisms, i.e., of why and how we attend to certain parts of our environment while "ignoring" others, is key to helping visually impaired persons in complex, real-life tasks, such as driving a car.

The role of processing fluency in online consumer behavior: evaluating fluency by tracking eye movements

The Internet enables people to extensively research products or services, and also easily compare prices between offers [e.g. Baker et al. 2001]. Taking into account the amount of information available on the Internet, acquisition of new information can face some difficulties, especially when one wants to make a purchase decision. Therefore, the ability to process relevant information fluently enables a user to create a better experience and to become more efficient in gathering information related to the purpose of the visit. This ability might be connected to the cognitive task that can either be effortless or effortful, and may lead to a metacognitive experience of either fluency or disfluency [Alter and Oppenheimer 2009]. Nevertheless, some e-commerce websites are preferred over others and this preference varies between individuals. This variation can be influenced by the user's prior experience and cognitive sources, but also by the graphics or information architecture of the web page. The presented project aims to apply the fluency concept to consumer behavior in the online environment by studying eye movements and promoting eye tracking as an objective measure.

The use of eye-tracking in landscape perception research

The European Landscape Convention defines landscape as "an area, as perceived by people, whose character is the result of the action and interaction of natural and/or human factors" [Council of Europe 2000]. This definition puts people at the core of the landscape and makes them part of it while observing the landscape. In addition, the Convention emphasizes that landscape is an important public interest which determines a part of the quality of life for people everywhere. Consequently, an active participation of the public in landscape planning and management is strongly stimulated [Council of Europe 2000]. Regarding these statements, it would be beneficial to gain insights into people's observation and perception of landscapes to be able to use this knowledge for landscape planning and management. So far, different landscape perception paradigms have been formulated [Scott and Benson 2002] and analyzed using questionnaires and in-depth interviews. The most frequently used stimuli in these empirical studies are photographs or in situ observations [e.g. Ode et al. 2008; Palmer 2004; Tveit 2009]. Eye-tracking in combination with landscape photographs, however, offers an objective manner to measure people's observation of landscapes.

Towards visualizing eye movement data from interactive stimuli

Recording eye movement data can help to understand where and at what participants look. However, analyzing eye movement data is a time-consuming task. Using visualization techniques in the analysis process can help to uncover concealed relationships within the data and can therefore be seen as one means of analyzing eye movement data. The best-known visualization techniques in eye tracking are heat maps and scanpaths. In recent years more visualization techniques have been developed, for example scaled traces [Goldberg and Helfman 2010], eyePatterns [West et al. 2006], or eSeeTrack [Tsang et al. 2010].