We facilitate the comparative visual analysis of eye tracking data from multiple participants with a visualization that represents the temporal changes of viewing behavior. Common approaches to visually analyze eye tracking data either occlude or ignore the underlying visual stimulus, impairing the interpretation of displayed measures. We introduce fixation-image charts: a new technique to display the temporal changes of fixations in the context of the stimulus without visual overlap between participants. Fixation durations, the distance and direction of saccades between consecutive fixations, as well as the stimulus context can be interpreted in one visual representation. Our technique is not limited to static stimuli, but can be applied to dynamic stimuli as well. Using fixation metrics and the visual similarity of stimulus regions, we complement our visualization technique with an interactive filter concept that allows for the identification of interesting fixation sequences without the time-consuming annotation of areas of interest. We demonstrate how our technique can be applied to different types of stimuli to perform a range of analysis tasks. Furthermore, we discuss advantages and shortcomings derived from a preliminary user study.
Algorithms for eye movement classification are separated into threshold-based and probabilistic methods. While the parameters of static threshold-based algorithms usually need to be chosen for the particular task (task-individual), the probabilistic methods were introduced to meet the challenge of adjusting automatically to multiple individuals with different viewing behaviors (inter-individual). In the context of conditionally automated driving, especially while the driver is performing various secondary tasks, these two requirements of task- and inter-individuality fuse to an even greater challenge. This paper shows how the combination of task- and inter-individual differences influences the viewing behavior of a driver during conditionally automated drives and that state-of-the-art algorithms are not able to sufficiently adapt to these variances. To approach this challenge, an extended version of a Bayesian online learning algorithm is introduced, which is not only able to adapt its parameters to upcoming variances in the viewing behavior, but also has real-time capability and lower computational overhead. The proposed approach is applied to a large-scale driving simulator study with 74 subjects performing secondary tasks while driving in an automated setting. The results show that the eye movement behavior of drivers performing different secondary tasks varies significantly while remaining approximately consistent for idle drivers. Furthermore, the data shows that only a few of the parameters used for describing the eye movement behavior are responsible for these significant variations indicating that it is not necessary to learn all parameters in an online-fashion.
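For orientation, the static threshold-based family mentioned above can be sketched as a simple velocity-threshold classifier. This is not the extended Bayesian online learning algorithm proposed in the abstract. The function name `classify_ivt`, the sample layout, and the default threshold of 30 deg/s are illustrative assumptions. The fixed threshold is exactly the kind of parameter that has to be chosen per task.

```python
import math

def classify_ivt(samples, threshold_deg_per_s=30.0):
    """Label each inter-sample interval as 'fixation' or 'saccade'.

    samples: list of (t_seconds, x_deg, y_deg) tuples sorted by time.
    The threshold is a fixed, hand-chosen parameter (an assumption here).
    """
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Angular speed between two consecutive gaze samples.
        velocity = math.hypot(x1 - x0, y1 - y0) / dt
        labels.append("saccade" if velocity > threshold_deg_per_s else "fixation")
    return labels

# Hypothetical 100 Hz samples: slow drift, a fast jump, then slow drift again.
demo = [(0.00, 0.0, 0.0), (0.01, 0.1, 0.0), (0.02, 5.0, 0.0), (0.03, 5.05, 0.0)]
demo_labels = classify_ivt(demo)
```

An adaptive method would replace the constant threshold with parameters that are updated as new samples arrive.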
Human image understanding is reflected by individuals' visual and linguistic behaviors, but the meaningful computational integration and interpretation of their multimodal representations remain a challenge. In this paper, we expand a framework for capturing image-region annotations in dermatology, a domain in which interpreting an image is influenced by experts' visual perception skills, conceptual domain knowledge, and task-oriented goals. Our work explores the hypothesis that eye movements can help us understand experts' perceptual processes and that spoken language descriptions can reveal conceptual elements of image inspection tasks. We cast the problem of meaningfully integrating visual and linguistic data as unsupervised bitext alignment. Using alignment, we create meaningful mappings between physicians' eye movements, which reveal key areas of images, and spoken descriptions of those images. The resulting alignments are then used to annotate image regions with medical concept labels. Our alignment accuracy exceeds baselines using both exact and delayed temporal correspondence. Additionally, comparison of alignment accuracy between a method that identifies clusters in the images based on eye movement vs. a method that identifies clusters using image features suggests that the two approaches perform well on different types of images and concept labels. This suggests that an image annotation framework should integrate information from more than one technique to handle heterogeneous images. We also investigate the performance of the proposed aligner for dermatological primary morphology concept labels, as well as for lesion size or type and distribution-based categories of images.
From the seminal work of Yarbus on the relationship of eye movements to vision, scanpath analysis has been recognized as a window into the mind. Computationally, characterizing the scanpath, the sequential and spatial dependencies between eye positions, has been demanding. We sought a method that could extract scanpath trajectory information from raw eye movement data without assumptions defining fixations and regions of interest. We adapted a set of libraries that perform multidimensional clustering on geometric features derived from large volumes of spatiotemporal data to eye movement data in an approach we call GazeAppraise. To validate the capabilities of GazeAppraise for scanpath analysis, we collected eye tracking data from 41 participants while they completed four smooth pursuit tracking tasks. Unsupervised cluster analysis on the features revealed that 162 of 164 recorded scanpaths were categorized into one of four clusters and the remaining two scanpaths were not categorized (recall/sensitivity=98.8%). All of the categorized scanpaths were grouped only with other scanpaths elicited by the same task (precision=100%). GazeAppraise offers a unique approach to the categorization of scanpaths that may be particularly useful in dynamic environments and in visual search tasks requiring systematic search strategies.
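The reported recall and precision follow directly from the counts in the abstract. A minimal sketch of the arithmetic, assuming that every categorized scanpath counts as a true positive and the two uncategorized scanpaths count as false negatives (the variable names are illustrative):

```python
# Counts taken from the abstract; the TP/FN/FP reading is an assumption.
total_scanpaths = 164   # 41 participants x 4 tasks
categorized = 162       # scanpaths assigned to one of the four clusters
misgrouped = 0          # categorized scanpaths grouped with another task

true_positives = categorized - misgrouped
false_negatives = total_scanpaths - categorized
false_positives = misgrouped

recall = true_positives / (true_positives + false_negatives)
precision = true_positives / (true_positives + false_positives)
print(f"recall={recall:.1%}, precision={precision:.0%}")
# prints: recall=98.8%, precision=100%
```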
Video-based eye trackers (VETs) have become the dominant eye tracking technology due to their reasonable cost, accuracy, and ease of use. VETs require real-time image processing to detect and track eye features such as the center of the pupil and corneal reflection to estimate the point of regard. Despite the continuous evolution of cameras and computers that made head mounted eye trackers easier to use in natural activities, real-time processing of high resolution images in mobile devices remains a challenge. In this paper we investigate the feasibility of a novel eye-tracking technique intended for wearable applications that uses mouse chips as imaging sensors. Such devices are widely available at very low cost, and provide high speed and accurate 2D tracking data. Though mouse chips have been used for many purposes other than a computer's pointing device, to our knowledge this is the first attempt to use them as an eye tracker. To validate the technique, we built an episcleral database with about 100 high resolution episcleral patches from 7 individuals. The episclera is the outermost layer of the sclera, which is the white part of the eye, and consists of dense vascular connective tissue. We have used the patches to determine if the episclera contains enough texture to be reliably tracked. We also present results from a prototype built using an off-the-shelf mouse sensor. Our results show that a mouse-based eye tracker has the potential to be very accurate, precise, and fast (measuring 2.1' of visual angle at 1 kHz), with little overhead for the wearable computer.
Existing eye trackers typically require an explicit personal calibration procedure to estimate subject-dependent eye parameters. Despite efforts to simplify it, this calibration process remains unnatural and bothersome, in particular for users of personal and mobile devices. To alleviate this problem, we introduce a technique that can eliminate explicit personal calibration. By combining a new calibration procedure with eye fixation prediction, the proposed method performs implicit personal calibration without active participation or even knowledge of the user. Specifically, in contrast to the traditional deterministic calibration procedure that minimizes the differences between the predicted eye gazes and the actual eye gazes, we introduce a stochastic calibration procedure that minimizes the differences between the probability distribution of the predicted eye gaze and the distribution of the actual eye gaze. Furthermore, instead of using a saliency map to approximate the eye fixation distribution, we propose to use a regression-based deep convolutional neural network (RCNN) that specifically learns image features to predict eye fixation. By combining the distribution-based calibration with the deep fixation prediction procedure, personal eye parameters can be estimated without explicit user collaboration. We apply the proposed method to both 2D regression-based and 3D model-based eye gaze tracking methods. Experimental results show that the proposed method outperforms other implicit calibration methods and achieves results comparable to those of traditional explicit calibration methods.
This paper addresses gaze interaction for smart home control, conducted from a wrist-worn unit. First we asked ten people to enact the gaze movements they would propose for, e.g., opening a door or adjusting the room temperature. On the basis of their suggestions we built and tested different versions of a prototype applying off-screen stroke input. Command prompts were given to twenty participants by text or arrow displays. The success rate achieved by the end of their first encounter with the system was 46% on average; it took them 1.28 seconds to connect with the system and 1.29 seconds to make a correct selection. Their subjective evaluations were positive with regard to the speed of the interaction. We conclude that gaze gesture input seems feasible for fast and brief remote control of smart home technology provided that robustness of tracking is improved.
During eye-tracking studies, recorded fixations can be shifted slightly from their actual locations. This shift can have various causes, such as calibration inaccuracy or drift. Researchers usually correct fixations manually. Manual corrections are error prone, especially if done on large samples for extended periods. There is also no guarantee that two corrections done by different people on the same data set will be consistent with each other. To address this problem, we introduce an approach to automatically correcting fixations that uses a variable offset for groups of fixations. Our focus is on source code, which is read differently than natural language and therefore requires an algorithm that adapts to these differences. We introduce a Hill Climbing algorithm that shifts fixations to a best-fit location based on a scoring function. To evaluate the algorithm's effectiveness, we compare the automatically corrected fixations against a set of manually corrected ones, giving us an accuracy of 89%. These findings are discussed along with additional ways to improve the algorithm.
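The idea of shifting a group of fixations to a best-fit location can be sketched with a generic hill climber. The abstract does not specify the scoring function, so the one below (negative summed distance to the nearest token center) is a stand-in assumption, and the names `score`, `hill_climb_offset`, and `token_centers` are hypothetical.

```python
import math

def score(fixations, offset, token_centers):
    """Higher is better: negative summed distance to the nearest token center.

    This scoring function is an illustrative assumption, not the authors'.
    """
    dx, dy = offset
    return -sum(
        min(math.hypot(fx + dx - tx, fy + dy - ty) for tx, ty in token_centers)
        for fx, fy in fixations
    )

def hill_climb_offset(fixations, token_centers, step=1, max_iters=1000):
    """Find one (dx, dy) offset for a group of fixations by greedy local search."""
    current = (0, 0)
    best = score(fixations, current, token_centers)
    for _ in range(max_iters):
        neighbours = [
            (current[0] + sx, current[1] + sy)
            for sx, sy in ((step, 0), (-step, 0), (0, step), (0, -step))
        ]
        candidate = max(neighbours, key=lambda o: score(fixations, o, token_centers))
        candidate_score = score(fixations, candidate, token_centers)
        if candidate_score <= best:
            break  # no neighbour improves the score: local optimum reached
        current, best = candidate, candidate_score
    return current, best

# Hypothetical data: four token centers and fixations displaced by (-3, +4) pixels.
token_centers = [(10, 10), (30, 10), (10, 30), (30, 30)]
fixations = [(7, 14), (27, 14), (7, 34), (27, 34)]
offset, final_score = hill_climb_offset(fixations, token_centers)
```

Like any hill climber, this sketch can stop at a local optimum, so larger displacements or dense layouts would need restarts or a coarser initial step.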
We detail the design and evaluation of a rotary interface for gaze-based PIN code entry. The interface design promotes equal distance between PIN numerals, leading to a circular layout and the choice of a rotary telephone dial metaphor. The rotary's speed advantage over the traditional grid-based (e.g., keypad) design is derived from its elimination of dwell time for gaze-based numeral selection, relying instead on a weighted voting scheme of numerals whose boundaries are crossed by the streaming (smoothed) gaze points. Screen center-bias is exploited by requiring users to transition to and from the center to the rotary numerals to enter the PIN code sequence. Compared with the keypad layout, empirical results show that PIN digit entry errors do not differ significantly between interfaces, although the rotary incurs fewer errors overall. Expressing preference for the rotary, users appeared to quickly grasp its operation.
When modeling natural conversational behavior of an agent, head direction becomes an intuitive proxy for visual attention. We examine this assumption and carefully investigate the relationship between head directions and gaze dynamics through the use of eye-movement tracking. In a group conversation setting, we analyze relationships of the two nonverbal social signals - head directions and gaze dynamics - linked to influential and non-influential statements. We develop a clustering method to estimate the number of gaze targets. We employ this method to show that head and gaze dynamic behaviors are not correlated, and thus head direction cannot be used as a direct proxy for a person's gaze in the context of conversations. We also describe in detail how influential statements affect head and gaze behaviors. The findings have implications for the methodology, modeling and design of natural conversational agents and present supporting evidence for employing gaze tracking in future conversational technologies.
The focus of this study is on wayfinding in large complex buildings with different wayfinding devices. The interaction of pedestrians with such devices always also involves interplay with the surrounding environment and its specific features. Furthermore, different wayfinding aids can elicit different needs for additional information from the environment to make accurate choices at decision points. We aim to shed light on how characteristics of decision points in combination with different wayfinding devices shape wayfinders' visual attention. 60 participants individually looked for three destinations in the same order. They navigated with 1) a printed map, 2) a digital map, or 3) without a map, only using full-coverage numeric signage. To gain first insights, fixation frequencies on maps and signage as well as on the correct and incorrect route options were recorded with a mobile eye tracker and analyzed for 28 decision points and four decision point categories. The results indicated that starting points play a special role in planning the route ahead. Furthermore, points that allow for a floor change lead to higher attention and more information search.
Heat maps, or more generally, attention maps or saliency maps are an often used technique to visualize eye-tracking data. With heat maps qualitative information about visual processing can be easily visualized and communicated between experts and laymen. They are thus a versatile tool for many disciplines, in particular for usability engineering, and are often used to get a first overview about recorded eye-tracking data.
Today, heat maps are typically generated for 2D stimuli that have been presented on a computer display. In such cases the mapping of overt visual attention onto the stimulus is rather straightforward and the process is well understood. However, when turning towards mobile eye tracking and eye tracking in 3D virtual environments, the case is much more complicated.
In the first part of the paper, we discuss several challenges that have to be considered in 3D environments, such as changing perspectives, multiple viewers, object occlusions, depth of fixations, or dynamically moving objects. In the second part, we present an approach for the generation of 3D heat maps addressing the above mentioned issues while working in real-time. Our visualizations provide high-quality output for multi-perspective eye-tracking recordings of visual attention in 3D environments.
The number of users required for usability studies has been a controversial issue for over 30 years. Some researchers suggest a certain number of users to be included in these studies. However, they do not focus on eye tracking studies for analysing eye movement sequences of users (i.e., scanpaths) on web pages. We investigate the effects of the number of users on scanpath analysis with our algorithm that was designed for identifying the most commonly followed path by multiple users. Our experimental results suggest that it is possible to approximate the same results with a smaller number of users. The results also suggest that more users are required when they serendipitously browse web pages than when they search for specific information or items. We observed that we could achieve 75% similarity to the results of 65 users with 27 users for searching tasks and 34 users for browsing tasks. This study guides researchers in determining the ideal number of users for analysing scanpaths on web pages based on their budget and time.
In eye tracking studies a complex visual stimulus requires the definition of many areas of interest (AOIs). Often these AOIs have an inherent, nested hierarchical structure that can be utilized to facilitate analysis tasks. We discuss how this hierarchical AOI structure in combination with appropriate visualization techniques can be applied to analyze fixation sequences on differently aggregated levels. An AOI View, AOI Tree, AOI Matrix, and AOI Graph enable a bottom-up and top-down evaluation of fixation sequences. We conducted an expert review and compared our techniques to current state-of-the-art visualization techniques in eye movement research to further improve and extend our approach. To show how our approach is used in practice, we evaluate fixation sequences collected during a study where 101 AOIs are organized hierarchically.
Visual inspection of medical imagery such as MRI and CT scans is a major task for medical professionals who must diagnose and treat patients without error. Given this goal, visualizing search behavior patterns used to recognize abnormalities in these images is of interest. In this paper we describe the development of a system which automatically generates multiple image-dependent heat maps from eye gaze data of users viewing medical image slices. This system only requires the use of a non-wearable eye gaze tracker and video capturing system. The main automated features are the identification of a medical image slice located inside a video frame and calculation of the correspondence between display screen and raw image eye gaze locations. We propose that the system can be used for eye gaze analysis and diagnostic training in the medical field.
Fast and robust pupil detection is an essential prerequisite for video-based eye-tracking in real-world settings. Several algorithms for image-based pupil detection have been proposed in the past; their applicability, however, is mostly limited to laboratory conditions. In real-world scenarios, automated pupil detection has to face various challenges, such as illumination changes, reflections (on glasses), make-up, non-centered eye recording, and physiological eye characteristics. We propose ElSe, a novel algorithm based on ellipse evaluation of a filtered edge image. We aim at a robust, inexpensive approach that can be integrated in embedded architectures, e.g., for driving. The proposed algorithm was evaluated against four state-of-the-art methods on over 93,000 hand-labeled images, of which 55,000 are new eye images contributed by this work. On average, the proposed method achieved a 14.53% improvement in detection rate relative to the best state-of-the-art performer. Algorithm and data sets are available for download: ftp://email@example.com (password:eyedata).
Learning-based methods for appearance-based gaze estimation achieve state-of-the-art performance in challenging real-world settings but require large amounts of labelled training data. Learning-by-synthesis was proposed as a promising solution to this problem but current methods are limited with respect to speed, appearance variability, and the head pose and gaze angle distribution they can synthesize. We present UnityEyes, a novel method to rapidly synthesize large amounts of variable eye region images as training data. Our method combines a novel generative 3D model of the human eye region with a real-time rendering framework. The model is based on high-resolution 3D face scans and uses real-time approximations for complex eyeball materials and structures as well as anatomically inspired procedural geometry methods for eyelid animation. We show that these synthesized images can be used to estimate gaze in difficult in-the-wild scenarios, even for extreme gaze angles or in cases in which the pupil is fully occluded. We also demonstrate competitive gaze estimation results on a benchmark in-the-wild dataset, despite only using a light-weight nearest-neighbor algorithm. We are making our UnityEyes synthesis framework available online for the benefit of the research community.
We present labelled pupils in the wild (LPW), a novel dataset of 66 high-quality, high-speed eye region videos for the development and evaluation of pupil detection algorithms. The videos in our dataset were recorded from 22 participants in everyday locations at about 95 FPS using a state-of-the-art dark-pupil head-mounted eye tracker. They cover people of different ethnicities and a diverse set of everyday indoor and outdoor illumination environments, as well as natural gaze direction distributions. The dataset also includes participants wearing glasses, contact lenses, and make-up. We benchmark five state-of-the-art pupil detection algorithms on our dataset with respect to robustness and accuracy. We further study the influence of image resolution and vision aids as well as recording location (indoor, outdoor) on pupil detection performance. Our evaluations provide valuable insights into the general pupil detection problem and allow us to identify key challenges for robust pupil detection on head-mounted eye trackers.
While for the evaluation of robustness of eye tracking algorithms the use of real-world data is essential, there are many applications where simulated, synthetic eye images are of advantage. They can generate labelled ground-truth data for appearance based gaze estimation algorithms or enable the development of model based gaze estimation techniques by showing the influence on gaze estimation error of different model factors that can then be simplified or extended. We extend the generation of synthetic eye images by a simulation of refraction and reflection for eyeglasses. On the one hand this allows for the testing of pupil and glint detection algorithms under different illumination and reflection conditions, on the other hand the error of gaze estimation routines can be estimated in conjunction with different eyeglasses. We show how a polynomial function fitting calibration performs equally well with and without eyeglasses, and how a geometrical eye model behaves when exposed to glasses.
A convolution-filtering technique is introduced for the synthesis of eye gaze data. Its purpose is to produce, in a controlled manner, a synthetic stream of raw gaze position coordinates, suitable for: (1) testing event detection filters, and (2) rendering synthetic eye movement animations for testing eye tracking gaze estimation algorithms. Synthetic gaze data is parameterized by sampling rate, microsaccadic jitter, and simulated measurement error. Sampled synthetic gaze data is compared against real data captured by an eye tracker showing similar signal characteristics.
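A rough sketch of how such a synthetic stream might be produced is given below for one gaze coordinate. It assumes a piecewise-constant target signal smoothed by a normalized triangular kernel, which is a stand-in and not necessarily the filter used by the authors. The function `synth_gaze`, its parameters, and the choice to add jitter before filtering and measurement error after it are all illustrative assumptions.

```python
import random

def synth_gaze(targets, sampling_rate_hz, jitter_sd=0.0, noise_sd=0.0,
               kernel_ms=40.0, seed=0):
    """Generate a 1D synthetic gaze trace in degrees.

    targets: list of (duration_s, position_deg) fixation targets.
    jitter_sd: standard deviation of simulated microsaccadic jitter (pre-filter).
    noise_sd: standard deviation of simulated measurement error (post-filter).
    """
    rng = random.Random(seed)
    # 1. Piecewise-constant target signal with jitter, sampled at the given rate.
    signal = []
    for duration_s, pos in targets:
        n = round(duration_s * sampling_rate_hz)
        signal.extend(pos + rng.gauss(0, jitter_sd) for _ in range(n))
    # 2. Normalized triangular smoothing kernel (an assumed stand-in filter).
    half = max(1, round(kernel_ms / 1000.0 * sampling_rate_hz / 2))
    kernel = [half + 1 - abs(i) for i in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    # 3. Convolve, clamping indices at the edges so the ends stay at target level.
    last = len(signal) - 1
    smoothed = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), last)
            acc += w * signal[idx]
        smoothed.append(acc)
    # 4. Add simulated measurement error.
    return [s + rng.gauss(0, noise_sd) for s in smoothed]

# Noise-free example: 100 ms at 0 deg, then 100 ms at 10 deg, sampled at 1000 Hz.
trace = synth_gaze([(0.1, 0.0), (0.1, 10.0)], sampling_rate_hz=1000)
```

With both standard deviations set to zero, the output is a smooth transition between the two targets. Non-zero values make the trace usable as test input for event detection filters.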
In viewing an image or real-world scene, different observers may exhibit different viewing patterns. This is evidently due to a variety of different factors, involving both bottom-up and top-down processing. In the literature addressing prediction of visual saliency, agreement in gaze patterns across observers is often quantified according to a measure of inter-observer congruency (IOC). Intuitively, common viewership patterns may be expected to diagnose certain image qualities including the capacity for an image to draw attention, or perceptual qualities of an image relevant to applications in human computer interaction, visual design and other domains. Moreover, there is value in determining the extent to which different factors contribute to inter-observer variability, and corresponding dependence on the type of content being viewed. In this paper, we assess the extent to which different types of features contribute to variability in viewing patterns across observers. This is accomplished in considering correlation between image derived features and IOC values, and based on the capacity for more complex feature sets to predict IOC based on a regression model. Experimental results demonstrate the value of different feature types for predicting IOC. These results also establish the relative importance of top-down and bottom-up information in driving gaze and provide new insight into predictive analysis for gaze behavior associated with perceptual characteristics of images.
Smooth pursuit eye movements provide meaningful insights and information on a subject's behavior and health and may, in particular situations, disturb the performance of typical fixation/saccade classification algorithms. Thus, an automatic and efficient algorithm to identify these eye movements is paramount for eye-tracking research involving dynamic stimuli. In this paper, we propose the Bayesian Decision Theory Identification (I-BDT) algorithm, a novel algorithm for ternary classification of eye movements that is able to reliably separate fixations, saccades, and smooth pursuits in an online fashion, even for low-resolution eye trackers. The proposed algorithm is evaluated on four datasets with distinct mixtures of eye movements, including fixations, saccades, as well as straight and circular smooth pursuits; data was collected with a sample rate of 30 Hz from six subjects, totaling 24 evaluation datasets. The algorithm exhibits high and consistent performance across all datasets and movements relative to a manual annotation by a domain expert (recall: μ = 91.42%, σ = 9.52%; precision: μ = 95.60%, σ = 5.29%; specificity: μ = 95.41%, σ = 7.02%) and displays a significant improvement when compared to I-VDT, a state-of-the-art algorithm (recall: μ = 87.67%, σ = 14.73%; precision: μ = 89.57%, σ = 8.05%; specificity: μ = 92.10%, σ = 11.21%). Algorithm implementation and annotated datasets are openly available at www.ti.uni-tuebingen.de/perception
3D image segmentation is a fundamental process in many scientific and medical applications. Automatic algorithms do exist, but there are many use cases where these algorithms fail. The gold standard is still manual segmentation or review. Unfortunately, even for an expert this is laborious, time consuming, and prone to errors. Existing 3D segmentation tools do not currently take into account human mental models and low-level perception tasks. Our goal is to improve the quality and efficiency of manual segmentation and review by analyzing how experts perform segmentation. As a preliminary step we conducted a field study with 8 segmentation experts, recording video and eye tracking data. We developed a novel coding scheme to analyze this data and verified that it successfully covers and quantifies the low-level actions, tasks and behaviors of experts during 3D image segmentation.
In this short paper, we present a lightweight application for the interactive annotation of eye tracking data for both static and dynamic stimuli. The main functionality is the annotation of fixations that takes into account the scanpath and stimulus. Our visual interface allows the annotator to work through a sequence of fixations, while it shows the context of the scanpath in the form of previous and subsequent fixations. The context of the stimulus is included as visual overlay. Our application supports the automatic initial labeling according to areas of interest (AOIs), but is not dependent on AOIs. The software is easily configurable, supports user-defined annotation schemes, and fits in existing workflows of eye tracking experiments and the evaluation thereof by providing import and export functionalities for data files.
With the launch of ultra-portable systems, mobile eye tracking finally has the potential to become mainstream. While eye movements on their own can already be used to identify human activities, such as reading or walking, linking eye movements to objects in the environment provides even deeper insights into human cognitive processing.
We present a model-based approach for the identification of fixated objects in three-dimensional environments. For evaluation, we compare the automatic labelling of fixations with those performed by human annotators. In addition to that, we show how the approach can be extended to support moving targets, such as individual limbs or faces of human interaction partners. The approach also scales to studies using multiple mobile eye-tracking systems in parallel.
The developed system supports real-time attentive systems that make use of eye tracking as means for indirect or direct human-computer interaction as well as off-line analysis for basic research purposes and usability studies.
3D gaze information is important for scene-centric attention analysis, but accurate estimation and analysis of 3D gaze in real-world environments remains challenging. We present a novel 3D gaze estimation method for monocular head-mounted eye trackers. In contrast to previous work, our method does not aim to infer 3D eyeball poses, but directly maps 2D pupil positions to 3D gaze directions in scene camera coordinate space. We first provide a detailed discussion of the 3D gaze estimation task and summarize different methods, including our own. We then evaluate the performance of different 3D gaze estimation approaches using both simulated and real data. Through experimental validation, we demonstrate the effectiveness of our method in reducing parallax error, and we identify research challenges for the design of 3D calibration procedures.
We present a novel pipeline for localizing a free roaming eye tracker within a LiDAR-based 3D reconstructed scene with high levels of accuracy. By utilizing a combination of reconstruction algorithms that leverage the strengths of global versus local capture methods and user-assisted refinement, we reduce drift errors associated with Dense-SLAM techniques. Our framework supports region-of-interest (ROI) annotation and gaze statistics generation and the ability to visualize gaze in 3D from an immersive first person or third person perspective. This approach gives unique insights into viewers' problem solving and search task strategies and has high applicability in complex static environments such as crime scenes.
We are developing a cooking support system that coaches beginners. In this work, we focus on eye movement patterns while cooking meals because gaze dynamics include important information for understanding human behavior. The system first needs to classify typical cooking operations. In this paper, we propose a gaze-based classification method and evaluate whether eye movement patterns have the potential to classify cooking operations. We improve the conventional N-gram model of eye movement patterns, which was designed for the recognition of office work. Conventionally, only relative movement from the previous frame was used as a feature. However, since users pay attention to cooking ingredients and equipment while cooking, we consider fixation as a component of the N-gram. We also consider eye blinks, which are related to the cognitive state. In contrast to the conventional method, instead of focusing on statistical features, we consider the ordinal relations of fixation, blink, and relative movement. The proposed method estimates the likelihood of the cooking operations by Support Vector Regression (SVR) using frequency histograms of N-grams as explanatory variables.
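The feature extraction step described above, frequency histograms of N-grams over fixations, blinks, and relative movements, can be sketched as follows. The symbol encoding (F, B, L/R/U/D) and the function name `ngram_histogram` are hypothetical, and the subsequent SVR step is omitted.

```python
from collections import Counter

def ngram_histogram(symbols, n):
    """Return relative frequencies of all length-n windows in a symbol sequence."""
    grams = [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}  # sequence shorter than n: no N-grams to count
    return {gram: count / total for gram, count in counts.items()}

# Hypothetical encoding: F = fixation, B = blink, L/R/U/D = relative movement.
sequence = ["F", "F", "R", "F", "B", "F", "F", "R"]
hist = ngram_histogram(sequence, 2)
```

Each histogram would then serve as one explanatory-variable vector for a regression model, after mapping the N-grams to a fixed feature order.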
Existing literature reveals that during reading, gaze fixations within a word are not necessarily close to the optimal reading position. The optimal position for reading represents the gaze fixation near the center of each word for which recognition time is minimal. In our study, we examined the match between optimal position and initial fixation position, which has not been studied before. We did this in conventional reading and in two text-viewing tasks. For the text-viewing tasks, we employed multiple words that were non-isolated in the presentation but could be examined individually in rhythm and linguistic context. We discovered that for the text-viewing tasks, the initial fixation position tends to be close to the optimal position, which was in contrast to the conventional reading task. This finding will help us understand the relationship between optimal position and reading context and inform the development of new reading applications.
Mobile gaze-based interaction has been emerging over the last two decades. Head-mounted eye trackers as well as remote systems are used to determine people's gaze (e.g., on a display). However, most state-of-the-art systems need calibration prior to usage. When using a head-mounted eye tracker, many factors (e.g., changes of eye physiology) can influence the stability of the calibration leading to less accuracy over time. Re-calibrating the system at certain time intervals is cumbersome and time-consuming. We investigate methods to minimize the time needed and optimize the process. In a user study with 16 participants, we compared partial re-calibrations with different numbers of calibration points and types of adaptation strategies. In contrast to a full calibration with nine points, the results show that a re-calibration with only three points results in 60% less time needed and achieves a similar accuracy.
Characterizing noise in eye movement data is important for data analysis, as well as for the comparison of research results across systems. We present a method that characterizes and reconstructs the noise in eye movement data from video-oculography (VOG) systems, taking into account the uneven sampling in real recordings due to track loss and inherent system features. The proposed method extends the Lomb-Scargle periodogram, which is used for the estimation of the power spectral density (PSD) of unevenly sampled data [Hocke and Kämpfer 2009]. We estimate the PSD of fixational eye movement data and reconstruct the noise by applying a random phase to the inverse Fourier transform so that the reconstructed signal retains the amplitude of the original noise at each frequency. We apply this method to the EMRA/COGAIN Eye Data Quality Standardization project's dataset, which includes recordings from 11 commercially available VOG systems and a Dual Purkinje Image (DPI) eye tracker. The reconstructed noise from each VOG system was superimposed onto the DPI data and the resulting eye movement measures from the same original behaviors were compared.
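The random-phase reconstruction step can be sketched as follows. This is a simplified illustration and not the authors' method: it assumes evenly sampled data and uses a plain discrete Fourier transform in place of the extended Lomb-Scargle periodogram. The function names are hypothetical.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def random_phase_surrogate(signal, rng):
    """Keep the amplitude at each frequency, but draw a new random phase."""
    n = len(signal)
    spectrum = dft(signal)
    surrogate = list(spectrum)  # DC (and Nyquist, if n is even) stay unchanged
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        surrogate[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        # Conjugate symmetry keeps the reconstructed signal real-valued.
        surrogate[n - k] = surrogate[k].conjugate()
    return [value.real for value in idft(surrogate)]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(16)]
reconstructed = random_phase_surrogate(noise, rng)
```

The reconstructed samples differ from the original ones in the time domain, while the amplitude spectrum is preserved.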
To address the need for portable systems to collect high-quality eye movement data for field studies, this paper shows how one might design, test, and validate the spatiotemporal fidelity of a homebrewed eye-tracking system. To assess spatial and temporal precision, we describe three validation tests that quantify the spatial resolution and temporal synchronization of data acquisition. First, because measurement of pursuit eye movements requires a visual motion display, we measured the timing of luminance transitions of several candidate LCD monitors so as to ensure sufficient stimulus fidelity. Second, we measured eye position as human observers (n=20) ran a nine-point calibration in a clinical-grade chin rest, delivering eye-position noise of 0.22 deg (range: 0.09-0.29 deg) and accuracy of 0.97 deg (range: 0.54-1.89 deg). Third, we measured the overall processing delay in the system to be 5.6 ms, accounted for by the response dynamics of our monitor and the duration of one camera frame. The validation methods presented can be used 1) to ensure that eye-position accuracy and precision are sufficient to support scientific and clinical studies and are not limited by the hardware or software, and 2) to verify that the eye tracker, display, and experiment-control software are effectively synchronized.
Cross-ratio (CR)-based eye tracking has been attracting much interest due to its simple setup, yet its accuracy is lower than that of the model-based approaches. In order to improve the estimation accuracy, a multi-camera setup can be exploited rather than the traditional single camera systems. The overall gaze point can be computed by fusion of available gaze information from all cameras. This paper presents a real-time multi-camera eye tracking system in which the estimation of gaze relies on simple CR geometry. A novel weighted fusion method is proposed, which leverages the user calibration data to learn the fusion weights. Experimental results conducted on real data show that the proposed method achieves a significant accuracy improvement over single camera systems. The real-time system achieves 0.82° of visual angle accuracy error with very few calibration data (5 points) under natural head movements, which is competitive with more complex model-based systems.
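The idea of learning fusion weights from calibration data can be sketched as follows. The abstract does not specify how the weights are learned, so this example substitutes a simple stand-in: each camera is weighted inversely to its mean error on the calibration targets. The function names and the data are hypothetical.

```python
import math

def calibration_weights(estimates_per_camera, targets):
    """One weight per camera, inversely proportional to its mean calibration error.

    Assumption: inverse-error weighting stands in for the learned weights.
    """
    inverse_errors = []
    for estimates in estimates_per_camera:
        mean_error = sum(math.dist(e, t) for e, t in zip(estimates, targets)) / len(targets)
        inverse_errors.append(1.0 / (mean_error + 1e-9))  # guard against zero error
    total = sum(inverse_errors)
    return [w / total for w in inverse_errors]

def fuse(points, weights):
    """Weighted average of the per-camera gaze points."""
    x = sum(w * p[0] for p, w in zip(points, weights))
    y = sum(w * p[1] for p, w in zip(points, weights))
    return (x, y)

# Hypothetical calibration data: camera A is off by 1 unit, camera B by 3 units.
targets = [(0.0, 0.0), (10.0, 0.0)]
camera_a = [(1.0, 0.0), (11.0, 0.0)]
camera_b = [(3.0, 0.0), (13.0, 0.0)]
weights = calibration_weights([camera_a, camera_b], targets)
fused = fuse([(4.0, 0.0), (8.0, 0.0)], weights)
```

With these numbers the more accurate camera receives a weight of about 0.75, so the fused point lies closer to its estimate.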
Interpolation-based methods are widely used for gaze estimation due to their simplicity. In particular, feature-based methods that map image eye features to gaze are very popular. The most widely used regression function in this kind of method is polynomial regression. In this paper, we present an alternative regression function to estimate gaze: Gaussian regression. We show how Gaussian processes can better adapt to the non-linear behavior of the eye movement, providing higher gaze estimation accuracy. Gaussian regression is compared, in a simulated environment, to polynomial regression using the same mapping features: the normalized pupil center-corneal reflection and pupil center-eye corners vectors. This comparison is done for three different screen sizes. The results show that for larger screens, where wider gaze angles are required and the non-linear behavior of the eye is therefore more pronounced, the advantage of Gaussian regression is more evident. Furthermore, we can conclude that, for both types of regression, the gaze estimation accuracy increases for smaller screens, where the eye movements are more linear.
Recent developments in eye tracking technology are paving the way for gaze-driven interaction as the primary interaction modality. Despite successful efforts, existing solutions to the "Midas Touch" problem have two inherent issues that are yet to be addressed: 1) lower accuracy, and 2) visual fatigue. In this work we present GAWSCHI: a Gaze-Augmented, Wearable-Supplemented Computer-Human Interaction framework that enables accurate and quick gaze-driven interactions, while being completely immersive and hands-free. GAWSCHI uses an eye tracker and a wearable device (quasi-mouse) that is operated with the user's foot, specifically the big toe. The system was evaluated with a comparative user study involving 30 participants, with each participant performing eleven predefined interaction tasks (on MS Windows 10) using both mouse and gaze-driven interactions. We found that gaze-driven interaction using GAWSCHI is as good (in time and precision) as mouse-based interaction as long as the dimensions of the interface element are above a threshold (0.60" x 0.51"). In addition, an analysis of the NASA Task Load Index post-study survey showed that the participants experienced low mental, physical, and temporal demand and achieved high performance. We foresee GAWSCHI as the primary interaction modality for the physically challenged and a means of enriched interaction for the able-bodied.
Most computer systems require user authentication, which has led to an increase in the number of passwords one has to remember. In this paper we explore if spatial visual cues can be used to improve password recollection. Specifically, we consider if associating each character in a password to user-defined spatial regions in an image facilitates better recollection. We conduct a user study where participants were asked to recall randomly generated numeric passwords under the following conditions: no image association (No-Image), image association (Image-Only), image association combined with overt visual cues (Overt-Guidance), and image association combined with subtle visual cues (Subtle-Guidance). We measured the accuracy of password recollection and response time as well as average dwell-time at target locations for the gaze guided conditions. Subjects performed significantly better on password recollection when they were actively guided to regions in the associated image using overt visual cues. Accuracy of password recollection using subtle cues was also higher than the No-Image and Image-Only conditions, but the effect was not significant. No significant difference was observed in the average dwell-times between the overt and subtle guidance approaches.
Real-time moving target acquisition in full motion video is a challenging task. Mouse input might fail if targets move fast, unpredictably, or are only visible for a short period of time. In this paper, we describe an experiment with expert video analysts (N=26) who performed moving target acquisition by selecting targets in a full motion video sequence presented on a desktop computer. The results show that using gaze input (gaze pointing + manual key press), the participants were able to perform with significantly shorter completion times than with mouse input. Error rates (represented by target misses) and acquisition precision were similar. Subjective ratings of user satisfaction resulted in similar or even better scores for the gaze interaction.
The paper presents an algorithm supporting implicit calibration of eye movement recordings. The algorithm does not require any explicit cooperation from users; it uses only information about the stimulus and the uncalibrated eye tracker output. On the basis of this data, probable fixation locations are calculated first. This fixation set is used as input to a genetic algorithm whose task is to choose the most probable targets. Both pieces of information can serve to calibrate an eye tracker. The main advantage of the algorithm is that it is general enough to be used for almost any stimulus. This was confirmed by results obtained for a very dynamic stimulus, a shooting game. Using the calibration function built by the algorithm, it was possible to predict where a user would click with a mouse. The accuracy of the prediction was about 75%.
We propose an implicit calibration method for estimating the offset between the optical and visual axes without active participation of the user. The method relies on a fully calibrated setup and uses the relation between the optical axes of both eyes, the position of the center of the cornea, and the position of the display. The method is based on the assumption that the visual axes of both eyes coincide on the display. The implicit calibration method estimates the offsets by continuously estimating the angle kappa through kernel density estimation. In numerical simulation, the accuracy of our method is comparable to or better than that of existing methods for implicit calibration.
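Estimating a stable value from noisy per-sample estimates through kernel density estimation can be sketched as follows. This is a one-dimensional illustration under stated assumptions: the per-sample kappa estimates, the fixed Gaussian bandwidth, and the function names are all hypothetical, and the geometry that produces the per-sample estimates is not shown.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels centred on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def kde_mode(samples, bandwidth, steps=1000):
    """Approximate the density peak by evaluating the KDE on a regular grid."""
    lo, hi = min(samples), max(samples)
    density = gaussian_kde(samples, bandwidth)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=density)

# Hypothetical horizontal kappa estimates in degrees, including one outlier.
kappa_samples = [4.8, 5.0, 5.1, 5.2, 4.9, 5.0, 12.0]
kappa_estimate = kde_mode(kappa_samples, bandwidth=0.3)
```

Taking the density peak rather than the mean makes the estimate insensitive to isolated outliers: the mean of the samples above is 6.0, while the peak stays near 5.0.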
This paper presents the results of a longitudinal study on multimodal text entry where objects were selected by gazing and smiling. Gaze was used to point at the desired characters and smiling movements were performed to select them. Participants (N=12) entered text for a total of 2.5 hours in ten 15-minute sessions over a one-month period. The results showed that the text entry rate improved with practice from 4.1 to 6.7 words per minute. However, the learning curve had not reached its plateau phase at the end of the experiment. Subjective ratings showed that the participants appreciated this multimodal technique.
This work explores gaze-based interaction for moving target acquisition. In a pilot study, three interaction techniques are compared: gaze and manual button press (gaze + hand), gaze and foot button press (gaze + foot), and traditional mouse input. In a controlled scenario using a circle acquisition paradigm, participants perform moving target acquisition for targets differing in speed, direction of motion and motion pattern. The results show similar hit rates for the three techniques. Target acquisition completion time is significantly faster for the gaze-based techniques compared to mouse input.
The human gait cycle is incredibly efficient and stable largely because of the use of advance visual information to make intelligent selections of heading direction, foot placement, gait dynamics, and posture when faced with terrain complexity [Patla and Vickers 1997; Patla and Vickers 2003; Matthis and Fajen 2013; Matthis and Hayhoe 2015]. This is behaviorally demonstrated by a coupling between saccades and foot placement.
A possible way to make video chat more efficient is to only send video frames that are likely to be looked at by the remote participant. Gaze in dialog is intimately tied to dialog states and behaviors, so prediction of such times should be possible. To investigate, we collected data on both participants in 6 video-chat sessions, totalling 65 minutes, and created a model to predict whether a participant will be looking at the screen 300 milliseconds in the future, based on prosodic and gaze information available at the other side. A simple predictor had a precision of 42% at the equal error rate. While this is probably not good enough to be useful, improved performance should be readily achievable.
In a typical Multiple Object Tracking (MOT) paradigm, the participant's task is to track targets amongst distractors for several seconds. Understanding gaze strategies in MOT can help us reveal attentional mechanisms in dynamic tasks. Previous attempts relied on analytical strategies (such as averaging object positions). An alternative approach is to find this relationship using machine learning techniques. After preprocessing, we assembled a dataset with 48,000 datapoints, representing 1534 MOT trials or 2.5 hours. In this study, we used feedforward neural networks to predict gaze position and compared predicted gaze with analytical strategies from previous studies using median distance. Our results showed that neural networks were able to predict eye positions better than current strategies. In particular, they performed better when we trained the network with all objects, not targets only. This supports the hypothesis that people are influenced by distractor positions during tracking.
Gaze estimation error is inherent in head-mounted eye trackers and seriously impacts performance, usability, and user experience of gaze-based interfaces. Particularly in mobile settings, this error varies constantly as users move in front of a display and look at different parts of it. We envision a new class of gaze-based interfaces that are aware of the gaze estimation error and adapt to it in real time. As a first step towards this vision, we introduce an error model that is able to predict the gaze estimation error. Our method covers major building blocks of mobile gaze estimation, specifically mapping of pupil positions to scene camera coordinates, marker-based display detection, and mapping of gaze from scene camera to on-screen coordinates. We develop our model through a series of principled measurements of a state-of-the-art head-mounted eye tracker.
Face-space has become established as an effective model for representing the dimensions of variation that occur in collections of human faces. For example, a change of expression from neutral to smiling can be represented by one axis in a face-space. Principal components can be used to determine the axes of a face-space. However, standard principal components are based entirely on the data set from which they are computed, and do not express any domain-specific information about the application of interest. In this paper, we propose a face-space analysis that combines the variance criterion used in principal components with some prior knowledge about the task-driven experiment. The priors are based on measuring eye movements of participants to frontal 2D faces during separate gender and facial expression categorization tasks. Our findings show that saccades to faces are task-driven, especially from 500 to 1000 milliseconds, and automatic recognition performance does not improve with additional exposure time.
We investigate the gaze estimation error induced by pupil size changes using simulated data. We investigate the influence of pupil diameter changes on estimated gaze point error obtained by two gaze estimation models. Simulation data show that at wider viewing angles and at small eye-camera distances, error increases with increasing pupil sizes. The maximum error recorded for refracted pupil images is 2.4° of visual angle and 1.5° for non-refracted pupil projections.
In a study with 12 participants we compared two smooth pursuit based widgets and one dwell time based widget in adjusting a continuous value. The circular smooth pursuit widget was found to be about equally efficient as the dwell based widget in our color matching task. The scroll bar shaped smooth pursuit widget exhibited lower performance and lower user ratings.
To determine the relationship between brain activity and eye movements when activated by images of facial expressions, electroencephalograms (EEGs) and eye movements based on electrooculograms (EOGs) were measured and analyzed. Typical facial expressions from a photo database were grouped into two clusters by subjective evaluation and designated as either "Pleasant" or "Unpleasant" facial images. Regarding chronological analysis, the correlation coefficients of frequency powers between EEGs at a central area and eye movements monotonically increased throughout the time course when "Unpleasant" images were presented. Both the definite relationships and these dependencies on images of facial expressions were confirmed.
In this paper we investigate the utility of an eye-based interaction technique (EyeGrip) for seamless interaction with scrolling contents on eyewear computers. EyeGrip uses Optokinetic Nystagmus (OKN) eye movements to detect an object of interest among a set of scrolling contents and automatically stops scrolling for the user. We empirically evaluated the usability of EyeGrip in two different applications for eyewear computers: 1) a menu scroll viewer and 2) a Facebook newsfeed reader. The results of our study showed that the EyeGrip technique performs as well as a keyboard, which has long been a well-known input device. Moreover, the accuracy of the EyeGrip method for menu item selection was higher, while in the Facebook study participants found the keyboard more accurate.
The eye gaze behavior of individuals changes depending on their knowledge and experience of the event occurring in their field of view. In past studies, researchers formulated a hypothesis concerning this dependency on a specific scene and then analyzed the gaze behavior of viewers observing the scene. We depart from this hypothesis-testing paradigm. In this paper, we propose a data-mining framework for extracting skilled gaze behaviors of experts while watching a video based on a comprehensive comparison of viewers in terms of the dependency of their gaze patterns on video scenes. To quantitatively analyze the changes in the gaze behavior of experts according to the events in the scene, video and eye movement sequences are classified into video scenes and gaze patterns, respectively, by using an unsupervised clustering method focusing on short-time dynamics. Then, we analyze the dependency based on the distinctiveness and occurrence frequency of gaze patterns for each video scene.
While many elaborate algorithms to classify eye movements into fixations and saccades exist, detection of smooth pursuit eye movements is still challenging. Smooth pursuits do not occur for the predominantly studied static stimuli; for dynamic stimuli, it is difficult to distinguish small gaze displacements due to noise from smooth pursuit. We propose to improve noise robustness by combining information from multiple recordings: if several people show similar gaze patterns that are neither fixations nor saccades, these episodes are likely smooth pursuits. We evaluated our approach against two baseline algorithms on a hand-labelled subset of the GazeCom data set of dynamic natural scenes, using three different clustering algorithms to determine gaze similarity. Results show that our approach achieves a very substantial increase in precision at improved recall over state-of-the-art algorithms that consider individual gaze traces only.
Far infrared thermography, which can be used to detect thermal radiation emitted by humans, has been used to detect physical disease and physiological changes relating to emotion, and for polygraph testing, but has not been used for eye tracking. However, because the surface temperature of the cornea is colder than the limbus, it is theoretically possible to track corneal movements through thermal imaging. To explore the feasibility of thermal eye tracking, we invited 10 adults and tracked their corneal movements with passive thermal imaging at 60 Hz. We combined shape models of eyes with intensity thresholding to segment the cornea from other parts of the eye in thermal images. We used an animation sequence as a calibration target for 5 point calibration/validation 5 times. Our results were compared to simultaneously collected data using an SR EyeLink eye tracker at 500 Hz, demonstrating the feasibility of eye tracking with thermal images. Blinking and breathing frequencies, which reflect the psychophysical status of the participants, were also robustly detected during thermal eye tracking.
With this demo we show a new application design for analyzing eye tracking experiments following the visual analytics approach. This application design allows users to analyze large eye tracking data sets efficiently and to find interesting patterns in eye movement data as well as correlations between eye movements and other data streams. We describe the main characteristics of the implemented visualizations and pattern recognition algorithms, present the interaction concept and demonstrate the main analysis features in a use case concerning the development of a driver assistance system.
The human eye offers a fascinating window into an individual's health, cognitive attention, and decision making, but we lack the ability to continually measure these parameters in the natural environment. We demonstrate CIDER, a system that operates in a highly optimized low-power mode under indoor settings by using a fast Search-Refine controller to track the eye, but detects when the environment switches to more challenging outdoor sunlight and switches models to operate robustly under this condition. Our design is holistic and tackles a) power consumption in digitizing pixels, estimating pupillary parameters, and illuminating the eye via near-infrared and b) error in estimating pupil center and pupil dilation. We demonstrate that CIDER can estimate pupil center with error less than two pixels (0.6°), and pupil diameter with error of one pixel (0.22mm). Our end-to-end results show that we can operate at power levels of roughly 7mW at a 4Hz eye tracking rate, or roughly 32mW at rates upwards of 250Hz.
This document describes a method for detecting the onset of eye fatigue and how it could be implemented in an existing live framework. The proposed method, which uses fixation data, does not rely as heavily on the sampling rate of the eye tracker as do methods which use saccade data, making it more suitable for lower cost eye trackers such as mobile and wearable devices. By being able to detect eye fatigue with such eye trackers, it becomes possible to react to the development of fatigue in virtually any environment, such as by alerting drivers that they appear fatigued and may want to pull over. It could also be used to aid in developing interfaces that are more user-friendly by noting at which point a user becomes fatigued while navigating the interface.
In previous studies, the angle kappa, the offset between the optical axis and visual axis, was considered to be calibrated when eye trackers based on a 3D model were used. However, we found that the angle kappa could be used as personal information, which is immeasurable from outside the human body. This paper proposes a concept for PIN entry by considering the characteristics of the angle kappa. Thus, we measured the distribution of the angle kappa and developed a prototype of the system. We demonstrated the effectiveness of the method.
FixFix is a web-based tool for editing reading gaze fixation datasets. The purpose is to provide gaze researchers focusing on reading an easy-to-use interface that will facilitate manual interpretation, but even more so to create gold standard datasets for machine learning and data mining. It allows the users to identify fixations, then move them either singly or in groups, in order to correct both variable and systematic gaze sampling errors.
Surgery is a team effort. In this video, we show how we recorded two surgeons' eye motions during a simulated surgical operation and then performed Cross Recurrence Analysis (CRA) on the dual eye-tracking data to develop a valid technology to assess shared cognition. Twenty-two dyad teams were recruited to perform an object transportation task using laparoscopic techniques. Outputs from CRA, including overlapping, recurrence rate, and phase delay, were correlated with team performance measured by the task time, errors made, and movement de-synchronization. Gaze behaviors between the two team members recorded in the surgical videos correlated positively with team performance. Elite teams showed more gaze overlap and a higher recurrence rate than poor teams. Dual eye-tracking analysis can be a useful tool for assessing team cognition and evaluating team training.
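The cross recurrence measures named above can be sketched as follows. The definitions used here are common simplifications and are assumptions with respect to this work: two gaze samples recur when they lie within a fixed radius of each other, the recurrence rate is the fraction of recurrent cells in the cross recurrence matrix, and the recurrence at a given lag is the fraction of recurrent cells on the corresponding diagonal. The function names and data are hypothetical.

```python
import math

def cross_recurrence(gaze_a, gaze_b, radius):
    """matrix[i][j] is True when A's sample i lies within radius of B's sample j."""
    return [[math.dist(a, b) <= radius for b in gaze_b] for a in gaze_a]

def recurrence_rate(matrix):
    """Fraction of all (i, j) pairs that recur."""
    cells = [cell for row in matrix for cell in row]
    return sum(cells) / len(cells)

def recurrence_at_lag(matrix, lag):
    """Fraction of recurrent pairs (i, i + lag); lag 0 corresponds to gaze overlap."""
    columns = len(matrix[0])
    pairs = [(i, i + lag) for i in range(len(matrix)) if 0 <= i + lag < columns]
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)

# Hypothetical data: surgeon B looks where surgeon A looked one sample earlier.
surgeon_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
surgeon_b = [(5.0, 5.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
matrix = cross_recurrence(surgeon_a, surgeon_b, radius=0.5)
rate = recurrence_rate(matrix)
```

Scanning `recurrence_at_lag` over a range of lags and taking the lag with the highest value gives one simple estimate of the delay between the two team members; in this example the peak is at a lag of one sample.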
Task analysis using eye-tracking has previously been used for estimating cognitive load on a per-task basis. However, since pupil size is a continuous physiological signal, eye-based classification accuracy of cognitive load can be improved by analysing cognitive load at a finer temporal resolution and incorporating models of the interactions between the task-evoked pupillary response (TEPR) and other pupillary responses such as the pupillary light reflex into the classification model.
The possibility of characterising within-task transient behaviour of eye-activity to accurately measure continuous cognitive load will be investigated in this research. Subsequently pupil light reflex models will be incorporated into task analysis to investigate means of enhancing the reliability of cognitive load estimation in varied lighting conditions. This will culminate in the development and evaluation of a classification system which measures rapidly changing cognitive load. Task analysis of this calibre will augment the functionality of interfaces in wearable optical devices, for example by enabling them to control information flow to prevent information overload and interruptions.
The aim of the intended dissertation study is to show the diagnostic potential of eye tracking for a spatial thinking test. To this end, a structural overview of different techniques for analyzing eye tracking data will be provided using several measures. For a newly developed test for the spatial cognitive ability visualization, the results of the analyzed eye tracking data will be linked to reaction time, accuracy, and associated cognitive processes. It is intended to explore which information can be obtained by pupillometry and the systematic combination of the dimensions of eye movement data (location and time). As indicators for cognitive processes and cognitive workload, the resulting gaze patterns and the computed Index of Cognitive Activity, ICA [Marshall 2002], will be connected to the participant's performance in a test of the spatial ability factor visualization. The results will contribute to the question of which eye behavioral measures are able to predict participants' abilities and provide insights into associated cognitive processes.
Humans use eye gaze in their daily interaction with other humans. Humanoid robots, on the other hand, have not yet taken full advantage of this form of implicit communication. We designed a passive monocular gaze tracking system implemented on the iCub humanoid robot [Metta et al. 2008]. The validation of the system proved that it is a viable low-cost, calibration-free gaze tracking solution for humanoid platforms, with a mean absolute error of about 5 degrees on horizontal angle estimates. We also demonstrated the applicability of our system to human-robot collaborative tasks, showing that the eye gaze reading ability can enable successful implicit communication between humans and the robot.
The comprehension of the correlation between hand and gaze is important in HCI as it provides foundations to build rich and natural user experiences on devices. Despite the interest in describing this correlation in Internet-related activities, notably search activities, the studies that have been done so far only focus on the mouse as the proxy for the hand. Thanks to the touchscreen, tablets allow us to investigate the hand-eye correlation with direct touch input instead. We designed a gaze and touch data collection via a user study involving Internet-based tasks on a tablet in order to explore the data, detail the correlation between touch input and gaze, and in a further step, use these results to suggest a gaze prediction model based on touch and context.
Existing gaze estimation methods rely mainly on a 3D eye model or 2D eye appearance. While both methods have validated their effectiveness in various fields and applications, they are still limited in practice, for example in portable and non-intrusive systems and in robust eye gaze tracking across different environments. To this end, we investigate combining the eye model with eye appearance to perform gaze estimation and eye gaze tracking. Specifically, unlike traditional 3D model-based methods, which rely on cornea reflections, we plan to retrieve 3D information from a depth sensor (e.g., Kinect). Kinect integrates a camera sensor and IR illumination into one single device, thus enabling more flexible system settings. We further propose to utilize appearance information to help the basic model-based methods. Appearance information can help better detect gaze-related features (e.g., pupil center). In addition, the eye model and eye appearance can benefit each other to enable robust and accurate gaze estimation.
Eye movements have provided an excellent substrate with which to explore the neural control of motor systems. The simplicity of the neural circuitry and physical plants, in comparison to visually directed limb movements, allows for much easier analysis and extrapolation. The adaptive capabilities of eye movements are robust and reflect the significant neural plasticity within these systems. Although crucial for optimal motor function, these adaptive properties and the neural mechanisms responsible are only beginning to be understood. While limb and saccadic adaptations have been intensively studied, the adaptive response is measured indirectly as a change in the original response. Vergence, however, appears to provide the opportunity to measure the adaptive response in isolation. The following are preliminary results of a study investigating the adaptive properties of vergence eye movements using a main sequence analysis. The effects of stimulus directionality and amplitude are investigated and compared to the reflexive vergence innervation patterns known to exist for similar stimuli.
The goal is to develop a real-time pupil and Purkinje tracking system with sub-millisecond latency for eye motion compensation in high-resolution ophthalmoscopes by steering two orthogonal optical scanners optically conjugate to the rotation center of the eye.
This paper modifies the DBSCAN algorithm to identify fixations and saccades. This method combines advantages from dispersion-based algorithms, such as resilience to noise and intuitive fixational structure, and from velocity-based algorithms, such as the ability to deal appropriately with smooth pursuit (SP) movements.
Researchers use fixation identification algorithms to parse eye movement trajectories into a series of fixations and saccades, simplifying analyses and providing measures which may relate to cognition. The Distance Dispersion algorithm (I-DD) is a widely used elementary fixation identification algorithm. Yet the "optimality" properties of its most popular greedy implementation have not been described. This paper: (1) asks how "optimal" should be defined, and advances maximizing total fixation time and minimizing the number of clusters as a definition; (2) asks whether the greedy implementation of I-DD is optimal, and shows that it is when no fixations are rejected for being too short; and (3) shows that when fixation time rejection criteria are enabled, the greedy algorithm is not optimal. We propose an O(n^2) algorithm which is.
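A greedy distance-dispersion pass can be sketched as follows. This is an illustration under stated assumptions, not the implementation analysed in the paper and not the proposed O(n^2) algorithm: a fixation is taken to be a run of consecutive samples in which every pair of points lies within `max_dist` of each other, and runs shorter than `min_len` samples are rejected. The function name and parameters are hypothetical.

```python
import math

def greedy_idd(points, max_dist, min_len=1):
    """Greedy distance-dispersion fixation identification.

    Returns half-open index ranges (start, end), one per fixation.
    """
    fixations = []
    start = 0
    n = len(points)
    while start < n:
        end = start + 1
        # Grow the window while the next sample is within max_dist
        # of every sample already in the window.
        while end < n and all(
            math.dist(points[end], p) <= max_dist for p in points[start:end]
        ):
            end += 1
        if end - start >= min_len:
            fixations.append((start, end))
            start = end
        else:
            start += 1  # window too short: reject it and slide forward one sample
    return fixations

# Hypothetical gaze samples: two tight clusters separated by a saccade.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
fixations = greedy_idd(samples, max_dist=0.5, min_len=2)
```

With `min_len=1` no window is ever rejected, which corresponds to the setting in which the abstract reports the greedy approach to be optimal; raising `min_len` enables the rejection criterion under which it is not.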
Neurochemical systems are well studied in animal learning; however, ethical issues limit methodologies to explore these systems in humans. Pupillometry provides a glimpse into the brain's neurochemical systems, where pupil dynamics in monkeys have been linked with locus coeruleus (LC) activity, which releases norepinephrine (NE) throughout the brain. The objective of my research is to understand the role of neurochemicals in human learning. Specifically, I aim to 1) establish a non-invasive method to study the role of neurochemicals in human learning, 2) develop methods to monitor learning in real time using pupillometry, and 3) discover causal relationships between neurochemicals and learning in human subjects. In this article, to address Objective 1, we present evidence that pupil dynamics can be used as a surrogate measure of neurochemical activity during learning. Specifically, we hypothesize that norepinephrine modulates the encoding of memories, the influence of which can be measured with pupil dynamics. To examine this hypothesis a task-irrelevant learning paradigm was used, in which learning is boosted for stimuli temporally paired with task targets. We show that participants better recognize images that are paired with task targets than distractors and, in correspondence, that pupil size changes more for target-paired than distractor-paired images. To further investigate the hypothesis that NE nonspecifically guides learning for stimuli that are present with its release, a second procedure was used that employed an unexpected sound to activate the LC-NE system and induce pupil-size changes; results indicated a corresponding increase in memorization of images paired with the unexpected sounds. Together, these results suggest a relationship between the LC-NE system, pupil-size changes, and learning.
My ongoing work aims to develop methods to monitor learning in real time by investigating the relationship between pupil-size changes, eye movements, and learning in the context of a free visual search task. Future work will investigate the causal relationship between neurochemicals, learning, and pupil dynamics by using NE-specific drugs to up- and down-regulate levels of NE during learning.