This article reports on an investigation of the use of convolutional neural networks to predict the visual attention of chess players. The visual attention model described in this article was created to generate saliency maps that capture hierarchical and spatial features of the chessboard, in order to predict the fixation probability for individual pixels. Using a skip-layer autoencoder architecture with a unified decoder, we are able to use multiscale features to predict the saliency of parts of the board at different scales, capturing multiple relations between pieces. We used scan path and fixation data from players engaged in solving chess problems to compute 6600 saliency maps associated with the corresponding chess piece configurations. This corpus is complemented with synthetically generated data from actual games gathered from an online chess platform. Experiments conducted using both scan paths from chess players and the CAT2000 saliency dataset of natural images highlight several results. Deep features pretrained on natural images were found to be helpful in training visual attention prediction for chess. The proposed neural network architecture is able to generate meaningful saliency maps on unseen chess configurations with good scores on standard metrics. This work provides a baseline for future work on visual attention prediction in similar contexts.
Human-robot collaboration systems benefit from recognizing people's intentions. This capability is especially useful for collaborative manipulation applications, in which users operate robot arms to manipulate objects. For collaborative manipulation, systems can determine users' intentions by tracking eye gaze and identifying gaze fixations on particular objects in the scene (i.e., semantic gaze labeling). Translating 2D fixation locations (from eye trackers) into 3D fixation locations (in the real world) is a technical challenge. One approach is to assign each fixation to the object closest to it. However, calibration drift, head motion, and the extra dimension required for real-world interactions make this position matching approach inaccurate. In this work, we introduce velocity features that compare the relative motion between subsequent gaze fixations and a finite set of known points and assign fixation position to one of those known points. We validate our approach on synthetic data to demonstrate that classifying using velocity features is more robust than a position matching approach. In addition, we show that a classifier using velocity features improves semantic labeling on a real-world dataset of human-robot assistive manipulation interactions.
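The velocity-feature idea from this abstract can be sketched in a few lines (our simplified illustration, not the authors' implementation): for each step between subsequent gaze fixations, compare the gaze displacement with the displacement of each known point, and assign the fixation to the best match.

```python
import numpy as np

def classify_by_velocity(fixations, object_tracks):
    """Assign each fixation (after the first) to the object whose
    frame-to-frame motion best matches the gaze motion.

    fixations: (T, 2) array of 2D gaze fixation positions over time.
    object_tracks: dict mapping object name -> (T, 2) array of positions.
    Returns a list of T-1 object names.
    """
    gaze_vel = np.diff(fixations, axis=0)        # gaze displacement per step
    labels = []
    for t, gv in enumerate(gaze_vel):
        best, best_err = None, np.inf
        for name, track in object_tracks.items():
            ov = track[t + 1] - track[t]         # object displacement
            err = np.linalg.norm(gv - ov)        # velocity mismatch
            if err < best_err:
                best, best_err = name, err
        labels.append(best)
    return labels
```

Because the comparison uses relative motion rather than absolute position, a constant calibration offset between measured gaze and the true fixation location cancels out, which is the robustness property the abstract highlights.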
We investigate smooth pursuit eye movement based interaction using an unmodified off-the-shelf RGB camera. In each pair of sequential video frames, we compute the indicative direction of the eye movement by analyzing flow vectors obtained using the Lucas-Kanade optical flow algorithm. We discuss how carefully selected flow vectors can replace traditional pupil center detection in smooth pursuit interaction. We examine the implications of unused features in the eye camera imaging frame as potential elements for detecting gaze gestures. This simple approach is easy to implement and avoids many of the complexities of pupil based approaches. In particular, EyeFlow does not call for either a 3D pupil model or 2D pupil detection to track the pupil center location. We compare this method to state-of-the-art approaches and find that it can enable pursuit interactions with standard cameras. Results from the evaluation with data from 12 users yield an accuracy that compares to previous studies. In addition, the benefit of this work is that the approach does not necessitate highly matured computer vision algorithms or expensive IR-pass cameras.
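Assuming flow vectors have already been extracted for a pair of frames (e.g., with OpenCV's pyramidal Lucas-Kanade tracker), the indicative direction could be estimated as in this minimal sketch (the function name and magnitude threshold are our own assumptions):

```python
import numpy as np

def indicative_direction(flow_vectors, min_magnitude=0.5):
    """Estimate the dominant eye movement direction from a set of
    optical flow vectors (e.g., Lucas-Kanade output between two frames).

    flow_vectors: (N, 2) array of (dx, dy) displacements.
    Returns the mean direction in degrees, or None if motion is negligible.
    """
    v = np.asarray(flow_vectors, dtype=float)
    mean = v.mean(axis=0)                      # average flow over tracked points
    if np.linalg.norm(mean) < min_magnitude:   # ignore jitter during fixation
        return None
    return float(np.degrees(np.arctan2(mean[1], mean[0])))
```

Matching this direction estimate against the known trajectory of an on-screen pursuit target is then what enables pursuit-based selection without locating the pupil center.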
Analysis of eye-gaze is a critical tool for studying human-computer interaction and visualization. Yet eye tracking systems only report eye-gaze on the scene by producing large volumes of coordinate time series data. To be able to use this data, we must first extract salient events such as eye fixations, saccades, and post-saccadic oscillations (PSOs). Manually extracting these events is time-consuming, labor-intensive, and subject to variability. In this paper, we present and evaluate simple and fast automatic solutions for eye-gaze analysis based on supervised learning. Similar to some recent studies, we developed different simple neural networks, demonstrating that feature learning produces superior results in identifying events from sequences of gaze coordinates. We do not apply any ad-hoc post-processing, thus creating fully automated end-to-end algorithms that perform as well as current state-of-the-art architectures. Once trained, they are fast enough to run in a near real-time setting.
The complex stochastic nature of eye tracking data calls for sophisticated statistical models to ensure reliable inference in multi-trial eye-tracking experiments. We employ a Bayesian semi-parametric mixed-effects Markov model to compare gaze transition matrices between different experimental factors while accommodating individual random effects. The model not only allows us to assess global influences of the external factors on the gaze transition dynamics but also provides an understanding of these effects at a deeper, local level. We conducted an experiment to explore the impact of recognizing distorted images of artwork and landmarks on gaze transition patterns. Our dataset comprises sequences representing the areas of interest visited when applying a content-independent grid to the resulting scan paths in a multi-trial setting. Results suggest that image recognition affects the dynamics of the transitions to some extent, while image type plays an essential role in the viewing behavior.
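As a minimal illustration of the basic quantity the model operates on (not of the Bayesian machinery itself), a first-order gaze transition matrix can be estimated from an AOI sequence by counting transitions and row-normalising:

```python
import numpy as np

def transition_matrix(aoi_sequence, n_aois):
    """Estimate a first-order gaze transition matrix from a sequence of
    visited AOI indices (0..n_aois-1) by counting and row-normalising."""
    counts = np.zeros((n_aois, n_aois))
    for a, b in zip(aoi_sequence, aoi_sequence[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # rows for AOIs that are never left stay all-zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)
```

Entry (i, j) then estimates the probability of moving from AOI i to AOI j; the mixed-effects Markov model described above compares such matrices across experimental conditions while pooling over participants.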
In this paper, we investigate the probability and timing of attaining gaze fixations on interacted objects during hand interaction in virtual reality, with the main purpose of enabling implicit and continuous eye tracking re-calibration. We conducted an evaluation with 15 participants in which their gaze was recorded while interacting with virtual objects. The data was analysed to find factors influencing the probability of fixations at different phases of interaction for different object types. The results indicate that 1) interacting with stationary objects may be more favourable for attaining fixations than interacting with moving objects, 2) prolonged and precision-demanding interactions positively influence the probability of attaining fixations, 3) performing multiple interactions simultaneously can negatively impact the probability of fixations, and 4) feedback can initiate and end fixations on objects.
One of the obstacles to bringing eye tracking technology to everyday human-computer interaction is the time-consuming calibration procedure. In this paper, we investigate a novel calibration method based on smooth pursuit eye movement. The method uses linear regression to calculate the calibration mapping. The advantage is that users can perform the calibration quickly, in a few seconds, and only need a small calibration area to cover a large tracking area. We first describe the theoretical background on establishing a calibration mapping and discuss the differences between the calibration methods used. We then present a user study comparing the new regression-based method with a classical nine-point calibration and with other pursuit-based calibrations. The results show that the proposed method is fully functional, quick, and enables accurate tracking over a large area. The method has the potential to be integrated into current eye tracking systems to make them more usable in various use cases.
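The regression-based mapping can be sketched as an ordinary least-squares fit from raw gaze samples to the known pursuit target positions. This illustrative version assumes a simple affine model; the paper's exact regressors may differ:

```python
import numpy as np

def fit_calibration(raw_gaze, targets):
    """Fit an affine calibration mapping screen = [x, y, 1] @ P by least
    squares, from raw gaze samples matched to known target positions.

    raw_gaze, targets: (N, 2) arrays. Returns the (3, 2) parameter matrix."""
    X = np.hstack([raw_gaze, np.ones((len(raw_gaze), 1))])  # homogeneous coords
    params, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return params

def apply_calibration(params, raw_gaze):
    """Map raw gaze samples to calibrated screen coordinates."""
    X = np.hstack([raw_gaze, np.ones((len(raw_gaze), 1))])
    return X @ params
```

With smooth pursuit, the moving target supplies many sample-target pairs within seconds, which is why such a regression can be fitted from a small calibration area yet extrapolate over a larger tracking area.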
Remote eye trackers are widely used for screen-based interactions. They are less intrusive than head mounted eye trackers, but are generally quite sensitive to head movement. This leads to the requirement for frequent recalibration, especially in applications requiring accurate eye tracking. We propose here an online calibration method to compensate for head movements if estimates of the gaze targets are available. For example, in dwell-time based gaze typing it is reasonable to assume that for correct selections, the user's gaze target during the dwell-time was at the key center. We use this assumption to derive an eye-position dependent linear transformation matrix for correcting the measured gaze. Our experiments show that the proposed method significantly reduces errors over a large range of head movements.
Automatic saliency-based recalibration is promising for addressing calibration drift in mobile eye trackers, but existing bottom-up saliency methods neglect users' goal-directed visual attention in natural behaviour. By inspecting real-life recordings of egocentric eye tracker cameras, we reveal that users are likely to look at their phones once these appear in view. We propose two novel automatic recalibration methods that exploit mobile phone usage: The first builds saliency maps using the phone location in the egocentric view to identify likely gaze locations. The second uses the occurrence of touch events to recalibrate the eye tracker, thereby enabling privacy-preserving recalibration. Through in-depth evaluations on a recent mobile eye tracking dataset (N=17, 65 hours) we show that our approaches outperform a state-of-the-art saliency approach for automatic recalibration. As such, our approach improves mobile eye tracking and gaze-based interaction, particularly for long-term use.
The duration of the so-called "Quiet Eye" (QE) - the final fixation before the initiation of a critical movement - seems to be linked to better perceptual-motor performance in various domains. For instance, experts show longer QE durations than their less skilled counterparts. The aim of this paper was to replicate and extend previous work on the QE [Vickers and Williams 2007] in elite biathletes in an ecologically valid environment. Specifically, we tested whether longer QE durations result in higher shooting accuracy. To this end, we developed a gun-mounted eye tracker as a means to obtain reliable gaze data without interfering with the athletes' performance routines. During regular training protocols we collected gaze and performance data of 9 members (age 19.8 ± 0.45) of the German national junior team. The results did not show a significant effect of QE duration on shooting performance. Based on our findings, we critically discuss various conceptual as well as methodological issues with the QE literature that need to be addressed in future research to resolve current inconsistencies.
Visual analytics (VA) research provides helpful solutions for interactive visual data analysis when exploring large and complex datasets. Due to recent advances in eye tracking technology, promising opportunities arise to extend these traditional VA approaches. Therefore, we discuss foundations for eye tracking support in VA systems. We first review and discuss the structure and range of typical VA systems. Based on a widely used VA model, we present five comprehensive examples that cover a wide range of usage scenarios. Then, we demonstrate that the VA model can be used to systematically explore how concrete VA systems could be extended with eye tracking, to create supportive and adaptive analytics systems. This allows us to identify general research and application opportunities, and classify them into research themes. In a call for action, we map the road for future research to broaden the use of eye tracking and advance visual analytics.
We present a method for the spatio-temporal analysis of gaze data from multiple participants in the context of a video stimulus. For such data, an overview of the recorded patterns is important to identify common viewing behavior (such as attentional synchrony) and outliers. We adopt the approach of space-time cube visualization, which extends the spatial dimensions of the stimulus by time as the third dimension. Previous work mainly handled eye tracking data in the space-time cube as point cloud, providing no information about the stimulus context. This paper presents a novel visualization technique that combines gaze data, a dynamic stimulus, and optical flow with volume rendering to derive an overview of the data with contextual information. With specifically designed transfer functions, we emphasize different data aspects, making the visualization suitable for explorative analysis and for illustrative support of statistical findings alike.
Eye movements of developers are used to infer the mental cognition model (i.e., bottom-up or top-down) applied during program comprehension tasks. The cognition models examine how programmers understand source code by describing the temporary information structures in the programmer's short-term memory. The two types of models that we are interested in are top-down and bottom-up. The top-down model is normally applied as-needed, i.e., when the domain of the system is familiar. The bottom-up model is typically applied when a developer is not familiar with the domain or the source code. An eye-tracking study of 18 developers reading and summarizing Java methods is used as our dataset for analyzing the mental cognition model. The developers provide a written summary for the methods assigned to them. In total, 63 methods are used from five different systems. The results indicate that, on average, both experts and novices read the methods closely (using the bottom-up mental model) rather than bouncing around (using top-down). However, on average, novices spend a longer gaze time performing bottom-up comprehension (66 s) than experts (43 s).
In this paper, we analyze eye movement data of 26 participants using a quantitative and qualitative approach to investigate how people read natural language text in comparison to source code. In particular, we use the radial transition graph visualization to explore strategies of participants during these reading tasks and extract common patterns amongst participants. We illustrate via examples how visualization can play a role at uncovering behavior of people while reading natural language text versus source code. Our results show that the linear reading order of natural text is only partially applicable to source code reading. We found patterns representing a linear order and also patterns that represent reading of the source code in execution order. Participants also focus more on those areas that are important to comprehend core functionality and we found that they skip unimportant constructs such as brackets.
This eye tracking study examines participants' visual attention when solving algorithmic problems in the form of programming problems. The stimuli consisted of a problem statement, example output, and a set of multiple-choice questions regarding the variables, data types, and operations needed to solve the programming problems. We recorded eye movements of students and performed an Area of Interest (AoI) sequence analysis to identify reading strategies in terms of participants' performance and visual effort. Using classical eye tracking metrics and a visual AoI sequence analysis, we identified two main groups of participants: effective and ineffective problem solvers. This indicates that the diversity of participants' mental schemas leads to a difference in their performance. Therefore, identifying how participants' reading behavior varies at a finer level of granularity warrants further investigation.
Despite recent developments in eye tracking technology, mobile eye trackers (ETs) are still expensive devices limited to a few hundred samples per second. High-speed ETs (closer to 1 kHz) can provide improved flexibility for data filtering and more reliable event detection. To address these challenges, we present the Stroboscopic Catadioptric Eye Tracking (SCET) system, a novel approach for mobile ET based on rolling shutter cameras and stroboscopic structured infrared lighting. SCET proposes a geometric model where the cornea acts as a spherical mirror in a catadioptric system, changing the projection as it moves. Calibration methods for the geometry of the system and for the gaze estimation are presented. Instead of tracking common eye features, such as the pupil center, we track multiple glints on the cornea. By carefully adjusting the camera exposure and the lighting period, we show how one image frame can be divided into several bands to increase the temporal resolution of the gaze estimates. We assess the model in a simulated environment and also describe a prototype implementation that demonstrates the feasibility of SCET, which we envision as a step further in the direction of a mobile, robust, affordable, and high-speed eye tracker.
A key assumption conventionally made by flexible head-mounted eye-tracking systems is often invalid: The eye center does not remain stationary w.r.t. the eye camera due to slippage. For instance, eye-tracker slippage might happen due to head acceleration or explicit adjustments by the user. As a result, gaze estimation accuracy can be significantly reduced. In this work, we propose Grip, a novel gaze estimation method capable of instantaneously compensating for eye-tracker slippage without additional hardware requirements such as glints or stereo eye camera setups. Grip was evaluated using previously collected data from a large scale unconstrained pervasive eye-tracking study. Our results indicate significant slippage compensation potential, decreasing average participant median angular offset by more than 43% w.r.t. a non-slippage-robust gaze estimation method. A reference implementation of Grip was integrated into EyeRecToo, an open-source hardware-agnostic eye-tracking software, thus making it readily accessible for multiple eye trackers (Available at: www.ti.uni-tuebingen.de/perception).
The classification of eye movements is a very important part of eye tracking research and has been studied since its early days. Over recent years, we have experienced an increasing shift towards more immersive experimental scenarios with the use of eye-tracking enabled glasses and head-mounted displays. In these new scenarios, however, most of the existing eye movement classification algorithms cannot be applied robustly anymore because they were developed with monitor-based experiments using regular 2D images and videos in mind. In this paper, we describe two approaches that reduce artifacts of eye movement classification for 360° videos shown in head-mounted displays. For the first approach, we discuss how decision criteria have to change in the space of 360° videos, and use these criteria to modify five popular algorithms from the literature. The modified algorithms are publicly available at https://web.gin.g-node.org/ioannis.agtzidis/360_em_algorithms. For cases where an existing algorithm cannot be modified, e.g. because it is closed-source, we present a second approach that maps the data instead of the algorithm to the 360° space. An empirical evaluation of both approaches shows that they significantly reduce the artifacts of the initial algorithm, especially in the areas further from the horizontal midline.
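One reason monitor-based decision criteria break down for 360° video is that pixel distances in the equirectangular frame overstate visual angles away from the horizontal midline, so dispersion and velocity thresholds should be evaluated on the sphere. A sketch of this remapping (our illustration, not the authors' released code):

```python
import numpy as np

def equirect_to_vector(x, y, width, height):
    """Convert equirectangular pixel coordinates to a 3D unit gaze vector."""
    lon = (x / width - 0.5) * 2 * np.pi      # longitude in [-pi, pi]
    lat = (0.5 - y / height) * np.pi         # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def angular_distance_deg(p1, p2, width, height):
    """True visual angle (degrees) between two gaze points given in
    equirectangular pixel coordinates."""
    v1 = equirect_to_vector(*p1, width, height)
    v2 = equirect_to_vector(*p2, width, height)
    cos = np.clip(np.dot(v1, v2), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))
```

The same 10-pixel horizontal step corresponds to a much smaller visual angle near the poles than on the midline, which is consistent with the observation that artifacts of unmodified algorithms grow with distance from the horizontal midline.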
Photosensor oculography (PSOG) is a promising solution for reducing the computational requirements of eye tracking sensors in wireless virtual and augmented reality platforms. This paper proposes a novel machine learning-based solution for addressing the known performance degradation of PSOG devices in the presence of sensor shifts. Namely, we introduce a convolutional neural network model capable of providing shift-robust end-to-end gaze estimates from the PSOG array output. Moreover, we propose a transfer-learning strategy for reducing model training time. Using a simulated workflow with improved realism, we show that the proposed convolutional model offers improved accuracy over a previously considered multilayer perceptron approach. In addition, we demonstrate that the transfer of initialization weights from pre-trained models can substantially reduce training time for new users. Finally, we discuss the design trade-offs between accuracy, training time, and power consumption among the considered models.
Gaze depth estimation presents a challenge for eye tracking in 3D. This work investigates a novel approach to the problem based on eye movement mediated by the vestibulo-ocular reflex (VOR). VOR stabilises gaze on a target during head movement, with eye movement in the opposite direction, and the VOR gain increases the closer the fixated target is to the viewer. We present a theoretical analysis of the relationship between VOR gain and depth which we investigate with empirical data collected in a user study (N=10). We show that VOR gain can be captured using pupil centres, and propose and evaluate a practical method for gaze depth estimation based on a generic function of VOR gain and two-point depth calibration. The results show that VOR gain is comparable with vergence in capturing depth while only requiring one eye, and provide insight into open challenges in harnessing VOR gain as a robust measure.
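The depth dependence of VOR gain can be illustrated with a simple hypothetical model g(d) = a + b/d, in which two calibration depths determine a and b, and an observed gain then inverts to a depth estimate. This is a hedged sketch of the two-point calibration idea; the paper's generic gain function may differ:

```python
def fit_vor_depth_model(d1, g1, d2, g2):
    """Fit the hypothetical model g(d) = a + b/d from two calibration
    depths d1, d2 with measured VOR gains g1, g2."""
    b = (g1 - g2) / (1.0 / d1 - 1.0 / d2)
    a = g1 - b / d1
    return a, b

def gain_to_depth(gain, a, b):
    """Invert the model: d = b / (g - a)."""
    return b / (gain - a)
```

Because the gain is the ratio of eye velocity to head velocity during a stabilising VOR movement, this estimate needs only one eye's pupil centre, in contrast to vergence-based depth estimation, which requires both eyes.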
Joint attention is an essential part of the development process of children, and impairments in joint attention are considered one of the first symptoms of autism. In this paper, we develop a novel technique to characterize joint attention in real time, by studying the interaction of two human subjects with each other and with multiple objects present in the room. This is done by capturing the subjects' gaze through eye-tracking glasses and detecting their looks on predefined indicator objects. A deep learning network is trained and deployed to detect the objects in the field of vision of the subject by processing the video feed of the world view camera mounted on the eye-tracking glasses. The looking patterns of the subjects are determined and a real-time audio response is provided when joint attention is detected, i.e., when their looks coincide. Our findings suggest a trade-off between the accuracy measure (Look Positive Predictive Value) and the latency of joint look detection for various system parameters. For more accurate joint look detection, the system has higher latency, and for faster detection, the detection accuracy goes down.
Eye movement classification algorithms are typically evaluated either in isolation (in terms of absolute values of some performance statistic), or in comparison to previously introduced approaches. In contrast to this, we first introduce and thoroughly evaluate a set of both random and above-chance baselines that are completely independent of the eye tracking signal recorded for each considered individual observer. Surprisingly, our baselines often show performance that is either comparable to, or even exceeds the scores of some established eye movement classification approaches, for smooth pursuit detection in particular. In these cases, it may be that (i) algorithm performance is poor, (ii) the data set is overly simplistic with little inter-subject variability of the eye movements, or, alternatively, (iii) the currently used evaluation metrics are inappropriate. Based on these observations, we discuss the level of stimulus dependency of the eye movements in four different data sets. Finally, we propose a novel measure of agreement between true and assigned eye movement events, which, unlike existing metrics, is able to reveal the expected performance gap between the baselines and dedicated algorithms.
By temporally integrating information about pupil contours extracted from eye images, model-based methods for glint-free gaze estimation can mitigate pupil detection noise. However, current approaches require time-consuming iterative solving of a nonlinear minimization problem to estimate key parameters, such as eyeball position. Based on the method presented by [Swirski and Dodgson 2013], we propose a novel approach to glint-free 3D eye-model fitting and gaze prediction using a single near-eye camera. By recasting model optimization as a least-squares intersection of lines, we make it amenable to a fast non-iterative solution. We further present a method for estimating deterministic refraction-correction functions from synthetic eye images and validate them on both synthetic and real eye images. We demonstrate the robustness of our method in the presence of pupil detection noise and show the benefit of temporal integration of pupil contour information on eyeball position and gaze estimation accuracy.
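The least-squares intersection of lines that replaces iterative optimisation has a well-known closed form: for lines with origins a_i and unit directions d_i, the point minimising the summed squared point-to-line distances solves a small linear system. A generic sketch (not tied to the paper's specific eye-model parametrisation):

```python
import numpy as np

def intersect_lines(origins, directions):
    """Closed-form least-squares intersection of 3D lines.

    origins: (N, 3) points on each line; directions: (N, 3) direction vectors.
    Minimises the sum of squared point-to-line distances by solving
    (sum_i P_i) x = sum_i P_i a_i, where P_i projects onto the plane
    normal to line i."""
    I = np.eye(3)
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for a, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = I - np.outer(d, d)      # projector onto the line's normal plane
        A += P
        b += P @ a
    return np.linalg.solve(A, b)
```

Because this is a single linear solve rather than an iterative nonlinear minimisation, parameters such as the eyeball position can be recovered non-iteratively, which is what makes the approach fast.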
Eye tracking, which measures the line of sight, is expected to advance as an intuitive and rapid input method for user interfaces. Cross-ratio based methods, which calculate the point-of-gaze using homography matrices, have attracted attention because they do not require hardware calibration to determine the geometric relationship between an eye camera and a screen. However, these methods require near-infrared (NIR) light-emitting diodes (LEDs) attached to the display in order to detect the screen corners, so LEDs must be installed around the display to estimate the point-of-gaze. Removing this requirement would allow cross-ratio based gaze estimation to spread more easily. We therefore propose the use of a polarization camera for detecting the screen area reflected on the corneal surface. The reflection area of the display light is easily detected in the polarized image because the light radiated from the display is linearly polarized by its internal polarization filter. With the proposed method, the screen corners can be determined without using NIR LEDs, and the point-of-gaze can be estimated using the detected corners on the corneal surface. We investigated the accuracy of the estimated point-of-gaze based on a cross-ratio method under various illumination and display conditions. Because the proposed method does not require infrared light sources at the display corners, cross-ratio based gaze estimation is expected to be widely adopted in commercial products.
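The cross-ratio family of methods can be illustrated with a homography: the four detected screen corners (here, found on the corneal reflection) and the four known screen corners define a projective mapping, which then transfers a detected feature point to screen coordinates. A minimal direct-linear-transform sketch (illustrative only, not the authors' pipeline):

```python
import numpy as np

def homography(src, dst):
    """Estimate the 3x3 homography mapping four src points to four dst
    points via the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null space of A, i.e. the singular vector
    # associated with the smallest singular value
    _, _, Vt = np.linalg.svd(np.array(A))
    return Vt[-1].reshape(3, 3)

def map_point(H, p):
    """Apply homography H to a 2D point p (homogeneous normalisation)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

In the proposed method, the corners come from the polarization-based segmentation of the display reflection on the cornea instead of NIR LED glints, but the projective transfer step itself is the same.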
We evaluate subtle, emotionally-driven models of eye movement animation. Two models are tested, reading and face scanning, each based on recorded gaze transition probabilities. For reading, simulated emotional mood is governed by the probability density function that varies word advancement, i.e., re-fixations, forward, or backward skips. For face scanning, gaze behavior depends on task (gender or emotion discrimination) or the facial emotion portrayed. Probability density functions in both cases are derived from empirically observed transitions that significantly alter viewing behavior, captured either during mood-induced reading or during scanning faces expressing different emotions. A perceptual study shows that viewers can distinguish between reading and face scanning eye movements. However, viewers could not gauge the emotional valence of animated eye motion. For animation, our contribution shows that simulated emotionally-driven viewing behavior is too subtle to be discerned, or it needs to be exaggerated to be effective.
Eyewear devices, such as augmented reality displays, increasingly integrate eye tracking, but the first-person camera required to map a user's gaze to the visual scene can pose a significant threat to user and bystander privacy. We present PrivacEye, a method to detect privacy-sensitive everyday situations and automatically enable and disable the eye tracker's first-person camera using a mechanical shutter. To close the shutter in privacy-sensitive situations, the method uses a deep representation of the first-person video combined with rich features that encode users' eye movements. To open the shutter without visual input, PrivacEye detects changes in users' eye movements alone to gauge changes in the "privacy level" of the current situation. We evaluate our method on a first-person video dataset recorded in daily life situations of 17 participants, annotated by themselves for privacy sensitivity, and show that our method is effective in preserving privacy in this challenging setting.
With eye tracking being increasingly integrated into virtual and augmented reality (VR/AR) head-mounted displays, preserving users' privacy is an ever more important, yet under-explored, topic in the eye tracking community. We report a large-scale online survey (N=124) on privacy aspects of eye tracking that provides the first comprehensive account of with whom, for which services, and to what extent users are willing to share their gaze data. Using these insights, we design a privacy-aware VR interface that uses differential privacy, which we evaluate on a new 20-participant dataset for two privacy sensitive tasks: We show that our method can prevent user re-identification and protect gender information while maintaining high performance for gaze-based document type classification. Our results highlight the privacy challenges particular to gaze data and demonstrate that differential privacy is a potential means to address them. Thus, this paper lays important foundations for future research on privacy-aware gaze interfaces.
As large eye-tracking datasets are created, data privacy is a pressing concern for the eye-tracking community. De-identifying data does not guarantee privacy because multiple datasets can be linked for inferences. A common belief is that aggregating individuals' data into composite representations such as heatmaps protects the individual. However, we analytically examine the privacy of (noise-free) heatmaps and show that they do not guarantee privacy. We further propose two noise mechanisms that guarantee privacy and analyze their privacy-utility tradeoff. Analysis reveals that our Gaussian noise mechanism is an elegant solution to preserve privacy for heatmaps. Our results have implications for interdisciplinary research to create differentially private mechanisms for eye tracking.
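The Gaussian noise mechanism mentioned above can be sketched as follows: if each viewer changes the aggregate heatmap by at most a bounded amount (its L2 sensitivity), adding i.i.d. Gaussian noise calibrated to that bound yields (ε, δ)-differential privacy. This is an illustrative sketch using the standard calibration; the paper's analysis derives the precise sensitivity for gaze heatmaps:

```python
import numpy as np

def private_heatmap(heatmap, sensitivity, epsilon, delta, rng=None):
    """Release an (epsilon, delta)-differentially private heatmap via the
    Gaussian mechanism. `sensitivity` is the L2 sensitivity: the maximum
    change (in L2 norm) of the aggregate heatmap when one viewer's data
    is added or removed."""
    rng = np.random.default_rng() if rng is None else rng
    # standard Gaussian mechanism calibration (valid for epsilon < 1)
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return heatmap + rng.normal(0.0, sigma, size=heatmap.shape)
```

The privacy-utility tradeoff analysed in the paper corresponds to the choice of ε and δ: smaller values give stronger privacy but a noisier, less useful heatmap.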
Eye-gaze and mid-air gestures are promising for resisting various types of side-channel attacks during authentication. However, to date, a comparison of the different authentication modalities is missing. We investigate multiple authentication mechanisms that leverage gestures, eye gaze, and a multimodal combination of them and study their resilience to shoulder surfing. To this end, we report on our implementation of three schemes and results from usability and security evaluations where we also experimented with fixed and randomized layouts. We found that the gaze-based approach outperforms the other schemes in terms of input time, error rate, perceived workload, and resistance to observation attacks, and that randomizing the layout does not improve observation resistance enough to warrant the reduced usability. Our work further underlines the significance of replicating previous eye tracking studies using today's sensors as we show significant improvement over similar previously introduced gaze-based authentication systems.
Laparoscopic surgery has revolutionised the state of the art in surgical health care. However, its complexity puts a significant burden on the surgeon's cognitive resources, which can result in major biliary injuries. With the increasing number of laparoscopic surgeries, it is crucial to identify surgeons' cognitive load (CL) and level of focus in real time in order to give them unobtrusive feedback when a suboptimal level of attention is detected. Assuming that experts are more focused, we investigate how the skill level of surgeons during live surgery is reflected in eye metrics. Forty-two laparoscopic surgeries were conducted with four surgeons of different expertise levels. We used six eye metrics, drawn from fixation-based and pupillary-based metrics. Using means, standard deviations, and ANOVA tests, we identified three reliable metrics that can be used to differentiate skill level during live surgeries. In future studies, these three metrics will be used to classify surgeons' cognitive load and level of focus during live surgery using machine learning techniques.
Recognizing eye movements is important for understanding gaze behavior, for example in human communication analysis (human-human or human-robot interactions) or for diagnosis (medical conditions, reading impairments). In this paper, we address this task using remote RGB-D sensors to analyze people behaving in natural conditions. This is very challenging given that such sensors have a typical sampling rate of 30 Hz and provide low-resolution eye images (typically 36×60 pixels), and that natural scenarios introduce many variabilities in illumination, shadows, head pose, and dynamics. Hence, the gaze signals one can extract in these conditions have lower precision compared to dedicated IR eye trackers, rendering previous methods less appropriate for the task. To tackle these challenges, we propose a deep learning method that directly processes the eye image video streams to classify them into fixation, saccade, and blink classes, and that distinguishes irrelevant noise (illumination, low-resolution artifacts, inaccurate eye alignment, difficult eye shapes) from true eye motion signals. Experiments on natural 4-party interactions demonstrate the benefit of our approach compared to previous methods, including deep learning models applied to gaze outputs.
Head-mounted displays offer full control over lighting conditions. When equipped with eye tracking technology, they are well suited for experiments investigating pupil dilation in response to cognitive tasks, emotional stimuli, and motor task complexity, particularly for studies that would otherwise have required the use of a chinrest, since the eye cameras are fixed with respect to the head. This paper analyses pupil dilations for 13 out of 27 participants completing a Fitts' law task using a virtual reality headset with built-in eye tracking. The largest pupil dilation occurred for the condition subjectively rated as requiring the most physical and mental effort. Fitts' index of difficulty had no significant effect on pupil dilation, suggesting differences in motor task complexity may not affect pupil dilation.
End-to-end behavioral cloning trained on human demonstrations is now a popular approach for vision-based autonomous driving. A deep neural network maps driver-view images directly to steering commands. However, the images contain much task-irrelevant information. Humans attend to behaviorally relevant information using saccades that direct gaze towards important areas. We demonstrate that behavioral cloning also benefits from active control of gaze. We trained a conditional generative adversarial network (GAN) that accurately predicts human gaze maps while driving, in both familiar and unseen environments. We incorporated the predicted gaze maps into end-to-end networks for two behaviors: following and overtaking. Incorporating gaze information significantly improves generalization to unseen environments. We hypothesize that gaze information enables the network to focus on task-critical objects, which vary little between environments, and to ignore irrelevant elements in the background, which vary greatly.
Gradient-based dark pupil tracking [Timm and Barth 2011] is a simple and robust algorithm for pupil center estimation. The algorithm's time complexity of O(n⁴) can be tackled by applying a two-stage process (coarse center estimation followed by a windowed refinement), as well as by optimizing and parallelizing the code using cache-friendly data structures, the vector extensions of modern CPUs, and GPU acceleration. We achieved a substantial speed-up compared to a non-optimized implementation: 12x using vector extensions and 65x using a GPU. Further, the two-stage process, combined with parameter optimization using differential evolution, considerably increased the accuracy of the algorithm. We evaluated our implementation using the "Labelled Pupils in the Wild" data set. The percentage of frames with a pixel error below 15px increased from 28% to 72%, surpassing algorithmically more complex algorithms like ExCuSe (64%) and catching up with recent algorithms like PuRe (87%).
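As a rough illustration of the objective behind [Timm and Barth 2011], a brute-force sketch of the gradient criterion could look like the following. This is a simplified illustration only: the function name is ours, and the dark-pupil intensity weighting and all the optimizations discussed above are omitted.

```python
import numpy as np

def pupil_center_objective(gray):
    """Exhaustive sketch of the gradient criterion:
    score(c) = mean_i (d_i(c) . g_i)^2, where g_i are unit image
    gradients and d_i(c) are unit vectors from candidate center c
    to gradient pixel i. At the true pupil center the displacement
    vectors align with the radial gradients on the pupil boundary."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(mag > mag.mean())     # keep strong gradients only
    g = np.stack([gx[ys, xs], gy[ys, xs]], axis=1)
    g /= np.linalg.norm(g, axis=1, keepdims=True)

    h, w = gray.shape
    best_score, best_c = -1.0, (0, 0)
    for cy in range(h):                       # the naive O(n^4) search
        for cx in range(w):
            d = np.stack([xs - cx, ys - cy], axis=1).astype(float)
            n = np.linalg.norm(d, axis=1)
            valid = n > 0
            dots = (d[valid] * g[valid]).sum(axis=1) / n[valid]
            score = np.mean(dots ** 2)
            if score > best_score:
                best_score, best_c = score, (cx, cy)
    return best_c                             # (x, y) of estimated center
```

The two nested loops over candidate centers make the O(n⁴) cost explicit; the coarse-to-fine and vectorized variants described above accelerate exactly this search.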
In this paper, we propose a calibration-free gaze-based text entry system that uses smooth pursuit eye movements. We report on our implementation, which improves over prior work on smooth pursuit text entry by 1) eliminating the need for calibration using motion correlation, 2) increasing the input rate from 3.34 to 3.41 words per minute, and 3) featuring text suggestions trained on 10,000 lexicon sentences recommended in the literature. We report on a user study (N=26) which shows that users are able to eye type at 3.41 words per minute without calibration and without user training. Qualitative feedback also indicates that users perceive the system positively. Our work is of particular benefit for disabled users and for situations when voice and tactile input are not feasible (e.g., in noisy environments or when the hands are occupied).
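The motion-correlation principle behind calibration-free selection can be sketched as follows. This is a minimal illustration under our own assumptions; the function name, the min-over-axes rule, and the 0.8 threshold are illustrative, not the authors' exact implementation.

```python
import numpy as np

def select_target(gaze_xy, target_trajs, threshold=0.8):
    """Motion-correlation sketch: correlate the raw (uncalibrated)
    gaze trace with each moving on-screen target trajectory over a
    window, and select the target whose per-axis Pearson correlation
    is highest, if it exceeds a threshold. Because only relative
    motion matters, no gaze calibration is required."""
    gaze_xy = np.asarray(gaze_xy, float)
    best_idx, best_r = None, threshold
    for idx, traj in enumerate(target_trajs):
        traj = np.asarray(traj, float)
        rx = np.corrcoef(gaze_xy[:, 0], traj[:, 0])[0, 1]
        ry = np.corrcoef(gaze_xy[:, 1], traj[:, 1])[0, 1]
        r = min(rx, ry)            # both axes must correlate
        if r > best_r:
            best_idx, best_r = idx, r
    return best_idx                # None if no target passes threshold
```

In a pursuit keyboard, each key would move along a distinct trajectory, and the key whose motion best matches the gaze trace is entered.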
Cognitive biases, such as the bandwagon effect, occur when a participant places disproportionate emphasis on external information when making decisions under uncertainty. These effects are challenging for humans to overcome - even when they are explicitly made aware of their own biases. One challenge for researchers is to detect whether this external information is used in decision making, and to what degree. An eye tracker offers a way to gain a better understanding of how such information is used. In this paper, we evaluate cognitive biases in the context of assessing the binary relevance of a set of documents in response to a given information need. We show that these cognitive biases can be observed by examining gaze time in Areas of Interest (AOIs) that contain the pertinent external information.
Mixed reality headsets are being designed with integrated eye trackers: cameras that image the user's eye to infer gaze location and pupil diameter. While the intent is to improve the quality of experience, built-in eye trackers create a security vulnerability: they give attackers access to high-resolution images of the user's iris. Anyone stealing an iris image has effectively captured a gold-standard biometric, relied on for secure authentication in applications such as banking and voting. We present a low-cost solution that degrades iris authentication while still permitting gaze tracking with acceptable accuracy. By demonstrating this solution on a commodity eye tracker, this paper urges the community to think about iris-based authentication as a byproduct of eye tracking, and to create solutions that empower users to control this biometric.
The paper partially replicates and extends a previous study by Busjahn et al. on the factors influencing dwell time during source code reading, where source code element type and frequency of gaze visits are studied as factors. Unlike the previous study, this study focuses on analyzing eye movement data from large open-source Java projects. Five experts and thirteen novices participated in the study, where the main task was to summarize methods. The results examine the semantic line-level information that developers view during summarization. We find no correlation between line length and the total time spent looking at a line, even though prior work reports such a correlation between a token's length and the total fixation time on that token. The first fixations inside a method are more likely than later fixations to land on the method's signature, a variable declaration, or an assignment. In addition, we find that smaller methods tend to have a shorter overall fixation duration for the entire method, but a significantly longer duration per line. The analysis provides insights into how source code's unique characteristics can help in building more robust methods for analyzing eye movements in source code, and more generally in building theories to support program comprehension on realistic tasks.
Scanpath classification can offer insight into the visual strategies of groups such as experts and novices. We propose to use random ferns in combination with saccade angle successions to compare scanpaths. One advantage of our method is that it does not require areas of interest to be computed or annotated. The conditional distribution in random ferns additionally allows for learning angle successions, which do not have to be entirely present in a scanpath. We evaluated our approach on two publicly available datasets and improved the classification accuracy by ≈ 10 and ≈ 20 percent.
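The saccade-angle-succession features underlying this approach can be sketched roughly as follows. The function name, the binning scheme, and the succession length are our own illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def angle_successions(fixations, n_bins=8, length=3):
    """Hypothetical sketch of the feature extraction: quantize each
    saccade direction into one of n_bins angular sectors, then emit
    every run of `length` consecutive quantized angles. Random ferns
    would then be trained on the distribution of these successions.
    Note that no areas of interest are needed - only fixation
    coordinates."""
    fx = np.asarray(fixations, float)
    d = np.diff(fx, axis=0)                          # saccade vectors
    ang = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    bins = np.round(ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    return [tuple(bins[i:i + length])
            for i in range(len(bins) - length + 1)]
```

A fern evaluates a small, fixed subset of such successions, so a learned succession need not appear in its entirety in every scanpath.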
Deep learning is a promising technique for real-world pupil detection. However, the small amount of available accurately-annotated data poses a challenge when training such networks. Here, we utilize non-challenging eye videos where algorithmic approaches perform virtually without errors to automatically generate a foundational data set containing subpixel pupil annotations. Then, we propose multiple domain-specific data augmentation methods to create unique training sets containing controlled distributions of pupil-detection challenges. The feasibility, convenience, and advantage of this approach is demonstrated by training a CNN with these datasets. The resulting network outperformed current methods in multiple publicly-available, realistic, and challenging datasets, despite being trained solely with the augmented eye images. This network also exhibited better generalization w.r.t. the latest state-of-the-art CNN: Whereas on datasets similar to training data, the nets displayed similar performance, on datasets unseen to both networks, ours outperformed the state-of-the-art by ≈27% in terms of detection rate.
Although smartphones are widely used in everyday life, studies of viewing behavior mainly employ desktop computers. This study examines whether closely spaced target locations on a smartphone can be decoded from gaze. Subjects wore a head-mounted eye tracker and fixated a target that successively appeared at 30 positions spaced by 10.0 × 9.0 mm. A "hand-held" (phone in the subject's hand) and a "mounted" (phone on a surface) condition were conducted. Linear mixed models were fitted to examine whether gaze differed between targets, and t-tests on root-mean-square errors were calculated to evaluate the deviation between gaze and targets. To decode target positions from gaze data, we trained a classifier and assessed its performance for every subject and condition. While gaze positions differed between targets (main effect "target"), gaze deviated from the true target positions. The classifier's performance for the 30 locations varied considerably between subjects ("mounted": 30 to 93% accuracy; "hand-held": 8 to 100% accuracy).
Understanding differences and similarities between scanpaths has been one of the primary goals for eye tracking research. Sequences of areas of interest mapped from fixations are a major focus for many analytic techniques since these sequences directly relate to the semantic meaning of the visual input. Many studies analyze complete sequences while overlooking the micro-transitions in subsequences. In this paper, we propose a method which extracts subsequences as features and finds contrasting patterns between different viewer groups. The contrast patterns help domain experts to quantify variations between visual activities and understand reasoning processes for complex visual tasks. Experiments were conducted with 39 expert and novice radiographers using nine radiology images corresponding to nine levels of task complexity. Identified contrast patterns, validated by an expert, prove that the method effectively reveals visual reasoning processes that are otherwise hidden.
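The subsequence-as-feature idea can be sketched as follows. The function names, the n-gram formulation, and the support-gap score are our own simplifications for illustration, not the authors' algorithm.

```python
from collections import Counter

def contrast_patterns(group_a, group_b, n=2, min_gap=0.3):
    """Sketch: count which AOI n-grams (micro-transitions) occur in
    each group's scanpath sequences, and keep those whose support
    (fraction of sequences containing the n-gram) differs between
    the two groups by at least min_gap. Positive values indicate
    patterns characteristic of group_a, negative ones of group_b."""
    def support(seqs):
        c = Counter()
        for s in seqs:
            grams = {tuple(s[i:i + n]) for i in range(len(s) - n + 1)}
            c.update(grams)
        return {g: cnt / len(seqs) for g, cnt in c.items()}

    sa, sb = support(group_a), support(group_b)
    pats = {}
    for g in set(sa) | set(sb):
        gap = sa.get(g, 0.0) - sb.get(g, 0.0)
        if abs(gap) >= min_gap:
            pats[g] = gap
    return pats
```

Here each sequence is a string (or list) of AOI labels mapped from fixations; an expert could then inspect the surviving patterns as candidate markers of visual reasoning.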
Observable reading behavior - the act of moving the eyes over lines of text - is highly stereotyped among the users of a language, and this has led to the development of reading detectors: methods that take as input windows of sequential fixations and output predictions of whether the fixation behavior during those windows constitutes reading or skimming. The present study introduces a new method for reading detection using a Region Ranking SVM (RRSVM). An SVM-based classifier learns the local oculomotor features that are important for real-time reading detection while optimizing for the global reading/skimming classification, making it unnecessary to hand-label local fixation windows for model training. This RRSVM reading detector was trained and evaluated using eye movement data collected in a laboratory context, where participants viewed modified web news articles and had to either read them carefully for comprehension or skim them quickly to select keywords (separate groups). Ground truth labels were known at the global level (the instructed reading or skimming task) and obtained at the local level in a separate rating task. The RRSVM reading detector accurately predicted 82.5% of the global (article-level) reading/skimming behavior, with accuracy in predicting local window labels ranging from 72% to 95%, depending on how the RRSVM was tuned for local and global weights. With this RRSVM reading detector, a method now exists for near real-time reading detection without the need for hand-labeling of local fixation windows. With real-time reading detection capability comes the potential for applications ranging from education and training to intelligent interfaces that learn what a user is likely to know based on previous detection of their reading behavior.
In corneal imaging methods, it is essential to use a 3D eyeball model to generate an undistorted image; the relationship between the eye and the eye camera is therefore typically fixed with a head-mounted device. Remote corneal imaging has several potential applications, such as surveillance systems and driver monitoring. We therefore integrated a 3D eyeball model with a 3D face model to facilitate remote corneal imaging. We conducted evaluation experiments and confirmed the feasibility of remote corneal imaging. We showed that the center of the eyeball can be estimated through face tracking, and thus corneal imaging can function as continuous remote eye tracking.
Availability of large-scale tagged datasets is a must in the field of deep learning applied to eye tracking. In this paper, the potential of the Supervised Descent Method (SDM) as a semi-automatic labelling tool for eye tracking images is shown. The objective of the paper is to show how the human effort needed to manually label large eye tracking datasets can be radically reduced by the use of cascaded regressors. Applications are provided for both high- and low-resolution systems: iris/pupil center labelling is shown as an example for low-resolution images, while pupil contour point detection is demonstrated for high-resolution images. In both cases, manual annotation requirements are drastically reduced.
In this paper we present the TobiiGlassesPySuite, an open-source suite we implemented for using the Tobii Pro Glasses 2 wearable eye tracker in custom eye-tracking studies. We provide a platform-independent solution for controlling the device and managing the recordings. The software consists of Python modules, integrated into a single package, accompanied by sample scripts and recordings. The proposed solution aims to provide additional methods with respect to the manufacturer's software, allowing users to further exploit the device's capabilities and the existing software. Our suite is available for download from the repository indicated in the paper and usable according to the terms of the GNU GPL v3.0 license.
Both musicians and programmers have expectations when they read music scores or source code. The goal of these studies is to gain insight into what happens when these expectations are violated in familiar tasks. In the music reading study, we explored the eye movements of musically experienced participants singing and playing on a piano familiar melodies that either did or did not contain a bar shifted down a tone, in two different keys. First-pass fixation durations, mean pupil size during first-pass fixations, and eye-time span parameters were analysed using linear mixed models. All three parameters can provide useful information on the processing of incongruence in music; furthermore, the pupil size parameter might be sensitive to the modality of performance. In the code reading study, we plan to explore incongruence in familiar code tasks and its reflection in the eye movements of programmers.
Fatigue detection, monitoring, and management are important and need to be accommodated in the busy lifestyles that many people lead these days. Fatigue can affect the physical as well as the emotional health of individuals, and detecting it is the first step towards managing it. With camera-based eye-tracking software now being included in laptops and smartphones, fatigue detection has the potential to become quite ubiquitous.
This extended abstract describes my PhD project on fatigue detection using eye-tracking measures during gaze typing. The steps taken and experiments conducted up to now are presented, along with an outline of future plans. The principal use case will be to provide fatigue detection for people with neurological disorders who use eye tracking for alternative communication.
Eye tracking is rapidly becoming popular in consumer technology, including virtual and augmented reality. Eye trackers commonly provide an estimate of gaze location and pupil diameter. Pupil diameter is useful for interactive systems, as it provides a means to estimate cognitive load, stress, and emotional state. However, several roadblocks limit the use of pupil diameter. For VR HMDs, there is a lack of models that account for stereoscopic viewing and the increased brightness of near-eye displays. Existing work has shown correlations between pupil diameter and emotion, but this work has not been extended to VR environments. The scope of this work is to bridge the gap between existing research on emotion and pupil diameter and VR, while also attempting to use pupillary data to tackle the problem of simulator sickness in VR.
Gaze may be a good alternative input modality for people with limited hand mobility. Such accessible, eye-tracking-based control can be implemented in telepresence robots, which are widely used to promote remote social interaction and provide a feeling of presence. This extended abstract introduces a PhD research project that takes a two-phase approach to investigating gaze-controlled telepresence robots. A system supporting gaze-controlled telepresence has been implemented; however, our current findings indicate serious remaining challenges with regard to gaze-based driving. Potential improvements are discussed, and plans for future study are presented.
This PhD thesis aims to contribute to our knowledge about how we experience paintings and, more specifically, about how visual exploration, cognitive categorization, and emotive evaluation contribute to the aesthetic dimension of our experience of paintings [Schaeffer 2015; Leder et al. 2004; Molnar 1981; Gombrich 1960; Baxandall 1986; Baxandall 1982; Baxandall 1984]. To this purpose, we use eye-tracking technology at the Musée Unterlinden to record the eye movements of 52 participants looking at the Isenheim altarpiece before and after restoration. The first results, from before restoration, allowed us to identify and classify the zones of visual salience as well as the effects of participants' backgrounds and emotions on fixation time and visual attention to different areas of interest. This analysis will be further compared with data collected in a similar study after restoration.
We investigate the mechanisms of attentional orienting in 360-degree virtual environments. Using Posner's paradigm, we study the effects of different attentional guidance techniques designed to improve information processing. The most efficient technique will be applied to a procedure-learning tool in virtual reality and to a remote air traffic control tower. The eye tracker allows us to explore the differential effects of overt and covert orienting, to estimate the effectiveness of visual search, and to use gaze as an interaction technique in virtual reality.
Current video-based eye trackers fail to achieve the high signal-to-noise ratio (SNR) that is crucial for applications such as interactive systems, event detection, the study of various eye movements, and, most importantly, estimating gaze position with high certainty. Specifically, current video-based eye trackers over-rely on precise localization of the pupil boundary and/or corneal reflection (CR) for gaze tracking, which often results in inaccuracies and large sample-to-sample root mean square error (RMS-S2S). It is therefore crucial to address the shortcomings of these trackers. We plan to study a new video-based eye tracking methodology that simultaneously tracks the motion of many iris features, and to investigate its implications for obtaining high accuracy and precision. In our preliminary work, the method has shown great potential for robust detection of microsaccades over 0.2 degrees with high confidence. Hence, we plan to explore and optimize this technique.
Benign Paroxysmal Positional Vertigo (BPPV) is the most common cause of vertigo. It can be diagnosed and treated with simple maneuvers done by vestibular experts. However, there is a high rate of misdiagnosis that results in high medical costs from unnecessary neuroimaging tests. Here we show how to improve saccade detection methods for automatic detection of quick-phases of nystagmus, a key sign of BPPV. We test our method using eye movement data recorded in patients during the diagnostic maneuver.
Driven by significant investments from the gaming, film, advertising, and customer service industries, among others, efforts across many different fields are converging to create realistic representations of humans that look like (computer graphics), sound like (natural language generation), move like (motion capture), and reason like (artificial intelligence) real humans. The ultimate goal of this work is to push the boundaries even further by exploring the development of realistic self-organized virtual humans that are capable of demonstrating coordinated behaviors across different modalities. Eye movements, for example, may be accompanied by changes in facial expression, head orientation, posture, gait properties, or speech. Traditionally, however, these modalities are captured and modeled separately, and we argue that this disconnect contributes to the well-known uncanny valley phenomenon [Mori et al. 2012], in which renderings or animations remain distinguishable from a real human, sometimes in unsettling ways. We focus initially on facial modalities - in particular, coordinated eye and head movements, and eventually facial expressions - but our proposed data-driven framework will be able to accommodate other modalities as well.
The research goal is to explore the relationship between eye tracking measures and a tactile version of the n-back task. The n-back task is often used to evoke cognitive load; however, this is the first study that incorporates tactile stimuli as input. The study follows a within-subject design with easy and difficult experimental conditions. In the tactile n-back task, each participant will be asked to identify the number of pins felt under the fingertips. In the easy condition, each participant will then be asked to respond whether a number shown on the computer screen is congruent with the number of recognized pins. In the difficult condition, each participant will be asked to refer to the pin number in both the current trial and the previous trial. Microsaccades and pupil dilation will be recorded during the top-down process of performing the n-back task.
The research proposes four hypotheses that focus on deriving helpful insights from eye patterns, including hidden truths concerning programmer expertise, task context, and difficulty. We present results from a study performed in a classroom setting with 17 students, in which we found that novice programmers visit output statements and declarations at the same rate as the rest of the program they are presented with, with the exception of control-flow block headers.
This research builds upon insightful findings from our previous work, wherein we focus on gathering statistical eye-gaze effects between categories of various populations to drive the pursuit of new research. Ongoing and future work entails using the iTrace infrastructure to capture gaze as participants scroll to read code pages extending longer than what can fit on one screen. The focus will be on building various models that relate eye gaze to comprehension via methods that realistically capture activity in a development environment.
Current eye-tracking techniques rely primarily on video-based tracking of components of the anterior surfaces of the eye. However, these trackers have several limitations. Their limited resolution precludes the study of small fixational eye motion, and many of them rely on calibration procedures that offer no way to validate their eye motion traces. By comparison, retinal-image-based trackers can track the motion of the retinal image directly, at frequencies greater than 1 kHz and with subarcminute accuracy. The retinal image provides a way to validate the eye position at any point in time, offering an unambiguous record of eye motion as a reference for the eye trace. The benefits of using scanning retinal imaging systems as eye trackers, however, come at the price of new problems that are not present in video-based systems and that need to be solved to obtain robust eye traces. This abstract provides an overview of retinal-image-based eye tracking methods, presents preliminary eye-tracking results from a tracking scanning-laser ophthalmoscope (TSLO), and proposes a new binocular line-scanning eye-tracking system.
Research during the last decades has demonstrated that eye tracking is an advantageous methodology for studying reading. A substantial amount of eye movement research has improved our understanding of the reading process in skilled adult readers; considerably fewer eye tracking studies have examined reading and its development in children. In this doctoral project, eye movements during reading and reading skill are investigated in a population-based sample of Swedish elementary school children. The aims are to provide evidence from a large-scale study and to explore the concurrent development of reading eye movements and reading skill. In the first study, we describe the eye movement variables across the grades and their connection to assessments of phonemic awareness, decoding strategies, and processing speed. During the remainder of the project, we will focus on longitudinal aspects, as the participants were recorded twice with a one-year interval. Further, we will examine possible predictors of later reading skill among the eye movement variables and reading assessment outcomes.
We present an experimenter platform for designing and evaluating user-adaptive support in information visualizations. Specifically, this platform leverages eye-tracking data in real time to deliver adaptive support in visualizations based on the user's attentional patterns and individual needs. We describe the main functionalities of this platform, and show an application to support processing of textual documents with embedded bar charts, by dynamically providing highlighting in the charts to guide a user's attention to the relevant information.
Human attention processes play a major role in the optimization of human-robot collaboration (HRC) [Huang et al. 2015]. We describe a novel methodology to measure and predict situation awareness from eye and head gaze features in real time. Awareness of scene objects of interest was described by 3D gaze analysis using data from eye tracking glasses and a precise optical tracking system. A probabilistic framework accounts for uncertainty arising from measurement errors in eye and position estimation. Comprehensive HRC experiments were conducted, with typical tasks including handover, in a lab-based prototypical manufacturing environment. The gaze features correlate highly with scores of standardized situation awareness questionnaires (SART [Taylor 1990], SAGAT [Endsley 2000]) and predict performance in the HRC task. This opens new opportunities for human-factors-based optimization in HRC applications.
The ability to monitor eye closures and blink patterns has long been known to enable accurate assessment of fatigue and drowsiness in individuals. Many measures of the eye are known to correlate with fatigue, including coarse-grained measures like the rate of blinks as well as fine-grained measures like the duration of blinks and the extent of eye closures. Despite a plethora of research validating these measures, we lack wearable devices that can continually and reliably monitor them in the natural environment. In this work, we present a low-power system, iLid, that can continually sense fine-grained measures such as blink duration and Percentage of Eye Closures (PERCLOS) at a high frame rate of 100 fps. We present a complete solution, including the design of the sensing, signal processing, and machine learning pipeline, and an implementation on a prototype computational eyeglass platform.
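As a rough sketch of the kind of fine-grained measures involved, PERCLOS and blink duration could be computed from a per-frame eyelid-openness signal as follows. The thresholds and signal conventions here are illustrative assumptions, not iLid's actual pipeline.

```python
import numpy as np

def eye_closure_measures(openness, fps=100, closed_thresh=0.2):
    """Illustrative sketch: PERCLOS = fraction of frames with the
    eyelid at least 80% closed (openness <= 0.2 of the fully-open
    baseline), and mean blink duration = average length of the
    contiguous closed runs, in seconds, at the given frame rate.
    `openness` is a per-frame signal normalized to [0, 1]."""
    x = np.asarray(openness, float)
    closed = x <= closed_thresh
    perclos = float(closed.mean())
    # locate contiguous runs of closed frames (candidate blinks)
    edges = np.diff(closed.astype(int))
    starts = np.flatnonzero(edges == 1) + 1
    ends = np.flatnonzero(edges == -1) + 1
    if closed[0]:
        starts = np.r_[0, starts]
    if closed[-1]:
        ends = np.r_[ends, len(closed)]
    durs = (ends - starts) / fps
    mean_blink = float(durs.mean()) if len(durs) else 0.0
    return perclos, mean_blink
```

In practice PERCLOS is reported over a sliding window (e.g., one minute), and very long closed runs would be treated as eye closures rather than blinks.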
The ability to unobtrusively and continuously monitor one's facial expressions has implications for a variety of application domains, ranging from affective computing to health care and the entertainment industry. The standard Facial Action Coding System (FACS), along with camera-based methods, has been shown to provide objective indicators of facial expressions; however, these approaches can be fairly limited for mobile applications due to privacy concerns and awkward positioning of the camera. To bridge this gap, W!NCE re-purposes a commercially available electrooculography-based eyeglass (J!NS MEME) for continuous and unobtrusive sensing of upper facial action units with high fidelity. W!NCE detects facial gestures using a two-stage processing pipeline involving motion artifact removal and facial action detection. We validate our system's applicability through extensive evaluation on data from 17 users in stationary and ambulatory settings.
To this day, a variety of information has been obtained from human eye movements, which hold immense potential for understanding and classifying cognitive processes and states - e.g., through scanpath classification. In this work, we explore the task of scanpath classification through a combination of unsupervised feature learning and convolutional neural networks. As an amusement factor, we use an Emoji space representation as feature space. This representation is achieved by training generative adversarial networks (GANs) for unpaired scanpath-to-Emoji translation with a cyclic loss. The resulting Emojis are then used to train a convolutional neural network for stimulus prediction, showing an accuracy improvement of more than five percentage points compared to the same network trained solely on the scanpath data. As a side effect, we also obtain novel unique Emojis representing each unique scanpath. Our goal is to demonstrate the applicability and potential of unsupervised feature learning to scanpath classification in a humorous and entertaining way.
Yarbus' claim that the observer's task can be decoded from eye movements has received mixed reactions. In this paper, we support the hypothesis that it is possible to decode the task. We conducted an exploratory analysis of the dataset by projecting features and data points into a scatter plot to visualize the nuanced properties of each task. Following this analysis, we eliminated highly correlated features before training an SVM and an AdaBoost classifier to predict the tasks from the filtered eye movement data. We achieve an accuracy of 95.4% on this task classification problem and hence support the hypothesis that task classification is possible from a user's eye movement data.
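The correlated-feature elimination step described in this abstract can be sketched as a simple greedy filter: keep a feature only if its absolute Pearson correlation with every already-kept feature stays below a threshold. The feature names and the 0.9 threshold below are illustrative assumptions, not taken from the paper.

```python
# Greedy correlation filter: drop any feature that is highly correlated
# with a feature already kept (threshold and data are hypothetical).

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def filter_correlated(features, threshold=0.9):
    """features: dict mapping feature name -> list of per-sample values."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, v)) < threshold for v in kept.values()):
            kept[name] = values
    return kept

features = {
    "fixation_duration": [200, 220, 180, 260],
    "fixation_duration_ms": [201, 221, 181, 259],  # near-duplicate feature
    "saccade_amplitude": [4.1, 2.0, 5.3, 3.2],
}
kept = filter_correlated(features)  # near-duplicate is dropped
```

A filter like this reduces redundancy before training the SVM and AdaBoost classifiers, which tends to stabilize feature weights.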
Human eye movements are far from being well described by current indicators. Using the dataset provided by the ETRA 2019 challenge, we analyzed saccades and fixations during free exploration of blank or natural scenes and during visual search. Based on the two modes of exploration, ambient and focal, we used the K coefficient [Krejtz et al. 2016]. We failed to find any differences between tasks, but this indicator gives only the dominant mode over the entire recording. The stability of both modes, assessed with the switch frequency and the mode duration, allowed us to differentiate gaze behavior according to situations. Time course analyses of the K coefficient and the switch frequency corroborate that the latter is a useful indicator, describing a greater portion of the eye movement recording.
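The K coefficient [Krejtz et al. 2016] mentioned above can be sketched as follows: fixation durations and the amplitudes of the following saccades are z-standardized over the recording, and their per-fixation difference K_i = z(d_i) - z(a_{i+1}) is averaged over a window of interest. A positive mean suggests the focal mode (long fixations, short saccades), a negative mean the ambient mode. The numeric values below are illustrative, not taken from the challenge dataset.

```python
# Minimal K coefficient sketch (per Krejtz et al. 2016); example data
# is hypothetical: a focal episode followed by an ambient episode.

def zscores(values):
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def k_series(durations, amplitudes):
    """durations[i]: duration of fixation i; amplitudes[i]: amplitude
    of the saccade following fixation i (same length)."""
    zd, za = zscores(durations), zscores(amplitudes)
    return [d - a for d, a in zip(zd, za)]

durations = [400, 380, 420, 100, 120, 90]   # ms
amplitudes = [1.0, 1.2, 0.8, 8.0, 7.5, 9.0]  # deg
ks = k_series(durations, amplitudes)
focal_k = sum(ks[:3]) / 3    # positive: focal mode dominates
ambient_k = sum(ks[3:]) / 3  # negative: ambient mode dominates
```

Averaging K_i over sliding windows, as in the time course analyses above, reveals when the recording switches between the two modes.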
Existing literature reveals little about the relationship between microsaccade rate and the average change in pupil size. There is a need to investigate this relationship and how the microsaccade rate may be relevant to cognitive load. In our study, we compared the microsaccade rate to the average change in pupil size during eight experimental conditions. Four of them were fixation conditions (subjects look at a fixation cross in each visual scene) and four were free-viewing conditions (subjects are free to move their eyes over the visual scene). We analyzed the change in pupil size and the microsaccade rate for the first part of each task as well as for the entire task in all conditions. We discovered a significant correlation between the microsaccade rate and the average change in pupil size during the first part of each task, and comparable characteristics throughout the entire task. We then measured the data for only one of the free-viewing conditions, involving a search task, to understand comparable characteristics related to cognitive load. We found a correlation between the microsaccade and pupil data. We hope that this finding will help further the understanding of the relative function of microsaccades and support its use in studying cognitive load response and pupil measurement.
Aiming is key for virtual reality (VR) interaction, and it is often done using VR controllers. Recent eye-tracking integrations in commercial VR head-mounted displays (HMDs) call for further research on usability and performance aspects to better determine possibilities and limitations. This paper presents a user study exploring gaze aiming in VR compared to a traditional controller in an "aim and shoot" task. Different target speeds and trajectories were studied. Qualitative data were gathered using the system usability scale (SUS) and cognitive load (NASA TLX) questionnaires. Results show a lower perceived cognitive load using gaze aiming and on-par usability scores. Gaze aiming produced on-par task durations but lower accuracy in most conditions. Lastly, the trajectory of the target significantly affected the orientation of the HMD relative to the target's location. The results show the potential of gaze aiming in VR and motivate further research.
This paper presents a Fitts' law experiment and a clinical case study performed with a head-mounted display (HMD). The experiment compared gaze, foot, and head pointing. With the equipment setup we used, gaze was slower than the other pointing methods, especially in the lower visual field. Throughputs for gaze and foot pointing were lower than for mouse and head pointing, and their effective target widths were also higher. A follow-up case study included seven participants with movement disorders. Only two of the participants were able to calibrate for gaze tracking, but all seven could use head pointing, although with a throughput less than one-third that of the non-clinical participants.
Mobile robotic telepresence systems are increasingly used to promote social interaction between geographically dispersed people. People with severe motor disabilities may use eye gaze to control a telepresence robot. However, the use of gaze control for robot navigation needs to be explored. This paper presents an experimental comparison between gaze-controlled and hand-controlled telepresence robots with a head-mounted display. Participants (n = 16) had similar experiences of presence and self-assessment, but gaze control was 31% slower than hand control. Gaze-controlled robots had more collisions and higher deviations from optimal paths. Moreover, with gaze control, participants reported a higher workload and a reduced feeling of dominance, and their situation awareness was significantly degraded. The accuracy of their post-trial reproduction of the maze layout and the trial duration were also significantly lower.
Recent methods to automatically calibrate stationary eye trackers were shown to effectively reduce inherent calibration distortion. However, these methods require additional information, such as mouse clicks or on-screen content. We propose the first method that requires only users' eye movements to reduce calibration distortion in the background while users naturally look at an interface. Our method exploits the fact that calibration distortion makes straight saccade trajectories appear curved between the saccadic start and end points. We show that this curving effect is systematic and the result of a distorted gaze projection plane. To mitigate calibration distortion, our method undistorts this plane by straightening saccade trajectories using image warping. We show that this approach improves over the common six-point calibration and is promising for reducing distortion. As such, it provides a non-intrusive solution to alleviating the accuracy decrease of eye trackers during long-term use.
Various types of saccadic paradigms, in particular prosaccade and antisaccade tests, are widely used in pathophysiology and psychology. Despite being widely used, there has not been a standard tool for processing and analyzing the eye tracking data obtained from saccade tests. We describe open-source software for extracting and analyzing the eye movement data of different types of saccade tests, which can be used to extract and compare participants' performance and various task-related measures across participants. We further demonstrate the utility of the software by using it to analyze the data from an antisaccade experiment and a recent distractor experiment.
The button is an element of a user interface to trigger an action, traditionally using click or touch. We introduce GazeButton, a novel concept extending the default button mode with advanced gaze-based interactions. During normal interaction, users can utilise this button as a universal hub for gaze-based UI shortcuts. The advantages are: 1) easy to integrate in existing UIs, 2) complementary, as users choose either gaze or manual interaction, 3) straightforward, as all features are located in one button, and 4) one button to interact with the whole screen. We explore GazeButtons for a custom-made text reading, writing, and editing tool on a multitouch tablet device. For example, this allows the text cursor position to be set as users look at the position and tap on the GazeButton, avoiding costly physical movement. Or, users can simply gaze over a part of the text that should be selected, while holding the GazeButton. We present a design space, specific application examples, and point to future button designs that become highly expressive by unifying the user's visual and manual input.
Text predictions play an important role in improving the performance of gaze-based text entry systems. However, visual search, scanning, and selection of text predictions require a shift in the user's attention away from the keyboard layout. Hence, the spatial positioning of predictions becomes an imperative aspect of the end-user experience. In this work, we investigate the role of spatial positioning by comparing the performance of three different keyboards with variable positions for text predictions. The experimental results show no significant differences in text entry performance; i.e., displaying suggestions closer to the visual fovea did not enhance participants' text entry rate, yet participants used more keystrokes and backspaces. This implies inessential usage of suggestions when they are in the constant visual attention of users, resulting in an increased cost of correction. Furthermore, we argue that fast saccadic eye movements undermine the spatial distance optimization in prediction positioning.
In gesture-based user interfaces, the effort needed to learn the gestures is a persistent problem that hinders their adoption in products. However, people's natural gaze paths form shapes during viewing. For example, reading creates a recognizable pattern. These gaze patterns can be utilized in human-technology interaction. We experimented with the idea of inducing specific gaze patterns with static drawings. The drawings included visual hints to guide the gaze. By looking at the parts of the drawing, the user's gaze composed a gaze gesture that activated a command. We organized a proof-of-concept trial to see how intuitive the idea is. Most participants understood the idea without specific instructions already on the first round of trials. We argue that with careful design, the form of objects and especially their decorative details can serve as a gaze-based user interface in smart homes and other environments of ubiquitous computing.
For head-mounted displays, as used in mixed reality applications, eye gaze seems to be a natural interaction modality. EyeMRTK provides building blocks for eye gaze interaction in virtual and augmented reality. Based on a hardware abstraction layer, it allows interaction researchers and developers to focus on their interaction concepts while enabling them to evaluate their ideas on all supported systems. In addition, the toolkit provides a simulation layer for debugging purposes, which speeds up prototyping during development on the desktop.
The aim of this research is to introduce new techniques that enable a visual comparison of scan-paths. Every eye tracking experiment produces many scan-paths, and one of the main challenges of eye tracking analysis is how to compare these scan-paths. A classic solution is to extract easily measurable features such as fixation durations or saccade lengths. There are also many more sophisticated techniques that compare two scan-paths using only spatial, or both spatial and temporal, information. These techniques typically return a value (or several values) that may be used as a scan-path similarity/distance measure. However, there is still a lack of widely adopted methods that offer not only the measure but also enable a visual comparison of scan-paths. The paper introduces two possible options: the Mutual Distance Plot for two scan-paths and the Warped Time Distance chart for the comparison of a theoretically unlimited number of scan-paths. It is shown that these visualizations may reveal information about relationships between two or more scan-paths in straightforward charts. The informativeness of the solution is analyzed using both artificial and real data.
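The kind of pairwise-distance matrix that underlies a plot comparing two scan-paths can be sketched as follows: entry (i, j) holds the Euclidean distance between fixation i of one scan-path and fixation j of the other, which can then be rendered as an image. This is only a plausible sketch; the paper's exact Mutual Distance Plot construction may differ, and the fixation coordinates are illustrative.

```python
# Pairwise fixation-distance matrix between two scan-paths
# (hypothetical construction; coordinates are made up).

def mutual_distances(path_a, path_b):
    """path_a, path_b: lists of (x, y) fixation coordinates.
    Returns a len(path_a) x len(path_b) matrix of Euclidean distances."""
    return [
        [((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 for (bx, by) in path_b]
        for (ax, ay) in path_a
    ]

a = [(0, 0), (3, 4), (6, 8)]
b = [(0, 0), (6, 8)]
m = mutual_distances(a, b)  # m[0][0] = 0.0, m[1][0] = 5.0
```

Low values along a roughly diagonal band of such a matrix indicate that the two scan-paths visit similar locations in a similar order.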
Eye tracking has become a valuable approach in user-centered studies because it adds a source of information that can be recorded for later analysis, instead of relying only on task completion and time to completion. Without proper analysis of eye tracking data, important information can remain untold. In this paper, we present an intuitive visualization for rapid analysis of eye movement patterns. Compared to heat maps or scan path visualizations, AOI-DNAs are easy to implement and take less effort for detecting eye movement patterns. They are therefore viable for visual data mining and offer capabilities for user interaction toward extended visual analytics. In conclusion, we implemented the first prototype using AOI-DNAs and a first version of a fuzzy search to highlight eye movement patterns. We applied the proposed visualizations in a preliminary analysis of a well-known dataset to illustrate their practical usefulness as a case study.
Advances in eye tracking technology have enabled new interaction techniques and gaze-based applications. However, the techniques for visualizing gaze information have remained relatively unchanged. We developed Iris, a tool to support the design of contextually relevant gaze visualizations. Iris allows users to explore displaying different features of gaze behavior including the current fixation point, duration, and saccades. Stylistic elements such as color, opacity, and smoothness can also be adjusted to give users creative and detailed control over the design of their gaze visualization. We present the Iris system and perform a user study to examine how participants can make use of the tool to devise contextually relevant gaze visualizations for a variety of collaborative tasks. We show that changes in color and opacity as well as variation in gaze trails can be adjusted to create meaningful gaze visualizations that fit the context of use.
Automatic Area Of Interest (AOI) demarcation of facial regions is not yet commonplace in applied eye-tracking research, partially because automatic AOI labeling is prone to error. Most previous eye-tracking studies relied on manual frame-by-frame labeling of facial AOIs. We present a fully automatic approach for facial AOI labeling (i.e., eyes, nose, mouth) and gaze registration within those AOIs, based on modern computer vision techniques combined with heuristics drawn from art. We discuss details of computing gaze analytics and provide a proof-of-concept and a short validation against what we consider ground truth. Relative dwell time over expected AOIs exceeded 98%, showing the efficacy of the approach.
Analyzing and visualizing eye movement data can provide useful insights into the connectivity and links between points and areas of interest (POIs and AOIs). These typically time-varying relations can hint at the visual scanning strategies applied by individual or many eye-tracked people. However, the challenge with this kind of data is its spatio-temporal nature, which requires a good visual encoding in order to, first, achieve a scalable overview-based diagram and, second, derive static or dynamic patterns that might correspond to certain comparable visual scanning strategies. To reliably identify the dynamic strategies, we describe a visualization technique that generates a more linear representation of the spatio-temporal scan paths. This is achieved by applying different visual encodings to the spatial dimensions, which typically limit an eye movement data visualization by causing visual clutter, overdraw, and occlusion, while the temporal dimension is depicted as a linear time axis. The presented interactive visualization concept is composed of three linked views depicting spatial, metrics-related, and distance-based aspects over time.
Eye movements recorded for many study participants are difficult to interpret, in particular when the task is to identify similar scanning strategies over space, time, and participants. In this paper, we describe an approach in which we first compare scanpaths, not only based on Jaccard (JD) and bounding box (BB) similarities, but also with more complex approaches such as longest common subsequence (LCS), Fréchet distance (FD), dynamic time warping (DTW), and edit distance (ED). The results of these algorithms generate a weighted comparison matrix in which each entry encodes the pairwise strength of the participant scanpath comparison. To better identify participant groups with similar eye movement behavior, we reorder this matrix by hierarchical clustering, optimal-leaf ordering, dimensionality reduction, or a spectral approach. The matrix visualization is linked to the original stimulus, overplotted with visual attention maps and gaze plots, on which typical interactions like temporal, spatial, or participant-based filtering can be applied.
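One of the comparison measures named in this abstract, dynamic time warping (DTW), can be sketched for scanpaths treated as sequences of (x, y) fixations using the classic O(n·m) dynamic program. The coordinates below are illustrative; the paper's implementation details (e.g., point distance used) are not specified here.

```python
# DTW distance between two scanpaths, using Euclidean point distance.
from math import hypot, inf

def dtw(path_a, path_b):
    """path_a, path_b: lists of (x, y) fixations. Returns the minimal
    cumulative point distance over all monotonic alignments."""
    n, m = len(path_a), len(path_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ax, ay = path_a[i - 1]
            bx, by = path_b[j - 1]
            d = hypot(ax - bx, ay - by)
            cost[i][j] = d + min(cost[i - 1][j],      # stretch path_b
                                 cost[i][j - 1],      # stretch path_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

same = dtw([(0, 0), (1, 1)], [(0, 0), (1, 1)])  # 0.0 for identical paths
```

Computing this distance for every participant pair yields exactly the kind of symmetric comparison matrix that the abstract then reorders by clustering.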
In this paper, we describe the design of an interactive visualization tool for the comparison of eye movement data, with a special focus on outliers. To make the tool usable and accessible to anyone with a data science background, we provide a web-based solution using the Dash library, which is based on the Python programming language, and the Python library Plotly. Interactive visualization is very well supported by Dash, which makes the visualization tool easy to use. We support multiple ways of comparing user scanpaths, such as bounding boxes and Jaccard indices, to identify similarities. Moreover, we support matrix reordering to clearly separate the outliers among the scanpaths. We further support the data analyst with complementary views such as gaze plots and visual attention maps.
Gaze-Contingent Displays (GCDs) can improve visual search performance on large displays. GCDs, a Level Of Detail (LOD) management technique, discard redundant peripheral detail using various models of human visual perception. Models of depth and contrast perception (e.g., depth-of-field and foveation) have often been studied to address the trade-off between the computational and perceptual benefits of GCDs. However, color perception models and combinations of multiple models have not received as much attention. In this paper, we present GeoGCD, which uses individual contrast, color, and depth-perception models, and their combination, to render scenes without perceptible latency. As proof of concept, we present a three-stage user evaluation built upon geographic image interpretation tasks. GeoGCD does not impair users' visual search performance or affect their display preferences. On the contrary, in some cases it can significantly improve users' performance.
We present an investigation of sharing the focus of visual attention between two players in a collaborative game, so that where one player is looking is visible to the other. The difference between using head-gaze and eye-gaze to estimate the point of regard was studied, the motive being that recording head-gaze is easier and cheaper than eye-gaze. Two experiments are reported: the first investigates the effect of a high-immersion presentation of the game in a VR head-mounted display compared with a lower-immersion desktop presentation; the second examines the high-immersion condition in more detail. The studies show that, in spite of the many factors that could affect the outcome of a relatively short period of game play, sharing eye-gaze in the high-immersion condition produces shorter overall durations and better subjective ratings of teamwork than does sharing head-gaze. This difference is not apparent in the low-immersion condition. The findings are a good argument for exploiting the opportunities for including and using eye tracking within head-mounted displays in the context of collaborative games.
Virtual reality (VR) nowadays has numerous applications in training, education, and rehabilitation. To present immersive 3D stimuli efficiently, we need to understand how spatial attention is oriented in VR. The efficiency of different cues can be compared using the Posner paradigm. In this study, we designed an ecological environment in which participants were presented with a modified version of the Posner cueing paradigm. Twenty subjects equipped with an eye-tracking system and a VR HMD performed a sandwich preparation task. They were asked to assemble ingredients that could be either endogenously or exogenously cued in both auditory and visual modalities. The results showed that all valid cues made participants react faster. While the directional arrow (visual endogenous) and 3D sound (auditory exogenous) oriented attention globally to the entire cued hemifield, the vocal instruction (auditory endogenous) and object highlighting (visual exogenous) allowed more local orientation, to a specific region of space. No differences in gaze shift initiation or time to fixate the target were found, suggesting covert orienting.
Maps enable complex decision making, such as planning a day trip in a foreign city. This kind of task often requires combining information from different parts of the map, leading to a sequence of visual searches and map extent changes. In the process, the user can easily get lost and be unable to find the way back to relevant points of interest (POIs). In this paper, we present POITrack, a novel gaze-adaptive map that supports users in finding previously inspected POIs faster by providing highlights. Our approach allows filtering inspected POIs based on their category and automatically adapting the current map extent. Not only could participants find visited locations faster with our system, but they also rated the interaction as more pleasing. Our findings can contribute to improving interaction with high-density visual information that requires revisiting previously seen objects whose relevance for the task may not have been clear initially.
In building human-robot interaction systems, it would be helpful to understand how humans collaborate and, in particular, how humans use others' gaze behavior to estimate their intent. Here we studied the use of gaze in a collaborative assembly task, in which a human user assembled an object with the assistance of a human helper. We found that being aware of the partner's gaze significantly improved collaboration efficiency. Task completion times were much shorter when gaze communication was available than when it was blocked. In addition, we found that the user's gaze was more likely to lie on the object of interest in the gaze-aware case than in the gaze-blocked case. In the context of human-robot collaboration systems, our results suggest that gaze data in the period surrounding verbal requests are more informative and can be used to predict the target object.
Eye tracking studies have been conducted to understand visual attention in different scenarios, for example, how people read text, which graphical elements in a visualization are frequently attended, how people drive a car, or how they behave during a shopping task. All of these scenarios - whether static or dynamic - show a visual stimulus whose content the spectators cannot change. This is different when interaction is allowed, as in (graphical) user interfaces (UIs), integrated development environments (IDEs), dynamic web pages (with different user-defined states), or interactive displays in general in human-computer interaction, which give a viewer the opportunity to actively change the stimulus content. For the analysis and visualization of time-varying visual attention paid to a web page, it makes a big difference for the analytics and visualization approaches - algorithmically as well as visually - whether the presented web page stimulus is static, dynamic (i.e., time-varying), or dynamic in the sense that user interaction is allowed. In this paper, we discuss the challenges for visual analysis concepts in analyzing the recorded data, in particular with the goal of improving interactive stimuli, i.e., the layout of a web page, but also the interaction concept. We describe a data model that leads to interaction graphs, a possible way to analyze and visualize this kind of eye movement data.
Webpage images---image elements on a webpage---are prominent in drawing user attention. Modeling attention on webpage images helps in their synthesis and rendering. This paper presents a visual feature-based attention prediction model for webpage images. First, fixated images were assigned quantitative visual attention based on users' sequential attention allocation on webpages. Subsequently, the fixated images' intrinsic visual features were extracted, along with their position and size on the respective webpages. A multiclass support vector machine (multiclass SVM) was trained using the visual features and associated attention. In tandem, a majority-voting scheme was employed to predict the quantitative visual attention for test webpage images. The proposed approach was analyzed through an eye-tracking experiment conducted on 36 real-world webpages with 42 participants. Our model (average accuracy of 91.64% and micro F1-score of 79.1%) outperforms the existing position- and size-constrained regression model (average accuracy of 73.92% and micro F1-score of 34.80%).
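The majority-voting step described above can be sketched independently of the SVM itself: each pairwise (one-vs-one) classifier casts a vote for an attention class, and the class with the most votes becomes the prediction for the test image. The class labels below are illustrative assumptions, not the paper's, and the classifiers are stubbed out with fixed votes.

```python
# Majority voting over per-classifier predictions (hypothetical labels;
# the multiclass SVM producing the votes is out of scope here).
from collections import Counter

def majority_vote(votes):
    """votes: class labels predicted by the individual classifiers.
    Ties resolve in favor of the label encountered first."""
    return Counter(votes).most_common(1)[0][0]

# Three of four hypothetical pairwise classifiers agree on "high":
predicted = majority_vote(["high", "low", "high", "medium"])  # "high"
```

With a one-vs-one multiclass SVM over k classes there are k·(k-1)/2 such voters per test sample, so the scheme scales naturally with the number of attention levels.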
We study attention to brand, price, and visual information about products on online retailing websites, simultaneously considering the effects of consumers' goals, purchase category, and consumers' statements. We use an intra-subject experimental design, simulated web stores, and a combination of observational eye-tracking data and declarative measures.
Image information about the product is the most important stimulus, regardless of the task at hand or the store involved. The roles of brand and price information depend on the product category and the purchase task involved. Declarative measures of relative brand importance are found to be positively related to its observed importance.
In this paper, we describe an interactive, web-based visual analytics tool that combines linked visualization techniques and algorithmic approaches for exploring the hierarchical visual scanning behavior of a group of people solving tasks in a static stimulus. This has the benefit that the recorded eye movement data can be observed in a more structured way, with the goal of finding patterns in the common scanning behavior of a group of eye-tracked people. To reach this goal, we first preprocess and aggregate the scanpaths based on previously defined areas of interest (AOIs), which generates a weighted directed graph. We visually represent the resulting AOI graph as a modified hierarchical graph layout. This can be used to filter and navigate the eye movement data shown in a separate view, overplotted on the stimulus, preserving the mental map and providing an intuitive view of the semantics of the original stimulus. Several interaction techniques and complementary views with visualizations are implemented. Moreover, due to the web-based nature of the tool, users can upload, share, and explore data with others. To illustrate the usefulness of our concept, we apply it to real-world eye movement data from a previously conducted eye tracking experiment.