ETRA '21 Full Papers: ACM Symposium on Eye Tracking Research and Applications


SESSION: Methods

Important Considerations of Data Collection and Curation for Reliable Benchmarking of End-User Eye-Tracking Systems

In this article, we discuss how to build a reliable system for estimating the quality of a VR eye-tracker in terms of accuracy and robustness. We list and discuss problems that occur at the data collection, data curation, and data processing stages. We address this article to academic eye-tracking researchers and commercial eye-tracker developers, with the aim of raising the issue of standardizing eye-tracking benchmarks and taking a step towards repeatable benchmarking results. The main scope of this article is consumer-focused eye-tracking VR headsets; however, some parts also apply to AR and remote eye-trackers, and to research environments. As an example, we demonstrate how to use the proposed methodology to build a benchmark and estimate the accuracy of the FOVE0 eye-tracking headset.
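To make the accuracy and robustness notions above concrete, the following minimal sketch computes two standard angular quality metrics from unit gaze-direction vectors against a known target direction: accuracy as the mean angular offset and precision as the RMS of sample-to-sample angular differences. This is an illustration of the general metrics only, not the paper's actual pipeline, and all variable names and array shapes are assumptions.

```python
import numpy as np

def angular_error_deg(gaze_dirs, target_dir):
    """Angle (degrees) between each unit gaze vector and a unit target vector."""
    cos = np.clip(gaze_dirs @ target_dir, -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def accuracy_and_precision(gaze_dirs, target_dir):
    """Accuracy: mean angular offset from the target.
    Precision: RMS of successive sample-to-sample angular differences."""
    errors = angular_error_deg(gaze_dirs, target_dir)
    accuracy = errors.mean()
    cos = np.clip(np.sum(gaze_dirs[1:] * gaze_dirs[:-1], axis=1), -1.0, 1.0)
    inter_sample = np.degrees(np.arccos(cos))
    precision_rms = np.sqrt(np.mean(inter_sample ** 2))
    return accuracy, precision_rms

# Toy usage: noisy gaze samples around a target straight ahead.
rng = np.random.default_rng(0)
target = np.array([0.0, 0.0, 1.0])
samples = target + rng.normal(scale=0.01, size=(500, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
print(accuracy_and_precision(samples, target))
```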

Analysis of iris obfuscation: Generalising eye information processes for privacy studies in eye tracking

We present a framework to model and evaluate obfuscation methods for removing sensitive information from eye-tracking data. The focus is on preventing iris-pattern identification. Candidate methods have to be effective at removing information while retaining high utility for gaze estimation. We propose several obfuscation methods that drastically outperform existing ones. A stochastic grid search is used to determine optimal method parameters and to evaluate the model framework. Precise obfuscation and gaze effects are measured for selected parameters. Two attack scenarios are considered and evaluated. We show that large datasets are susceptible to probabilistic attacks, even with seemingly effective obfuscation methods. However, additional data is needed to assess the probabilistic security more accurately.
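The parameter sweep described above amounts to searching an obfuscation parameter space for settings that minimise identifiability subject to a gaze-utility constraint. The sketch below illustrates that idea with a plain (deterministic) grid search; the two evaluation callbacks are toy placeholders standing in for an iris-identification attack and a gaze-estimation pipeline, and none of the names or formulas come from the paper.

```python
import itertools
import numpy as np

# Placeholder evaluation callbacks -- in a real framework these would run an
# iris-identification attack and a gaze-estimation pipeline on obfuscated images.
def iris_identification_rate(blur_sigma, noise_std):
    return float(np.exp(-0.8 * blur_sigma - 2.0 * noise_std))   # toy model

def gaze_error_deg(blur_sigma, noise_std):
    return 0.5 + 0.15 * blur_sigma + 1.2 * noise_std            # toy model

def grid_search(sigmas, noise_levels, max_gaze_error=1.5):
    """Pick the parameters that minimise iris identifiability while keeping
    gaze-estimation error below a utility threshold."""
    best = None
    for sigma, noise in itertools.product(sigmas, noise_levels):
        if gaze_error_deg(sigma, noise) > max_gaze_error:
            continue
        ident = iris_identification_rate(sigma, noise)
        if best is None or ident < best[0]:
            best = (ident, sigma, noise)
    return best  # (identification rate, blur sigma, noise std)

print(grid_search(np.linspace(0, 5, 11), np.linspace(0, 0.5, 6)))
```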

SESSION: Visualization and Annotation

The Power of Linked Eye Movement Data Visualizations

In this paper, we showcase several eye movement data visualizations and how they can be interactively linked to design a flexible visualization tool for eye movement data. The aim of this project is to create a user-friendly and easily accessible tool for interpreting visual attention patterns and facilitating the analysis of eye movement data. Hence, to increase accessibility and usability, we provide a web-based solution. Users can upload their own eye movement data set and inspect it from several perspectives simultaneously. Insights can be shared and collaboratively discussed with others. The currently available visualization techniques are a 2D density plot, a scanpath representation, a bee swarm, and a scarf plot, all supporting several standard interaction techniques. Moreover, due to the linking feature, users can select data in one visualization, and the same data points will be highlighted in all active visualizations for solving comparison tasks. The tool also makes it possible to upload both private and public data sets, and can generate URLs to share the data and settings of customized visualizations. A user study showed that the tool is understandable and that providing linked customizable views is beneficial for analyzing eye movement data.
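The linking feature described above is essentially a shared selection model that broadcasts to every registered view. The tool itself is web-based; the sketch below uses Python only to illustrate the pattern, and all class and method names are invented for illustration rather than taken from the tool.

```python
class SelectionModel:
    """Shared selection of fixation indices, broadcast to every registered view."""
    def __init__(self):
        self._selected = set()
        self._views = []

    def register(self, view):
        self._views.append(view)

    def select(self, indices):
        self._selected = set(indices)
        for view in self._views:
            view.highlight(self._selected)

class ScanpathView:
    def highlight(self, indices):
        print(f"scanpath: highlighting fixations {sorted(indices)}")

class ScarfPlotView:
    def highlight(self, indices):
        print(f"scarf plot: highlighting fixations {sorted(indices)}")

model = SelectionModel()
model.register(ScanpathView())
model.register(ScarfPlotView())
model.select([3, 4, 7])   # a selection in one view updates all linked views
```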

Image-Based Projection Labeling for Mobile Eye Tracking

The annotation of gaze data with respect to investigated areas of interest (AOIs) is a time-consuming step in the analysis of eye tracking experiments. For data from mobile eye tracking glasses, the annotation effort is further increased because each recording has to be inspected individually. Automated approaches based on supervised machine learning require pre-trained categories, which are hard to obtain without human interpretation, i.e., labeling ground-truth data. We present an interactive visualization approach that supports efficient annotation of gaze data based on the image content that participants wearing eye tracking glasses focused on. Recordings can be segmented individually to reduce the annotation effort. Thumbnails represent segments visually and are projected onto a 2D plane for a fast comparison of AOIs. Annotated scanpaths can then be interpreted directly with the timeline visualization. We showcase our approach with three different scenarios.
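One way to place segment thumbnails on a 2D plane for visual comparison is to embed a simple image descriptor with a dimensionality reduction such as PCA. The sketch below uses a coarse colour histogram and scikit-learn purely as an illustration of this general idea; it is an assumption, not the paper's actual projection method.

```python
import numpy as np
from sklearn.decomposition import PCA

def colour_histogram(thumbnail, bins=8):
    """Coarse per-channel colour histogram as a cheap image descriptor.
    `thumbnail` is an HxWx3 uint8 array."""
    hists = [np.histogram(thumbnail[..., c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def project_thumbnails(thumbnails):
    """Map each thumbnail to a 2D position for layout on the comparison plane."""
    features = np.stack([colour_histogram(t) for t in thumbnails])
    return PCA(n_components=2).fit_transform(features)

# Toy usage with random "thumbnails".
rng = np.random.default_rng(1)
thumbs = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(10)]
print(project_thumbnails(thumbs).shape)   # (10, 2)
```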

Neural Networks for Semantic Gaze Analysis in XR Settings

Virtual-reality (VR) and augmented-reality (AR) technology is increasingly combined with eye-tracking. This combination broadens both fields and opens up new areas of application in which visual perception and related cognitive processes can be studied in interactive but still well-controlled settings. However, performing a semantic gaze analysis of eye-tracking data from interactive three-dimensional scenes is a resource-intensive task, which so far has been an obstacle to economical use. In this paper, we present a novel approach that minimizes the time and information necessary to annotate volumes of interest (VOIs) by using techniques from object recognition. To do so, we train convolutional neural networks (CNNs) on synthetic data sets derived from virtual models using image augmentation techniques. We evaluate our method in real and virtual environments, showing that it can compete with state-of-the-art approaches while not relying on additional markers or preexisting databases and instead offering cross-platform use.
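As a rough illustration of training on augmented synthetic renderings, the sketch below builds a typical augmentation pipeline with torchvision; the specific transforms, parameters, and the random placeholder image are assumptions, not the augmentations used in the paper.

```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Augmentations applied to synthetic renderings of the virtual models so that
# a CNN trained on them generalises better to real camera frames.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
])

# Toy usage: a random image stands in for a synthetic rendering of a VOI.
synthetic = Image.fromarray(np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8))
batch = torch.stack([augment(synthetic) for _ in range(8)])
print(batch.shape)   # torch.Size([8, 3, 224, 224])
```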

SESSION: Applications

Where Do Deep Fakes Look? Synthetic Face Detection via Gaze Tracking

Following the recent initiatives for the democratization of AI, deep fake generators have become increasingly popular and accessible, raising dystopian scenarios of a social erosion of trust. One particular domain, biological signals, has attracted attention for detection methods capable of exploiting authenticity signatures in real videos that generative approaches do not yet fake. In this paper, we first propose several prominent eye and gaze features that deep fakes exhibit differently. Second, we compile those features into signatures and analyze and compare those of real and fake videos, formulating geometric, visual, metric, temporal, and spectral variations. Third, we generalize this formulation to the deep fake detection problem with a deep neural network, to classify any video in the wild as fake or real. We evaluate our approach on several deep fake datasets, achieving 92.48% accuracy on FaceForensics++, 80.0% on Deep Fakes (in the wild), 88.35% on CelebDF, and 99.27% on DeeperForensics. Our approach outperforms most deep and biological fake detectors that use complex network architectures but lack the proposed gaze signatures. We conduct ablation studies involving different features, architectures, sequence durations, and post-processing artifacts.
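As an illustration of the kind of spectral gaze signature mentioned above, the sketch below computes relative power in a few frequency bands of a horizontal gaze signal with NumPy's FFT; the band boundaries, sampling rate, and the idea of comparing these values between real and fake videos are illustrative assumptions, not the paper's exact features.

```python
import numpy as np

def band_power_features(gaze_x, sample_rate_hz, bands=((0, 2), (2, 6), (6, 15))):
    """Relative spectral power of a gaze coordinate signal in a few bands."""
    gaze_x = np.asarray(gaze_x, dtype=float)
    gaze_x = gaze_x - gaze_x.mean()
    spectrum = np.abs(np.fft.rfft(gaze_x)) ** 2
    freqs = np.fft.rfftfreq(len(gaze_x), d=1.0 / sample_rate_hz)
    total = spectrum.sum() + 1e-12
    return [spectrum[(freqs >= lo) & (freqs < hi)].sum() / total for lo, hi in bands]

# Toy usage: 10 s of horizontal gaze at 30 fps, slow drift plus jitter.
rng = np.random.default_rng(2)
t = np.arange(300) / 30.0
signal = 0.2 * np.sin(2 * np.pi * 0.5 * t) + 0.02 * rng.standard_normal(300)
print(band_power_features(signal, sample_rate_hz=30))
```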

Toward Eye-Tracked Sideline Concussion Assessment in eXtended Reality

As no portable visuomotor assessment of concussion is currently available at the sidelines, we present the preliminary development of a sideline-suitable approach based on Predictive Visual Tracking (PVT). Previous work has shown PVT sensitivity and specificity of 0.85 and 0.73, respectively, for the standard deviation of radial error between normal and acute concussion (mild Traumatic Brain Injury, or mTBI), using a simple orbiting target stimulus. We propose new variants of the radial and tangential error metrics and conduct a preliminary evaluation in Virtual Reality with two different target motions (orbit and pendulum). Our new local visualization is intuitive, especially for evaluating the pendulum target. Initial results indicate promise for baseline-related, personalized concussion testing in extended reality.
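The radial and tangential error terms above can be understood as the components of the gaze-target offset along, and perpendicular to, the line from the orbit centre to the target. The sketch below shows one plausible way to compute this decomposition for an orbiting target; the geometry, function names, and the lagging-gaze toy data are assumptions for illustration, not the paper's metric variants.

```python
import numpy as np

def radial_tangential_error(gaze_xy, target_xy, center_xy):
    """Decompose gaze-target error into components along (radial) and
    perpendicular to (tangential) the centre-to-target direction."""
    gaze_xy = np.asarray(gaze_xy, float)
    target_xy = np.asarray(target_xy, float)
    radial_dir = target_xy - np.asarray(center_xy, float)
    radial_dir /= np.linalg.norm(radial_dir, axis=1, keepdims=True)
    tangential_dir = np.stack([-radial_dir[:, 1], radial_dir[:, 0]], axis=1)
    err = gaze_xy - target_xy
    radial = np.sum(err * radial_dir, axis=1)
    tangential = np.sum(err * tangential_dir, axis=1)
    return radial, tangential

# Toy usage: target orbiting the origin, gaze lagging slightly behind it.
t = np.linspace(0, 2 * np.pi, 200)
target = np.stack([np.cos(t), np.sin(t)], axis=1)
gaze = np.stack([np.cos(t - 0.05), np.sin(t - 0.05)], axis=1)
radial, tangential = radial_tangential_error(gaze, target, center_xy=(0.0, 0.0))
print(radial.std(), tangential.std())   # e.g. the SD-of-radial-error summary
```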

Crossed Eyes: Domain Adaptation for Gaze-Based Mind Wandering Models

The effectiveness of user interfaces is limited by the tendency for the human mind to wander. Intelligent user interfaces can combat this by detecting when mind wandering occurs and attempting to regain user attention through a variety of intervention strategies. However, collecting data to build mind wandering detection models can be expensive, especially considering the variety of media available and potential differences in mind wandering across them. We explored the possibility of using eye gaze to build cross-domain models of mind wandering, where models trained on data from users in one domain are used for different users in another domain. We built supervised classification models using a dataset of 132 users whose mind wandering reports were collected in response to thought-probes while they completed tasks from seven different domains for six minutes each (five domains are investigated here: Illustrated Text, Narrative Film, Video Lecture, Naturalistic Scene, and Reading Text). We used global eye gaze features to build within- and cross-domain models using 5-fold user-independent cross-validation. The best performing within-domain models yielded AUROCs ranging from .57 to .72, which were comparable to those of the cross-domain models (AUROCs of .56 to .68). Models built from coarse-grained locality features capturing the spatial distribution of gaze resulted in slightly better transfer on average (transfer ratios of .61 vs. .54 for global models) due to improved performance in certain domains. Instance-based and feature-level domain adaptation did not result in any improvements in transfer. We found that seven gaze features likely contributed to transfer, as they were among the top ten features for at least four domains. Our results indicate that gaze features are suitable for domain adaptation between similar domains, but more research is needed to improve domain adaptation between more dissimilar domains.
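The key methodological point above is that cross-validation folds are split by user, so no user appears in both training and test data. The sketch below shows that setup with scikit-learn's GroupKFold and a simple logistic-regression classifier on synthetic features; the data, features, and classifier are placeholders standing in for the paper's actual gaze features and models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

# Synthetic stand-in data: 30 users, 20 windows each, 8 gaze features per window.
rng = np.random.default_rng(3)
n_users, samples_per_user, n_features = 30, 20, 8
X = rng.standard_normal((n_users * samples_per_user, n_features))
y = (X[:, 0] + 0.5 * rng.standard_normal(len(X)) > 0).astype(int)  # toy labels
groups = np.repeat(np.arange(n_users), samples_per_user)           # user IDs

aucs = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], scores))
print(f"mean AUROC over user-independent folds: {np.mean(aucs):.2f}")
```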

GazeMeter: Exploring the Usage of Gaze Behaviour to Enhance Password Assessments

We investigate the use of gaze behaviour as a means to assess password strength as perceived by users. We contribute to the effort of making users choose passwords that are robust against guessing attacks. In particular, our idea is to also consider users' understanding of password strength in security mechanisms. We demonstrate how eye tracking can enable this: by analysing people's gaze behaviour during password creation, the strength of the password can be determined. To demonstrate the feasibility of this approach, we present a proof-of-concept study (N = 15) in which we asked participants to create weak and strong passwords. Our findings reveal that it is possible to estimate password strength from gaze behaviour with an accuracy of 86% using machine learning. Thus, we enable research on novel interfaces that consider users' understanding, with the ultimate goal of making users choose stronger passwords.
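Classifying password strength from gaze first requires turning the raw gaze stream recorded during password entry into features. The sketch below derives a deliberately tiny feature set (saccade-like jump count, mean jump amplitude, mean dwell time, total entry time); the threshold, feature choice, and toy keyboard data are assumptions and not the paper's feature set, and such a vector could feed a classifier evaluated as in the previous sketch.

```python
import numpy as np

def gaze_features(gaze_xy, timestamps_s, saccade_thresh_px=40.0):
    """Small illustrative feature set from gaze recorded during password entry:
    number of large gaze jumps, their mean amplitude, mean dwell time between
    jumps, and total entry time."""
    gaze_xy = np.asarray(gaze_xy, float)
    timestamps_s = np.asarray(timestamps_s, float)
    step = np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1)
    saccades = step > saccade_thresh_px
    n_saccades = int(saccades.sum())
    mean_amplitude = step[saccades].mean() if n_saccades else 0.0
    total_time = timestamps_s[-1] - timestamps_s[0]
    mean_dwell = total_time / (n_saccades + 1)
    return np.array([n_saccades, mean_amplitude, mean_dwell, total_time])

# Toy usage: gaze wandering between a few on-screen keyboard keys at 60 Hz.
rng = np.random.default_rng(4)
keys = rng.uniform(0, 800, size=(6, 2))
samples = np.repeat(keys, 25, axis=0) + rng.normal(scale=5.0, size=(150, 2))
times = np.arange(150) / 60.0
print(gaze_features(samples, times))
```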

SESSION: Gaze Input

Gaze+Hold: Eyes-only Direct Manipulation with Continuous Gaze Modulated by Closure of One Eye

The eyes are coupled in their gaze function and therefore usually treated as a single input channel, limiting the range of interactions. However, people are able to open and close one eye while still gazing with the other. We introduce Gaze+Hold as an eyes-only technique that builds on this ability to leverage the eyes as separate input channels, with one eye modulating the state of interaction while the other provides continuous input. Gaze+Hold enables direct manipulation beyond pointing, which we explore through the design of Gaze+Hold techniques for a range of user interface tasks. In a user study, we evaluated performance, usability, and users' spontaneous choice of eye for modulating input. The results show that users are effective with Gaze+Hold. The choice of dominant versus non-dominant eye had no effect on performance, perceived usability, or workload. This is significant for the utility of Gaze+Hold, as it affords flexibility in mapping either eye in different configurations.
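Treating the eyes as separate channels presupposes detecting, per sample, which eye is closed while the other keeps gazing. The sketch below classifies per-eye openness values (as reported by many eye trackers) into pointing versus hold states, with a dead zone between thresholds to suppress flicker; the thresholds, state names, and openness scale are assumptions, not the paper's implementation.

```python
def gaze_hold_state(left_openness, right_openness,
                    closed_thresh=0.2, open_thresh=0.6):
    """Classify one sample of per-eye openness (0 = closed, 1 = open) into a
    Gaze+Hold-style interaction state."""
    def state(openness):
        if openness < closed_thresh:
            return "closed"
        if openness > open_thresh:
            return "open"
        return "uncertain"   # dead zone between thresholds

    left, right = state(left_openness), state(right_openness)
    if left == "open" and right == "open":
        return "point"        # both eyes open: normal gaze pointing
    if left == "closed" and right == "open":
        return "hold-left"    # left eye held closed, right eye provides input
    if right == "closed" and left == "open":
        return "hold-right"
    return "idle"             # blink, uncertain, or both eyes closed

print(gaze_hold_state(0.05, 0.9))   # -> 'hold-left'
```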

HGaze Typing: Head-Gesture Assisted Gaze Typing

This paper introduces a bi-modal typing interface, HGaze Typing, which combines the simplicity of head gestures with the speed of gaze inputs to provide efficient and comfortable dwell-free text entry. HGaze Typing uses gaze path information to compute candidate words and allows explicit activation of common text entry commands, such as selection, deletion, and revision, by using head gestures (nodding, shaking, and tilting). By adding a head-based input channel, HGaze Typing reduces the size of the screen regions for cancel/deletion buttons and the word candidate list, which are required by most eye-typing interfaces. A user study finds HGaze Typing outperforms a dwell-time-based keyboard in efficacy and user satisfaction. The results demonstrate that the proposed method of integrating gaze and head-movement inputs can serve as an effective interface for text entry and is robust to unintended selections.
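To give a rough sense of how head gestures such as nodding, shaking, and tilting can be told apart, the sketch below classifies a short window of head Euler angles by which rotation axis moves the most; the window length, threshold, and axis-to-gesture mapping are illustrative assumptions, not the recogniser used by HGaze Typing.

```python
import numpy as np

def classify_head_gesture(pitch, yaw, roll, min_range_deg=8.0):
    """Classify a short window of head Euler angles (degrees) as a nod, shake,
    or tilt based on which axis moves the most; returns None if motion is small."""
    ranges = {
        "nod (pitch)": np.ptp(pitch),
        "shake (yaw)": np.ptp(yaw),
        "tilt (roll)": np.ptp(roll),
    }
    gesture, amount = max(ranges.items(), key=lambda kv: kv[1])
    return gesture if amount >= min_range_deg else None

# Toy usage: a one-second window dominated by yaw oscillation -> head shake.
t = np.linspace(0, 1, 60)
print(classify_head_gesture(pitch=2 * np.sin(2 * np.pi * t),
                            yaw=15 * np.sin(4 * np.pi * t),
                            roll=1 * np.sin(2 * np.pi * t)))
```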