ETRA23: 2023 Symposium on Eye Tracking Research and Applications

Full Citation in the ACM Digital Library

SESSION: ETRA 2023 Short Papers

A Deep Learning Architecture for Egocentric Time-to-Saccade Prediction using Weibull Mixture-Models and Historic Priors

Real-time detection of saccades is of major interest for many applications in human-computer interaction and mixed reality. However, due to relatively low update rates and high latencies of current commercially available eye trackers, gaze events are typically detected after they occur with some delay. This limits interaction scenarios such as intent-based gaze interaction, redirected walking, or gaze forecasting.

In this paper, we propose a deep learning framework for time-to-event prediction of saccades. In contrast to previous approaches, we utilize past multimodal data captured from head-mounted displays. We combine the well-established transformer architecture with a Weibull Mixture Model. This also allows estimating the uncertainty of the prediction. Additionally, we propose a sampling strategy that differs from conventional approaches to better account for the temporal properties of gaze sequences. We demonstrate that our model achieves state-of-the-art performance by evaluating it on three datasets and performing multiple ablation studies.
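
For readers unfamiliar with Weibull mixtures as time-to-event output heads, the following minimal sketch (not the authors' implementation; the component count and parameter values are illustrative) shows how a predicted set of mixture weights, shapes, and scales translates into a density over time-to-saccade and an expected time:

```python
from math import gamma

import numpy as np

def weibull_mixture_pdf(t, weights, shapes, scales):
    """Density of a Weibull mixture evaluated at times t (seconds)."""
    t = np.asarray(t, dtype=float)
    pdf = np.zeros_like(t)
    for w, k, lam in zip(weights, shapes, scales):
        pdf += w * (k / lam) * (t / lam) ** (k - 1) * np.exp(-((t / lam) ** k))
    return pdf

def expected_time_to_event(weights, shapes, scales):
    """Mixture mean: sum_i w_i * scale_i * Gamma(1 + 1/shape_i)."""
    return sum(w * lam * gamma(1 + 1 / k)
               for w, k, lam in zip(weights, shapes, scales))

# Hypothetical parameters predicted by a network head for one gaze sequence.
weights, shapes, scales = [0.7, 0.3], [1.8, 2.5], [0.25, 0.60]  # scales in s
print(weibull_mixture_pdf([0.1, 0.2, 0.4], weights, shapes, scales))
print(expected_time_to_event(weights, shapes, scales))
```

A distributional head of this kind is what makes the uncertainty estimate mentioned in the abstract possible: the spread of the mixture, not just its mean, is available at prediction time.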

Area of interest adaption using feature importance

In this paper, we present two approaches and algorithms that adapt areas of interest (AOI) or regions of interest (ROI), respectively, to the eye tracking data quality and classification task. The first approach uses feature importance in a greedy way and grows or shrinks AOIs in all directions. The second approach is an extension of the first approach, which divides the AOIs into areas and calculates a direction of growth, i.e. a gradient. Both approaches improve the classification results considerably in the case of generalized AOIs, but can also be used for qualitative analysis. In qualitative analysis, the algorithms presented allow the AOIs to be adapted to the data, which means that errors and inaccuracies in eye tracking data can be better compensated for. A good application example is abstract art, where manual AOI annotation is hardly possible and data-driven approaches are mainly used for initial AOIs.
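
As a rough illustration of the first, greedy approach described above (not the paper's implementation; the scoring function, step size, and rectangle representation are assumptions), the sketch below grows or shrinks an axis-aligned AOI in all four directions as long as a user-supplied quality score, such as cross-validated classification accuracy, improves:

```python
def grow_shrink_aoi(aoi, score_fn, step=5, max_iters=50):
    """Greedily adjust an axis-aligned AOI (x, y, w, h) in all directions.

    `score_fn(aoi)` is assumed to return a quality measure, e.g. the
    cross-validated classification accuracy obtained with this AOI.
    """
    x, y, w, h = aoi
    best, best_score = (x, y, w, h), score_fn((x, y, w, h))
    for _ in range(max_iters):
        candidates = [
            (x - step, y, w + step, h),  # grow left
            (x, y, w + step, h),         # grow right
            (x, y - step, w, h + step),  # grow up
            (x, y, w, h + step),         # grow down
            (x + step, y, w - step, h),  # shrink from the left
            (x, y, w - step, h),         # shrink from the right
            (x, y + step, w, h - step),  # shrink from the top
            (x, y, w, h - step),         # shrink from the bottom
        ]
        improved = False
        for cand in candidates:
            if cand[2] <= 0 or cand[3] <= 0:   # reject degenerate rectangles
                continue
            s = score_fn(cand)
            if s > best_score:
                best, best_score, improved = cand, s, True
        if not improved:
            break
        x, y, w, h = best
    return best, best_score
```

The gradient-based second approach in the paper replaces this exhaustive direction test with an estimated direction of growth per AOI sub-area.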

Bridging the Gap: Gaze Events as Interpretable Concepts to Explain Deep Neural Sequence Models

Recent work in XAI for eye tracking data has evaluated the suitability of feature attribution methods to explain the output of deep neural sequence models for the task of oculomotoric biometric identification. These methods provide saliency maps to highlight important input features of a specific eye gaze sequence. However, to date, the localization analysis of these methods has lacked a quantitative approach across entire datasets. In this work, we employ established gaze event detection algorithms for fixations and saccades and quantitatively evaluate the impact of these events by determining their concept influence. Input features that belong to saccades are shown to be substantially more important than features that belong to fixations. By dissecting saccade events into sub-events, we are able to show that gaze samples that are close to the saccadic peak velocity are most influential. We further investigate the effect of event properties like saccadic amplitude or fixational dispersion on the resulting concept influence.

Comparing Visual Search Patterns in Chest X-Ray Diagnostics

Radiologists are trained professionals who use medical images to obtain clinically relevant information. However, little is known about the visual search patterns and strategies radiologists employ during medical image analysis. Thus, there is a current need for guidelines that specify the optimal visual search routines commonly used by radiologists. Identifying these features could improve radiologist training and assist radiologists in their work. Our study found that during the moments in which radiologists view chest X-ray images in silence before verbalizing the analysis, they exhibit unique search patterns regardless of the type of disease depicted. Our findings suggest that radiologists’ search behaviors can be identified at this stage. However, when radiologists verbally interpret the X-rays, the gaze patterns appear noisy and arbitrary. Current deep-learning approaches train their systems using this noisy and arbitrary gaze data, which may explain why previous research has yet to demonstrate the superiority of deep-learning models that use eye tracking for disease classification. Our paper investigates these patterns and attempts to uncover the eye-gaze configurations during the different analysis phases.

Eye tracking to evaluate the effectiveness of electronic medical record training

Eye tracking has not been fully explored in the assessment of electronic medical record (EMR) training, which is typically done using subjective data. Our objective was to determine whether eye tracking can be used to investigate differences in performance between recently trained users and experts and then provide insight into any differences. After EMR training, medical personnel performed a set of medical tasks using their EMR. Their performance (accuracy and response time) and three eye tracking metrics (spatial density, mean saccade length, and mean fixation duration) were recorded. These measures were then compared to those of an expert user. The analysis showed that the expert was significantly more focused and targeted in the use of the EMR. The results suggest that eye tracking is a promising objective approach to measure the effectiveness of EMR training and the proficiency of users. Suggestions for improved EMR design and training are also provided.

Gaze Pattern Recognition in Dyadic Communication

Analyzing gaze behaviors is crucial to interpreting the nature of communication. Current studies on gaze have focused primarily on the detection of a single pattern, such as the Looking-At-Each-Other pattern or the shared-attention pattern. In this work, we re-define five static gaze patterns that cover all possible states during a dyadic communication and propose a network that recognizes these mutually exclusive gaze patterns given an image. We annotate a benchmark, called GP-Static, for the gaze pattern recognition task, on which our method experimentally outperforms alternative solutions. Our method also achieves state-of-the-art performance on two other single gaze pattern recognition tasks. An analysis of gaze patterns in preschool children demonstrates that the statistics of the proposed static gaze patterns conform with findings in psychology.

Gaze-based Mode-Switching to Enhance Interaction with Menus on Tablets

In design work, a common task is interacting with menus to change the drawing mode. Done frequently, this can become tedious and fatiguing, especially on tablets, where users physically employ a stylus or finger touch. As our eyes are naturally involved in visual search and acquisition of desired menu items, we propose gaze as a shortcut for the physical movement. We investigate gaze-based mode-switching for menus on tablets through a novel mode-switching methodology, assessing a gaze-only (dwell-time) and a multimodal (gaze and tap) technique against hand-based interaction. The results suggest that users can efficiently alternate between manual and eye input when interacting with the menu; both gaze-based techniques have lower physical demand and individual speed-error trade-offs. This led to a novel technique that substantially reduces time by unifying mode selection and mode application. Our work points to new roles for our eyes in efficiently short-cutting menu actions during the workflow.

GE-Simulator: An Open-Source Tool for Simulating Real-Time Errors for HMD-based Eye Trackers

As eye tracking in augmented and virtual reality (AR/VR) becomes established, it will be used by broader demographics, increasing the likelihood of tracking errors. Therefore, when designing eye tracking applications or interaction techniques, it is important to test them at different signal quality levels to ensure they function for as many people as possible. We present GE-Simulator, a novel open-source Unity toolkit that allows the simulation of accuracy, precision, and data loss errors during real-time usage by adding errors to the gaze vector from the head-mounted AR/VR eye tracker. The tool is customisable without changes to the source code and allows eye tracking errors to vary during and in-between usage. Our toolkit allows designers to prototype new applications at different levels of eye tracking signal quality in the early phases of design and can be used to evaluate techniques with users at varying signal quality levels.
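
GE-Simulator itself is a Unity/C# toolkit; as a language-agnostic illustration of the three error types it simulates (a constant accuracy offset, per-sample precision noise, and data loss), here is a small Python sketch with made-up parameter names:

```python
import numpy as np

def rotate_about(vec, axis, angle_deg):
    """Rotate a 3D vector about `axis` by `angle_deg` (Rodrigues' formula)."""
    a = np.radians(angle_deg)
    k = axis / np.linalg.norm(axis)
    return (vec * np.cos(a) + np.cross(k, vec) * np.sin(a)
            + k * np.dot(k, vec) * (1 - np.cos(a)))

def simulate_gaze_errors(gaze_dirs, accuracy_deg=1.0, precision_deg=0.3,
                         loss_rate=0.05, seed=None):
    """Perturb unit gaze vectors (N x 3) with a constant angular offset
    (accuracy), per-sample angular jitter (precision), and data loss (NaNs)."""
    rng = np.random.default_rng(seed)
    out = np.array(gaze_dirs, dtype=float)
    offset_dir = rng.normal(size=3)          # direction of the constant offset
    for i, v in enumerate(out):
        # Accuracy: rotate about an axis perpendicular to the gaze vector so
        # the offset magnitude is exactly accuracy_deg.
        v = rotate_about(v, np.cross(v, offset_dir), accuracy_deg)
        # Precision: small random jitter, again perpendicular to the gaze.
        v = rotate_about(v, np.cross(v, rng.normal(size=3)),
                         rng.normal(0.0, precision_deg))
        out[i] = v
    out[rng.random(len(out)) < loss_rate] = np.nan   # data loss
    return out
```

The toolkit applies this kind of perturbation to the live gaze vector inside Unity, with the error parameters adjustable without touching the source code.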

GEAR: Gaze-enabled augmented reality for human activity recognition

Head-mounted Augmented Reality (AR) displays overlay digital information on physical objects. Through eye tracking, they allow novel interaction methods and provide insights into user attention, intentions, and activities. However, only a few studies have used gaze-enabled AR displays for human activity recognition (HAR). In an experimental study, we collected gaze data from 10 users on a HoloLens 2 (HL2) while they performed three activities (i.e., read, inspect, search). We trained machine learning models (SVM, Random Forest, Extremely Randomized Trees) with extracted features and achieved up to 98.7% activity-recognition accuracy. On the HL2, we provided users with AR feedback relevant to their current activity. We present the components of our system (GEAR), including a novel solution to enable the controlled sharing of collected data. We provide the scripts and anonymized datasets, which can be used as teaching material in graduate courses or for reproducing our findings.

Getting the Most from Eye-Tracking: User-Interaction Based Reading Region Estimation Dataset and Models

A single digital newsletter usually contains many messages (regions). Users’ reading time spent on, and read level (skip/skim/read-in-detail) of, each message are important for platforms to understand their users’ interests, personalize their contents, and make recommendations. Based on accurate but expensive-to-collect eye-tracker-recorded data, we built models that predict per-region reading time from easy-to-collect Javascript browser tracking data.

With eye-tracking, we collected 200k ground-truth datapoints on participants reading news on browsers. Then we trained machine learning and deep learning models to predict message-level reading time based on user interactions like mouse position, scrolling, and clicking. We reached 27% percentage error in reading time estimation with a two-tower neural network based on user interactions only, against the eye-tracking ground truth data, while the heuristic baselines have around 46% percentage error. We also discovered the benefits of replacing per-session models with per-timestamp models, and adding user pattern features. We concluded with suggestions on developing message-level reading estimation techniques based on available data.
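
As a hedged sketch of the general two-tower idea named above (one tower encoding interaction features such as mouse position, scrolling, and clicks, a second tower encoding region features, combined to regress per-region reading time), assuming PyTorch and made-up feature dimensions rather than the authors' actual architecture:

```python
import torch
import torch.nn as nn

class TwoTowerReadingTime(nn.Module):
    """Toy two-tower regressor: one tower for interaction features,
    one for region/content features; outputs predicted reading time."""

    def __init__(self, interaction_dim=32, region_dim=16, hidden=64):
        super().__init__()
        self.interaction_tower = nn.Sequential(
            nn.Linear(interaction_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.region_tower = nn.Sequential(
            nn.Linear(region_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, interaction_feats, region_feats):
        z = torch.cat([self.interaction_tower(interaction_feats),
                       self.region_tower(region_feats)], dim=-1)
        return self.head(z).squeeze(-1)

model = TwoTowerReadingTime()
pred = model(torch.randn(8, 32), torch.randn(8, 16))  # a batch of 8 regions
```

Training such a model against eye-tracking-derived reading times, then deploying it on interaction data alone, is the substitution the abstract describes.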

Introducing Explicit Gaze Constraints to Face Swapping

Face swapping combines one face’s identity with another face’s non-appearance attributes (expression, head pose, lighting) to generate a synthetic face. This technology is rapidly improving, but falls flat when reconstructing some attributes, particularly gaze. Image-based loss metrics that consider the full face do not effectively capture the perceptually important, yet spatially small, eye regions. Improving gaze in face swaps can improve naturalness and realism, benefiting applications in entertainment, human-computer interaction, and more. Improved gaze will also directly benefit Deepfake detection efforts, since such face swaps serve as ideal training data for classifiers that rely on gaze for classification. We propose a novel loss function that leverages gaze prediction to inform the face swap model during training and compare it against existing methods. We find that all methods significantly benefit gaze in the resulting face swaps.

Multi-Rate Sensor Fusion for Unconstrained Near-Eye Gaze Estimation

The power requirements of video-oculography systems can be prohibitive for high-speed operation on portable devices. Recently, low-power alternatives such as photosensors have been evaluated, providing gaze estimates at high frequency with a trade-off in accuracy and robustness. Potentially, an approach combining slow/high-fidelity and fast/low-fidelity sensors should be able to exploit their complementarity to track fast eye motion accurately and robustly. To foster research on this topic, we introduce OpenSFEDS, a near-eye gaze estimation dataset containing approximately 2M synthetic camera-photosensor image pairs sampled at 500 Hz under varied appearance and camera position. We also formulate the task of sensor fusion for gaze estimation, proposing a deep learning framework consisting of appearance-based encoding and temporal eye-state dynamics. We evaluate several single- and multi-rate fusion baselines on OpenSFEDS, achieving an 8.7% error decrease when tracking fast eye movements with a multi-rate approach vs. a gaze forecasting approach operating with a low-speed sensor alone.

On The Visibility Of Fiducial Markers For Mobile Eye Tracking

Invisible fiducial markers are introduced for the localization of Areas Of Interest (AOIs) in mobile eye tracking applications. Fiducial markers are made invisible through the use of film that passes Infra-Red (IR) light while blocking the visible spectrum. An IR light source is used to illuminate the markers, which are then detected by an IR-sensitive camera but remain imperceptible to the human eye. We provide the first empirical study demonstrating that such invisible markers do not distract from a given task: in a flight simulator, the distraction caused by visible and invisible markers was compared between experienced and novice pilots. Fixation frequency and subjective distraction scores showed that visible markers disrupted natural gaze behaviour, particularly in novice pilots. Our findings show that invisible markers should be used when there is a need for them to remain inconspicuous.

One step closer to EEG based eye tracking

We present a new deep neural network (DNN) that can be used to directly determine gaze position from EEG data. EEG-based eye tracking is a new and difficult research topic in the field of eye tracking, but it provides an alternative to image-based eye tracking with an input data set comparable to conventional image processing. The presented DNN exploits spatial dependencies of the EEG signal and uses convolutions similar to the spatial filtering that is used for preprocessing EEG signals. With this, we improve direct gaze determination from the EEG signal by 3.5 cm MAE (mean absolute error) compared to the state of the art, but unfortunately still do not achieve a directly applicable system, since the inaccuracy is still significantly higher compared to image-based eye trackers.

Predicting the Allocation of Attention: Using contextual guidance of eye movements to examine the distribution of attention

Eye movements are often taken as a marker of where attention is allocated, but it is possible that the attentional window can be either tightly or broadly focused around the fixation point. Using target objects whose location could either be strongly predicted by scene context (High Certainty) or not (Low Certainty), we examined how attention was initially distributed across a scene image during search. To do so, an unexpected distractor object suddenly appeared either in the relevant or irrelevant scene region for each target type. Distractors will be more disruptive where attention is allocated. We found that for High Certainty targets, the distractors were fixated significantly more often when they appeared in relevant than irrelevant regions, but there was no such difference for Low Certainty targets. This finding demonstrated differential patterns of attentional distribution around the fixation point based on the predicted location of target objects within a scene.

Prediction Procedure for Dementia Levels based on Waveform Features of Binocular Pupil Light Reflex

A procedure for calculating the probability of levels of dementia is proposed using features of binocular pupil light reflex (PLR) to a chromatic light pulse on either eye. The PLR waveforms of 101 elderly participants, consisting of patients with Alzheimer’s disease (AD) and mild cognitive impairment (MCI) or belonging to a normal control group (NC), were measured during four experimental conditions. Three factor scores were calculated from the PLR waveform features for each response. Responses were summarised and the differences between the two eyes were calculated in order to detect asynchronicity of PLRs in response to a light pulse on either eye. Pupillary oscillation was also measured separately without light pulses, and frequency powers were evaluated. Probabilities for the level of dementia of each participant were predicted using two types of regression functions for MCI+AD or AD patients, which were optimised using a variable selection procedure for extracted features. In the results, some variables which represent asynchronous measurement and pupillary oscillation were selected, and the requirement of binocular measurement was confirmed. A prediction procedure for levels of dementia of participants using the optimised functions was proposed, and performance was evaluated.

Pupil Diameter during Counting Tasks as Potential Baseline for Virtual Reality Experiments

Pupil diameter is a reliable indicator of mental effort, but it must be baseline corrected to account for its idiosyncratic nature. Established methods for measuring baselines cannot be applied in virtual reality (VR) experiments. To reliably measure a pupil diameter baseline in VR, we propose a short testing environment of visual arithmetic tasks. In an experiment with 66 university students, we analyzed external reliability and internal validity criteria for pupil diameter measures during counting and summation tasks. During the counting task, we found a high retest reliability between stimulus intervals. Acceptable retest reliability was found for task repetition at a second measuring time. Analyzing internal validity, we found that pupil diameter increased with task difficulty comparing both tasks. Further, a linear effect was found between the pupil diameter amplitude and luminance levels. Our findings highlight the potential of counting tasks as a pupil diameter baseline for VR experiments.
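
For context, such a baseline is typically applied by subtractive (or divisive) correction of the task-phase pupil diameter against the counting-task baseline; a minimal sketch of that step, not the authors' exact procedure:

```python
import numpy as np

def baseline_correct(task_pupil, baseline_pupil, method="subtractive"):
    """Correct task pupil diameters (e.g., mm) against a baseline recording."""
    baseline = np.nanmean(baseline_pupil)          # mean baseline diameter
    task_pupil = np.asarray(task_pupil, dtype=float)
    if method == "subtractive":
        return task_pupil - baseline               # absolute change
    if method == "divisive":
        return (task_pupil - baseline) / baseline  # relative change
    raise ValueError(f"unknown method: {method}")
```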

SP-EyeGAN: Generating Synthetic Eye Movement Data with Generative Adversarial Networks

Neural networks that process the raw eye-tracking signal can outperform traditional methods that operate on scanpaths preprocessed into fixations and saccades. However, the scarcity of such data poses a major challenge. We, therefore, present SP-EyeGAN, a neural network that generates synthetic raw eye-tracking data. SP-EyeGAN consists of Generative Adversarial Networks; it produces a sequence of gaze angles indistinguishable from human micro- and macro-movements. We demonstrate how the generated synthetic data can be used to pre-train a model using contrastive learning. This model is fine-tuned on labeled human data for the task of interest. We show that for the task of predicting reading comprehension from eye movements, this approach outperforms the previous state-of-the-art.

Synthetic predictabilities from large language models explain reading eye movements

A long tradition in eye movement research has focused on three linguistic variables explaining fixation durations during sentence reading: word length, frequency, and predictability. Lengths and frequencies are easily obtainable but predictabilities are tedious to collect, requiring the incremental cloze procedure. Modern large language models are trained using the objective of predicting the next word given previous context, hence they readily provide predictability information. This capability has largely been overlooked in eye movement research. Here we investigate the suitability of a synthetic predictability measure, extracted from pretrained GPT-2 models, as a surrogate for cloze predictability. Using several published eye movement corpora, we find that synthetic and cloze predictabilities are highly correlated, and that their influence on eye movements is qualitatively similar. Similar patterns are obtained when including synthetic predictabilities in data sets lacking cloze predictabilities. In conclusion, synthetic predictabilities can serve as a substitute for empirical cloze predictabilities.
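
To illustrate how such synthetic predictabilities can be obtained in practice, the sketch below extracts next-token probabilities from a pretrained GPT-2 checkpoint via the Hugging Face transformers library; mapping sub-word tokens back to words, which is needed to match cloze norms, is deliberately left out here:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_predictabilities(sentence):
    """Probability of each observed token given its left context."""
    enc = tokenizer(sentence, return_tensors="pt")
    ids = enc["input_ids"]
    with torch.no_grad():
        logits = model(**enc).logits                 # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    target = ids[:, 1:]                              # next-token targets
    p = log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1).exp()
    return list(zip(tokenizer.convert_ids_to_tokens(target[0].tolist()),
                    p[0].tolist()))

print(token_predictabilities("The children went outside to play"))
```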

The Salient360! Toolbox: Processing, Visualising and Comparing Gaze Data in 3D

Eye tracking can serve as a gateway to studying the mind. For this reason, it has been adopted by a diverse range of scientific communities. With the improvement in the quality of head-mounted virtual reality devices (HMDs) over the past 10 years, eye tracking has been added to capture gaze in immersive environments. The use of HMDs with eye tracking is increasing significantly, and so is the need for a toolbox enabling consensus about eye tracking methods in 3D. We present the Salient360! toolbox: it implements functions to identify saccades and fixations and output gaze characteristics (e.g., fixation duration or saccade directions), and to generate saliency maps, fixation maps, and scanpath data. It also implements routines for comparing gaze data, adapted to 3D. We hope that this toolbox will spark discussions about the methodology of 3D gaze processing, facilitate running experiments, and improve the study of gaze in 3D. https://github.com/David-Ef/salient360Toolbox

TF-IDF based Scene-Object Relations Correlate With Visual Attention

The relative contribution of bottom-up and top-down attentional guidance is a central topic in vision research. Whereas attention is guided bottom-up by low-level saliency, top-down guidance involves the viewer’s knowledge and expectations accumulated throughout a lifetime. Here we explore the influence of high-level scene-object relations on viewing behavior. To assess top-down guidance, we score the relevance of linguistic object labels using methods from document analysis. Specifically, we computed the term frequency-inverse document frequency (TF-IDF), a statistic that reflects how important a term is to a document. We use object TF-IDF to measure how important a specific object is to a scene category and use these scores to predict eye movement distributions over scenes. Our results show that scene-specific objects are more likely to be fixated. Object TF-IDF had an effect partially independent of image saliency, suggesting that an object’s relevance for a scene category affects attention during scene perception.
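
For concreteness, object TF-IDF can be computed from pooled object-label annotations per scene category roughly as follows; the exact weighting variant used in the paper may differ:

```python
import math
from collections import Counter

def object_tfidf(scenes):
    """scenes: dict mapping scene category -> list of object labels (with
    repetitions) pooled over all images of that category.
    Returns tfidf[category][object]."""
    n_categories = len(scenes)
    # Document frequency: in how many scene categories does an object occur?
    df = Counter()
    for labels in scenes.values():
        df.update(set(labels))
    tfidf = {}
    for category, labels in scenes.items():
        counts = Counter(labels)
        total = sum(counts.values())
        tfidf[category] = {
            obj: (c / total) * math.log(n_categories / df[obj])
            for obj, c in counts.items()}
    return tfidf

scores = object_tfidf({
    "kitchen": ["stove", "pan", "chair", "pan"],
    "office":  ["chair", "monitor", "keyboard"]})
print(scores["kitchen"]["pan"])    # scene-specific object -> high score
print(scores["kitchen"]["chair"])  # occurs in both categories -> 0
```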

Visual Center Biasing in a Stimulus-Free Laboratory Setting

Looking at nothing has recently become of particular interest as it may reveal insights into the nature of spatial cognition in terms of integrated mental representations from visual and auditory input. The current study applies individual time-sensitive and emotional ideas to quantify visuo-spatial biases in a stimulus-free laboratory setting. We observe a strong visual bias across all experimental conditions, supporting earlier assumptions of a screen-center or motor bias. The tendency towards the center was particularly evident during trials that lacked any specific assignment. A time-sensitive differentiation of eye movements with regard to memory and anticipation tasks could not be recorded. Also, pupil diameter indicated no relationship between changes in bodily arousal and spontaneous fixation behavior. In addition, we replicate a strong left-side gaze asymmetry that is interwoven with the center bias, with spontaneous fixations clustering mainly to the left of the screen center.

Visual Perception and Performance: An Eye Tracking Study

This study explores the relationship between visual perception and performance. We investigate whether eye-metrics are consistent across various visual problem-solving tasks and if task complexity affects eye-metrics. Experiments were conducted on 102 participants using the Tower of Hanoi (TOH), an Image Sliding Puzzle (ISP), and 4 visual reasoning tasks with increasing complexity from the CLEVR dataset. Total Scanning Duration, Fixation Count, Total Fixation Duration, and Total Saccadic Duration were found significant for distinguishing good and bad performers across tasks. Peak Velocity and Mean Pupil Diameter were found significant for varying task complexity. This was also reflected in time-matched samples of good and bad performers in TOH and ISP, though the content complexity of both these tasks remained constant. We propose that Peak Velocity and Mean Pupil Diameter are markers of ‘perceived task complexity’. Poor performers perceive tasks to be more complex even when content complexity is constant, and this affects their performance.

SESSION: ETRA 2023 Doctoral Symposium

Analysis of Eye Tracking Data: Supporting Vision Screening with Eye Tracking Technologies

Interpreting gaze measurements for individual tasks, such as saccades, fixations, and smooth pursuits, can be challenging for vision screening tasks on a computer screen. To address this challenge, we propose a data analysis pipeline that aligns recent approaches from vision research and computation. The pipeline takes raw eye-tracking data and provides comprehensive gaze metric measurements and visual representations (scan path and heatmap) to give insight into ocular behaviour for vision screening tasks. Our preliminary studies have focused on current trends, challenges, and eye-tracking technologies for supporting the vision screening of children. This approach is important for providing a conceptual framework for gaze analysis that aims to support vision experts.

Calibration free eye tracking solution for mobile and embedded devices

In this study, we propose a competent low-cost eye tracking solution that is able to run on any mobile device, independently of the hardware it is equipped with. The rapid evolution of technologies has made it possible to work with many neural network structures that some years ago were out of reach. The project will start from a solution that the Irisbond (https://www.irisbond.com/) company has been working on, which gives precision values of 3 and 6 degrees for calibrated and calibration-free use cases, respectively. The goal is to develop a usable solution in the Augmentative and Alternative Communication (AAC) field across different types of devices, from mobile to embedded devices. To achieve this objective, two main goals have been set out for this study. On the one hand, we aim to remove the initial calibration step to reach a calibration-free solution. On the other hand, we seek to separate the functionality of the software into independent, interchangeable modules to fit the different target device limitations.

Discussing the importance of calibration in low-cost gaze estimation solutions

Calibration of gaze estimation systems has proven to be a key process in high-end systems. Now that the research effort is focused on low-cost systems, which are closer to the end-user and their day-to-day life, it is necessary to rethink a theoretical framework adapted to this new reality. This document details the proposal of a doctoral thesis focused on dissecting the calibration process in low-cost systems, its importance, and the planned line of work.

Evaluating Human Eye Features for Objective Measure of Working Memory Capacity

Eye tracking measures can provide means to understand the underlying development of human working memory. In this study, we propose to develop machine learning algorithms to find an objective relationship between human eye movements, via the oculomotor plant, and their working memory capacity, which determines subjective cognitive load. Here we evaluate oculomotor plant features extracted from saccadic eye movements, traditional positional gaze metrics, and advanced eye metrics such as the ambient/focal coefficient, gaze transition entropy, the low/high index of pupillary activity (LHIPA), and the real-time index of pupillary activity (RIPA). This paper outlines the proposed approach of evaluating eye movements to obtain an objective measure of working memory capacity and a study investigating how working memory capacity is affected when reading AI-generated fake news.

Eye Tracking for Virtual Customer Requirements Evaluations

High-quality products can be achieved by developing product features in accordance with customer requirements. In the following approach, eye tracking is used to conduct virtual design reviews that validate the conformity of products with requirements during the product development process. Gaze data are thereby utilized to clarify, prioritize, and enrich customer statements during design reviews. The goal is to establish a method that complements qualitative methods with eye tracking to ultimately reach higher product quality and customer satisfaction. The research work thus lays a foundation for more objective and low-effort customer requirement analyses. In this way, it paves the way for increased customer co-creation and quality-oriented product development for consumer products as well.

Gaze-based Interaction on Handheld Mobile Devices

With the advancement of smartphone technology, it is now possible for smartphones to run eye tracking using the front-facing camera, enabling hands-free interaction by empowering mobile users with novel gaze-based input techniques. While several gaze-based interaction techniques have been proposed in the literature, these techniques were deployed in settings different from daily gaze interaction with mobile devices, posing several unique challenges. The user’s holding posture may hinder the camera’s view of their face during the interaction, the front-facing camera may be obstructed by the user’s clothing or hands, or the captured view may be shaky due to the user’s movements and the dynamic environment. This PhD research investigates the usability of state-of-the-art gaze-based input techniques in mobile settings, develops a novel concept of combining multiple gaze-based techniques, and addresses the challenges imposed by the unique aspects of these devices.

Multimodal machine learning for cognitive load based on eye tracking and biosensors

Eye tracking and virtual reality are set to drive the coming decade's most innovative developments in healthcare. Two key application areas stand at the forefront: cost-effective clinical and paraclinical training, and interactive virtual settings for patients in therapy and rehabilitation. As such, our main research focus will be to develop multimodal solutions based on eye tracking and bio-signals for cognitive load assessment within the broader spectrum of applied computer science and digital health. One of the objectives is to respond to the need for healthcare to become ever more individualized and multimodal, as exemplified by personalized medicine and digital therapeutics. The use of eye-tracking in conjunction with digital biomarkers intends to be a quantitative basis for care providers to adapt their therapies to user and patient needs.

SESSION: ETRA 2023 Late-Breaking Work (Poster Abstracts)

A Dataset of Underrepresented Languages in Eye Tracking Research

A number of factors come together to limit the diversity of eye-tracking research, where the majority of papers are conducted with stimuli in the English language. Studying eye movements over other languages is important considering that each language provides unique insights into human cognition. Recently, there have been valuable efforts to present datasets from other languages, yet these efforts have focused mostly on European languages. In this paper we highlight issues that limit diversity in eye tracking research on reading, and we present our work in collecting an open-access multilingual reading dataset of underrepresented languages. Utilizing a high-frequency research eye tracker (EyeLink 1000 Plus), we record eye tracking data of native and second-language readers of English, Spanish, Chinese, Hindi, Russian, Arabic, Japanese, Kazakh, Urdu, and Vietnamese. The dataset includes demographics, language proficiency self-reporting, and answers to comprehension questions. The current version of the dataset, which we make publicly available, consists of 97 trials by 40 participants. With the goal of increasing the number of participants and included languages, we aim to make studying underrepresented languages more accessible to researchers and tool makers.

A Preliminary Investigation on Eye Gaze-based Estimation of Self-efficacy during a Dexterity Task

We investigated the relationship between eye gaze and self-efficacy based on anticipatory gaze. The experimental results showed that the correlation coefficient between saccade distance and self-efficacy from the questionnaire was more than 0.5, suggesting that self-efficacy could be measured from eye gaze.

A temporally quantized distribution of pupil diameters as a new feature for cognitive load classification

In this paper, we present a new feature that can be used to classify cognitive load based on pupil information. The feature consists of a temporal segmentation of the eye tracking recordings. For each segment of the temporal partition, a probability distribution of pupil size is computed and stored. These probability distributions can then be used to classify cognitive load. The presented feature significantly improves the classification accuracy of cognitive load compared to other statistical values obtained from eye tracking data, which represent the state of the art in this field. The applications of determining cognitive load from pupil data are numerous and could lead, for example, to early-warning systems for burnout.
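
A minimal sketch of the described feature (temporally segment the recording, then store a probability distribution of pupil diameter per segment); the number of segments and bins are illustrative choices, and for classification the bin edges would normally be fixed across recordings:

```python
import numpy as np

def pupil_distribution_feature(pupil, n_segments=10, bins=None):
    """Concatenate per-segment probability distributions of pupil diameter.

    pupil: 1D array of pupil diameters for one recording.
    Returns a feature vector of length n_segments * n_bins.
    """
    pupil = np.asarray(pupil, dtype=float)
    pupil = pupil[~np.isnan(pupil)]             # drop blinks / missing samples
    if bins is None:
        bins = np.linspace(pupil.min(), pupil.max(), 21)  # 20 bins
    feature = []
    for segment in np.array_split(pupil, n_segments):
        hist, _ = np.histogram(segment, bins=bins)
        feature.append(hist / max(hist.sum(), 1))  # normalise to a distribution
    return np.concatenate(feature)
```

The resulting fixed-length vector can then be fed to any standard classifier, which is how the comparison against summary statistics in the abstract would be set up.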

Automated Detection of Geometric Structures in Gaze Data

Automated detection of individual problem-solving strategies is mandatory for the realization of adaptive learning systems (ALS). In this context, we present a new algorithm which is able to detect dynamic geometric structures (slope triangles) in gaze data of 62 learners using automated fixation-clustering and temporal networks. Preliminary results show a promising performance for fixed bandwidth parameters of the clustering algorithm (average recall: 0.60, maximum precision: 0.66).

Collaboration Assistance Through Object Based User Intent Detection Using Gaze Data

As eye-tracking technology becomes increasingly prevalent in augmented reality (AR), new opportunities arise for collaborative applications. In this paper, we propose a novel approach to improve collaborative interaction through object-based user intent detection using gaze data. Our system uses reinforcement learning (RL) to dynamically adapt the user interface based on the context of the collaborative task. The system visualizes the user’s intent on a shared environment, allowing for improved collaborative awareness between users. We evaluate our approach in a user study scenario focused on visual search tasks. The results demonstrate that our system significantly improves task completion times and reduces cognitive load for users. Additionally, subjective feedback suggests that users are more aware of each other’s activity, further highlighting the benefits of our approach. We encourage conducting future user studies to assess the suitability of our approach for additional collaborative tasks.

Comparing Attention to Biological Motion in Autism across Age Groups Using Eye-Tracking

This study tracked eye movements in children with and without autism spectrum disorder (ASD) watching emotional biological and non-biological motion point-light displays (PLDs). Older children with ASD focused on extremities, while older typically developing (TD) children looked at the figures’ heads; this difference was not evident in the younger groups. These results suggest developmental advances in social-information biases in TD children that are not evident in children with ASD, together with atypical and potentially adaptive increases in attentional biases towards local motion cues with age in ASD. Potential avenues for future computational and methodological analyses are discussed.

Demonstrating Eye Movement Biometrics in Virtual Reality

Thanks to the eye-tracking sensors that are embedded in emerging consumer devices like the Vive Pro Eye, we demonstrate that it is feasible to deliver user authentication via eye movement biometrics.

Deriving Cognitive Strategies from Fixations during Visual Search in Scenes

Research has established that scene context improves performance in visual search, affecting attentional guidance by providing predictions and expectations about where target objects are likely to be located. Here, the reliability of scene context is manipulated using an Exploitation/Exploration framework. The results showed that not only is context important, but that its effect depends on its reliability, showing a dynamic application over time. Eye movement patterns reflect this dynamic application of high-level information. This framework will allow us to define and detect different cognitive strategies as well as the switch between them, signaling a change in prioritization and a change in the influence of context.

Detecting colour vision deficiencies via Webcam-based Eye-tracking: A case study

Webcam-based eye-tracking platforms have recently re-emerged due to improvements in machine learning-supported calibration processes and offer a scalable option for conducting eye movement studies. Although not yet comparable to infrared-based trackers regarding accuracy and frequency, some compelling performances have been observed, especially in scenarios with medium-sized AOIs (Areas of Interest) in images. In this study, we test the reliability of webcam-based eye-tracking on a specific task: eye movement distribution analysis for CVD (Colour Vision Deficiency) detection. We introduce a new publicly available eye movement dataset based on a pilot study (n=12) on images with a dominant red colour and dichromatic AOIs (previously shown to be difficult), used to investigate CVD by comparing attention patterns obtained in webcam eye-tracking sessions. We hypothesized that webcam eye tracking without infrared support could detect differing attention patterns between CVD and non-CVD participants, and we observed statistically significant differences, supporting our hypothesis.

Enhancing the Metacognition of Nursing Students Using Eye Tracking Glasses

Practical simulation is increasingly used to develop reasoning skills during learning. The analysis of the scene and the correct execution of actions require an awareness of the situation and the activities performed by the student. Eye-tracking feedback (i.e., a video recording of the practical simulation with an overlay of the gaze point) can allow students and teachers to enhance the skills of analysis and execution of the practical activities performed. In this article, we present the implementation of an innovative pedagogical process for nursing students in Switzerland. It involves the use of eye-tracking glasses to improve learning through the enhancement of metacognition after a simulation. The results of a first test session done with 15 undergraduate students are reported.

Estimation of Latent Attention Resources using Microsaccade Frequency during a Dual Task

Visual attention is estimated as temporal latent resource levels using a state-space model and microsaccade (MS) frequency measurements taken during a dual task. In order to optimise the model, MS frequency was evaluated every 0.5 seconds. As a result, the contribution of the accuracy of the response to the task was confirmed by the estimated resource levels.

EyeExplore: An Interactive Visualization Tool for Eye-Tracking Data for Novel Stimulus-based Analysis

The state-of-the-art visualization tools for multidimensional gaze or eye-tracking data focus on only a few dimensions (e.g., fixations as time-series data, scanpath trajectories), leading to incomplete analysis. We propose EyeExplore, an interactive visualization tool to explore eye-tracking data for a single static 2D stimulus (i.e., a selected image). It has multiple views, namely single-user, user-comparative, and cohort-summary views. We propose the use of ensemble clustering and visualization of co-association matrices for cohort analysis. We also propose semantics-aware areas of interest (AOIs), defined through user interactivity, leading to AOI transition matrix visualization. Our preliminary results show that EyeExplore provides a more complete data exploration.

Foveated Noise Reduction: Visual Search Tasks under Spatio-Temporal Control Synchronized with Eye Movements

The authors propose a spatiotemporal noise reduction system called Foveated Noise Reduction. The system reduces ambient motion noise synchronized with eye movements and applies spatiotemporal blurring in peripheral vision. Preliminary experiments were conducted with a visual search task in ambient motion noise. The authors evaluated task performance when varying the spatiotemporal blurring parameters. The results suggest that the spatial reduction affected task performance and that the chosen blur size would be suitable for visual search tasks in ambient noise.

Indirect gaze estimation from body movements based on relationship between gaze and body movements

This paper proposes indirect gaze estimation from body movements as an alternative to directly observing the eye regions, as done by conventional eye trackers. This is based on the fact that changes in gaze cause corresponding changes in the movements of other body parts. Experimental results show that the proposed method can limit estimation errors to about 10 degrees.

Investigating Phishing Awareness Using Virtual Agents and Eye Movements

Phishing emails typically attempt to persuade recipients to reveal private or confidential information (e.g., passwords or bank details). Interaction with such emails places individuals at risk of financial loss and identity theft. We present an ongoing study using eye tracking metrics and varying interface components to assess users’ ability to spot simulated phishing attempts. Findings seek to establish how users interact with email inbox interfaces and will inform future design of usable security tools.

Leveraging Eye Tracking Technology for a Situation-Aware Writing Assistant

Intelligent writing assistants use artificial intelligence to support the partial automation of the writing process. Existing research has investigated the interaction between humans and automated systems and has identified the maintenance of situation awareness (SA) as a key challenge for humans. Especially in the context of intelligent writing assistants, humans have to maintain SA as they are held responsible for the written text. Eye tracking is the key technology that enables the non-invasive detection of situation awareness based on eye movements. Building on existing research on human-robot/AI collaboration and their interplay with SA theory, we propose the augmentation of human interaction with intelligent writing assistants through the use of eye tracking technology. On this basis, writing assistants can be adapted to users’ cognitive states such as SA. We argue that for the successful implementation of intelligent writing assistants in the real world, eye-based analysis of SA and augmentation are key.

Model-based deep gaze estimation using incrementally updated face-shape parameters

In this paper, we propose a method to improve the performance of deep gaze estimation using face-shape parameters adapted to a specific target person based on multiple observations. Our gaze estimation network contains a predefined computation module that calculates gaze directions using known geometric relationships among head poses, eye-ball positions, and gaze directions. Updated face-shape parameters contribute to improving the performance of the process. In addition, the computation module enables a network to acquire the ability to induce hidden parameters such as eyeball position and eyeball radius from observed information through a training process. Experimental results reveal improvement in gaze estimation accuracy by introducing a sequential update process for face-shape parameters and a predefined computation module.

Navigating Virtual Worlds: Examining Spatial Navigation Using a Graph Theoretical Analysis of Eye Tracking Data Recorded in Virtual Reality

In this work we apply a graph-theoretical analysis approach to eye tracking data recorded in virtual reality to investigate the underlying patterns of visual attention during spatial navigation. Based on the eye tracking data recorded in one virtual city, our graph-theoretical analysis identifies a subset of houses that stand out in their graph-theoretical properties, which we define as gaze-graph-defined landmarks. Moreover, we are able to replicate these results with a different eye tracking data set recorded in a different virtual city. Finally, an initial model selection process for participants’ performance in a point-to-building task in the second city suggests a stronger influence of graph-theoretical predictors on performance compared to non-graph-related measures; however, more research will be necessary to determine their relationship.
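
One way to operationalise such a gaze graph (not necessarily the authors' exact construction) is to treat fixated houses as nodes, connect houses that are fixated in succession, and rank nodes by a centrality measure to obtain candidate gaze-graph-defined landmarks, e.g. with networkx:

```python
import networkx as nx

def gaze_graph_landmarks(fixated_houses, top_k=5):
    """fixated_houses: chronological list of house IDs hit by gaze.
    Builds an undirected gaze graph and returns the most central nodes."""
    g = nx.Graph()
    for a, b in zip(fixated_houses, fixated_houses[1:]):
        if a != b:                      # ignore repeated hits on the same house
            g.add_edge(a, b)
    centrality = nx.degree_centrality(g)
    return sorted(centrality, key=centrality.get, reverse=True)[:top_k]

print(gaze_graph_landmarks(["h1", "h2", "h1", "h3", "h1", "h4", "h2"]))
```

Other centrality measures (betweenness, eigenvector) would slot into the same pipeline; which graph property best captures landmark status is exactly the kind of question the abstract raises.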

Old or Modern? A Computational Model for Classifying Poem Comprehension using Microsaccades

On the Pursuit of Developer Happiness: Webcam-Based Eye Tracking and Affect Recognition in the IDE

Recent research highlights the viability of webcam-based eye tracking as a low-cost alternative to dedicated remote eye trackers. Simultaneously, research shows the importance of understanding the emotions of software developers, as emotions have been found to have significant effects on productivity, code quality, and team dynamics. In this paper, we present our work towards an integrated eye-tracking and affect recognition tool for use during software development. This combined approach could enhance our understanding of software development by combining information about the code developers are looking at with the emotions they experience. The presented tool utilizes an unmodified webcam to capture video of software developers while they interact with code. The tool passes each frame to two modules: an eye tracking module that estimates where the developer is looking on the screen, and an affect recognition module that infers developer emotion from their facial expressions. The proposed work has implications for researchers, educators, and practitioners, and we discuss some potential use cases in this paper.

On the Value of Data Loss: A Study of Atypical Attention in Autism Spectrum Disorder Using Eye Tracking

Data loss in eye-tracking studies is often considered a nuisance variable or noise. This study examined the value of data loss in eye tracking and proposed a new method to utilize lost data in predicting the clinical characteristics of autism spectrum disorder (ASD). We used eye tracking to confirm previous findings on atypical attention patterns and further utilized behavior coding to examine three causes of data loss: blinks, non-compliant behaviors, and technical errors. We discovered that data loss due to blinking was associated with a lack of interest in social cues, and data loss due to non-compliance predicted a greater severity of ASD symptoms. These results suggest that the loss of data in eye tracking is meaningful as a measure of diminished social attention and a reflection of clinical characteristics in ASD.

Pupillometry for Measuring User Response to Movement of an Industrial Robot

Interactive systems can adapt to individual users to increase productivity, safety, or acceptance. Previous research focused on different factors, such as cognitive workload (CWL), to better understand and improve the human-computer or human-robot interaction (HRI). We present results of an HRI experiment that uses pupillometry to measure users’ responses to robot movements. Our results demonstrate a significant change in pupil dilation, indicating higher CWL, as a result of increased movement speed of an articulated robot arm. This might permit improved interaction ergonomics by adapting the behavior of robots or other devices to individual users at run time.

pymovements: A Python Package for Eye Movement Data Processing

We introduce pymovements: a Python package for analyzing eye-tracking data that follows best practices in software development, including rigorous testing and adherence to coding standards. The package provides functionality for key processes along the entire preprocessing pipeline. This includes parsing of eye tracker data files, transforming positional data into velocity data, detecting gaze events like saccades and fixations, computing event properties like saccade amplitude and fixational dispersion and visualizing data and results with several types of plotting methods. Moreover, pymovements also provides an easily accessible interface for downloading and processing publicly available datasets. Additionally, we emphasize how rigorous testing in scientific software packages is critical to the reproducibility and transparency of research, enabling other researchers to verify and build upon previous findings.
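
To make the preprocessing steps concrete (positional data, velocity transform, event detection), here is a generic velocity-threshold sketch in plain NumPy; it illustrates the pipeline rather than reproducing the pymovements API, whose function names are not shown here:

```python
import numpy as np

def detect_saccades_ivt(x_deg, y_deg, sampling_rate, velocity_threshold=30.0):
    """Velocity-threshold (I-VT style) saccade detection.

    x_deg, y_deg: gaze position in degrees of visual angle.
    Returns a list of (onset_sample, offset_sample) pairs.
    """
    vx = np.gradient(x_deg) * sampling_rate      # position -> velocity
    vy = np.gradient(y_deg) * sampling_rate
    speed = np.hypot(vx, vy)                     # deg/s
    is_saccade = speed > velocity_threshold
    events, onset = [], None
    for i, flag in enumerate(is_saccade):
        if flag and onset is None:
            onset = i
        elif not flag and onset is not None:
            events.append((onset, i - 1))
            onset = None
    if onset is not None:                        # saccade running at the end
        events.append((onset, len(is_saccade) - 1))
    return events
```

Event properties such as saccade amplitude then follow directly from the positions at the detected onsets and offsets, which is the kind of derived quantity the package computes and plots.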

Seeing Through Their Eyes - A Customizable Gaze-Contingent Simulation of Impaired Vision and Other Eye Conditions Using VR/XR Technology

People with vision impairments see the world differently. Combining the forces of ophthalmology and modern computing, we employ Unreal Engine 5 with Steam VR base stations, an eye-tracking capable Varjo Aero VR headset, and a handheld slate with an embedded HTC Vive tracker to record changes in the person’s functional vision and make the implied changes in visual functions explicit by applying the resulting modifications to a live VR scene. The setup enables us to track the participant’s head, gaze, and object movements simultaneously, which in turn produces more data that can help in diagnosis and in the teaching of assistive skills like the steady eye technique.

The Effect of Curiosity on Eye Movements During Reading of Health Related Arguments

The influence of pupil ellipse noise on the convergence time of a glint-free 3D eye tracking algorithm

Eye-tracking is a key sensing technology for upcoming retinal projection augmented reality (AR) glasses. State-of-the-art eye-tracking sensor technologies rely on video oculography (VOG) and 3D model based gaze estimation algorithms, which infer gaze from observations of the projected pupil over time. The convergence time of these algorithms relies heavily on the pupil ellipse fitting accuracy. In this work, we investigate the effects of pupil ellipse contour noise and pupil center noise on the convergence time of a state-of-the-art eye-tracking approach and show that the convergence time relies heavily on a sub-pixel accurate pupil ellipse fitting and can reach tens of seconds for inaccurately fitted ellipses.
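
The sensitivity to ellipse fitting accuracy can be reproduced in miniature by perturbing a synthetic pupil contour and measuring the resulting center error (a sketch using OpenCV's fitEllipse, not the paper's experimental protocol):

```python
import numpy as np
import cv2

def center_error_under_noise(cx=320.0, cy=240.0, a=40.0, b=30.0,
                             noise_px=0.5, n_points=60, n_trials=200):
    """Mean pupil-center error (pixels) when ellipse contour points are
    perturbed by Gaussian noise of std `noise_px` before cv2.fitEllipse."""
    t = np.linspace(0, 2 * np.pi, n_points, endpoint=False)
    contour = np.stack([cx + a * np.cos(t), cy + b * np.sin(t)], axis=1)
    errors = []
    for _ in range(n_trials):
        noisy = contour + np.random.normal(0, noise_px, contour.shape)
        pts = noisy.reshape(-1, 1, 2).astype(np.float32)
        (fx, fy), _, _ = cv2.fitEllipse(pts)
        errors.append(np.hypot(fx - cx, fy - cy))
    return float(np.mean(errors))

print(center_error_under_noise(noise_px=0.25))
print(center_error_under_noise(noise_px=1.0))   # noisier contour, larger error
```

Feeding such noisy pupil observations into a 3D eye-model estimator is what lets the convergence time be measured as a function of fitting accuracy.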

The Tiny Eye Movement Transformer

In this paper, we evaluate different small neural network models for eye movement classification and present our improved model architecture developed so far. For evaluation, we used a subset (1.5 million sequences) of the TEyeDS annotations, since it contains in-the-wild recordings and has the most eye movement annotations to our knowledge. We classified fixations, saccades, and smooth pursuits with four different network architectures, and the proposed model improves the equally weighted accuracy by 3.8% over the best competitor while only using 6% of the amount of learnable weights.

Time of Day Effects on Eye-Tracking Acquisition in Infants at Higher Likelihood for Atypical Developmental Outcomes: Time of Day Effects on Eye Tracking Data Acquisition in Vulnerable Infants

Time of Day (ToD) of eye-tracking data acquisition may be a confounding variable that disproportionately impacts certain groups. This research examines attentional differences between infants with Lower-Likelihood (LL) and Higher-Likelihood (HL) for atypical developmental outcomes. We find that LL infants tend to pay more overall attention to eye-tracking probes during midday than their HL counterparts. Future research should examine and address factors underlying ToD-associated group differences and explore frameworks for systematically addressing additional eye-tracking confounding variables.

Watch out for those bananas! Gaze Based Mario Kart Performance Classification

This paper is about a small eye tracking study on scan path classification. Seven participants played Mario Kart while wearing a head-mounted eye tracker. In total, we had 64 recordings, but one had to be removed (only 79 gaze samples were recorded). We compared different scan path classification features to estimate the performance of the participants based on the ranking they achieved. The best performing feature was ENCODJI, which incorporates saccades and the heatmap in one feature. HOV, which uses saccade angles, performed well for all tasks but was outperformed by the heatmap (HEAT) for the last two groups.

Word Familiarity Classification From a Single Trial Based on Eye-Movements. A Study in German and English

Identifying processing difficulty during reading due to unfamiliar words has promising applications in automatic text adaptation. We present a classification model that predicts whether a word is (un)known to the reader based on eye-movement measures. We examine German and English data and validate our model on unseen subjects and items, achieving high accuracy in both languages.

SESSION: PETMEI 2023 Session I

ZING: An Eye-Tracking Experiment Software for Organization and Presentation of Omnidirectional Stimuli in Virtual Reality

The growing field of eye-tracking enables many researchers to investigate human (subconscious) behavior unobtrusively, naturally, and non-invasively. For that, a highly natural, immersive, and controllable environment is essential to investigators. Currently, next to mobile eye-tracking in the wild, virtual reality is becoming state-of-the-art for such experiments, combining the strengths of several eye-tracking modalities. Next to simulations, omnidirectional video footage is massively used. 360° cameras capture these videos with resolutions of up to 16k. Afterward, they can be replayed on virtual reality glasses to learn about human behavior in a realistic, highly controlled environment. However, the pipeline from stitched video to eye-tracking experiment results depends on costly proprietary or self-developed software that lacks standardization, leading to recurrent reimplementation. This paper describes an open-source stimuli organization and presentation software implementation that enables researchers to easily organize their stimuli in a standardized way and conduct eye-tracking studies in virtual reality with a few clicks, without knowledge of coding or technical details. The code is available at https://bitbucket.org/benediktwhosp/zing

Implicit User Calibration for Gaze-tracking Systems Using Saliency Maps Filtered by Eye Movements

In recent studies on gaze tracking systems using 3D model-based methods, the optical axis of the eye was estimated without user calibration. The remaining challenge in achieving implicit user calibration is estimating the difference between the optical and visual axes of the eye (angle κ). In this study, we propose two methods that improve the implicit user calibration method using saliency maps, focusing on eye movement to reduce calculation costs while maintaining accuracy.

SESSION: PETMEI 2023 Session II

Highlighting the Challenges of Blinks in Eye Tracking for Interactive Systems

Eye tracking is the basis for many intelligent systems to predict user actions. A core challenge with eye-tracking data is that it inherently suffers from missing data due to blinks. Approaches such as intent prediction and user state recognition process gaze data using neural networks; however, they often have difficulty handling missing information. In an effort to understand how prior work dealt with missing data, we found that researchers often simply ignore missing data or adopt use-case-specific approaches, such as artificially filling in missing data. This inconsistency in handling missing data in eye tracking hinders the development of effective intelligent systems for predicting user actions and limits reproducibility. Furthermore, this can even lead to incorrect results. Thus, this lack of standardization calls for investigating possible solutions to improve the consistency and effectiveness of processing eye-tracking data for user action prediction.
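
As one example of the use-case-specific gap handling mentioned above, a common ad-hoc approach is to linearly interpolate short blink gaps and leave longer gaps missing; a minimal sketch, where the maximum gap length is an arbitrary choice:

```python
import numpy as np

def fill_blink_gaps(signal, max_gap_samples=30):
    """Linearly interpolate NaN runs (e.g., blinks) up to max_gap_samples long;
    longer runs and runs touching the signal edges stay NaN."""
    out = np.asarray(signal, dtype=float).copy()
    isnan = np.isnan(out)
    start = None
    for i in range(len(out) + 1):
        if i < len(out) and isnan[i]:
            if start is None:
                start = i                      # NaN run begins
        elif start is not None:
            end = i                            # exclusive end of the NaN run
            if end - start <= max_gap_samples and start > 0 and end < len(out):
                out[start:end] = np.interp(
                    np.arange(start, end),
                    [start - 1, end], [out[start - 1], out[end]])
            start = None
    return out

x = np.array([1.0, 1.1, np.nan, np.nan, 1.4, 1.5])
print(fill_blink_gaps(x))   # -> [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
```

Whether such interpolation is appropriate, or whether blinks should instead be modelled explicitly, is precisely the open standardization question this paper raises.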

ZERO: A Generic Open-Source Extended Reality Eye-Tracking Controller Interface for Scientists

Virtual reality and eye-tracking technologies are nowadays standard research tools, and a growing number of researchers from different disciplines are utilizing them. Currently, access to eye-tracking hardware in virtual reality glasses is usually provided through APIs for calling functions of the eye-tracking device. Proper implementation is device-specific and left to the user. Non-computer scientists in particular are left to deal with this problem on their own, which impedes eye-tracking research in virtual reality for many scientists. This paper describes a generic open-source interface for everyone to efficiently and easily utilize common eye trackers in virtual reality. The interface is published under a permissive CC BY 4.0 license that allows for integration, modification, and extension of the code. It includes a standardized interface for several eye-tracking devices in virtual reality, is ready to be used out of the box, and allows easy addition of APIs from other manufacturers. The code is available at https://bitbucket.org/benediktwhosp/zvsl-zero
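
The interface described is essentially an adapter layer over vendor SDKs. A hypothetical sketch of such an abstraction follows; the class and method names are invented for illustration and are not ZERO's actual API, which targets VR engines rather than Python.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class GazeSample:
    timestamp: float     # seconds
    direction: tuple     # normalized 3D gaze ray in head coordinates
    valid: bool

class EyeTracker(ABC):
    """Device-agnostic interface; one subclass per manufacturer SDK."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def latest_sample(self) -> GazeSample: ...

class VendorXTracker(EyeTracker):
    """Example adapter wrapping a (hypothetical) vendor runtime."""
    def connect(self) -> None:
        print("connecting to Vendor X runtime")   # vendor SDK call would go here
    def latest_sample(self) -> GazeSample:
        return GazeSample(0.0, (0.0, 0.0, 1.0), True)

# Experiment code depends only on the EyeTracker interface.
tracker: EyeTracker = VendorXTracker()
tracker.connect()
print(tracker.latest_sample())
```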

Detecting Blinks from Wearable Cameras using Spatial-Temporal-Aware Deep Network Learning

Blinks have been widely studied in various fields, including medicine, human-computer interaction, and driver fatigue monitoring. Automatic detection of blinks is therefore of considerable practical importance. While existing deep neural networks excel at extracting spatial features from images and demonstrate impressive performance in visual object recognition, their application to blink detection in videos on a frame-by-frame basis is suboptimal, as they only consider spatial features from single images. In this paper, we develop a spatial-temporal-aware deep learning framework that capitalizes on the rapid advancements of state-of-the-art visual object recognition networks, aiming to enhance their performance specifically for blink detection. Our framework takes consecutive frames as input to extract spatial and temporal features simultaneously for better detection of eye movements. We also propose a sliding-window re-sampling strategy to mitigate overfitting on the training data. Extensive experimental evaluations and comparisons demonstrate the feasibility of the proposed algorithm, which delivers excellent performance for detecting blinks.
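
As a rough illustration of the two ideas, stacking consecutive frames as network input and re-sampling windows with a sliding stride, here is a minimal sketch; the window length, stride, and array shapes are assumptions, not the paper's settings.

```python
import numpy as np

def sliding_windows(frames: np.ndarray, labels: np.ndarray,
                    window: int = 5, stride: int = 2):
    """Yield (stacked_frames, label) pairs for a spatio-temporal model.

    frames: (T, H, W, C) video, labels: (T,) per-frame blink labels.
    The label of a window is taken from its centre frame.
    """
    for start in range(0, len(frames) - window + 1, stride):
        clip = frames[start:start + window]        # (window, H, W, C)
        centre = start + window // 2
        yield clip, labels[centre]

# Toy example: 20 frames of 32x32 grayscale video with a short blink.
frames = np.random.rand(20, 32, 32, 1)
labels = np.zeros(20, dtype=int)
labels[8:11] = 1
clips = list(sliding_windows(frames, labels))
print(len(clips), clips[0][0].shape)               # 8 (5, 32, 32, 1)
```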

SESSION: ETVIS Session I: Visual Attention and Strategies

Gaze is more than just a point: Rethinking visual attention analysis using peripheral vision-based gaze mapping

In mobile eye-tracking, visual attention is commonly evaluated using fixation-based measures, which can be mapped to predefined objects of interest for task-specific attention analysis. Even though attention can be directed independently from the fovea, little research can be found on the quantification of peripheral vision for attention analysis. In this work, we discuss the benefits of enhancing traditional mapping methods with near-peripheral information and expand previous research by presenting a novel machine learning-based gaze measure, the visual attention index (VAI), for the analysis of visual attention using dynamic stimuli. Results are discussed using the data of two multi-object mobile eye tracking use cases and visualized using radar graphs.

We show that, by combining foveal and peripheral vision, the VAI is effective for comparing visual attention across multiple tasks, trials, and subjects, which offers new possibilities for a more realistic and detailed depiction of visual attention in multi-object tasks.

Reading Strategies for Graph Visualizations that Wrap Around in Torus Topology

We investigate reading strategies for node-link diagrams that wrap around the boundaries in a flattened torus topology by examining eye tracking data recorded in a previous controlled study. Prior work showed that torus drawing affords greater flexibility in clutter reduction than traditional node-link representations but impedes link-and-path exploration tasks, while repeating tiles around the boundaries aids comprehension. However, it remains unclear what strategies users apply in different wrapping settings. Understanding these strategies has design implications for future work on more effective wrapped visualizations for network applications and for cyclic data that could benefit from wrapping. We perform visual-exploratory data analysis of the gaze data and conduct statistical tests derived from the patterns identified. Results show distinguishable gaze behaviors, with more visual glances and transitions between areas of interest in the non-replicated layout. The full-context layout yields more successful visual searches than the partial-context layout, but the gaze allocation indicates that the layout could be more space-efficient.

SESSION: ETVIS Session II: Tools and Applications

Representing (Dis)Similarities Between Prediction and Fixation Maps Using Intersection-over-Union Features

A classic evaluation of the quality of gaze prediction models consists of comparing a set of ground-truth fixation maps against a set of predictions. The quality of the prediction depends on the spatial similarity between the predicted and the observed fixated and non-fixated areas. Typically, (dis)similarity is evaluated by computing distribution-based metrics. However, the shortcoming of these metric scores is that they provide no information about the different types of (dis)similarities present in the prediction, for example, whether the prediction fails wholly or only partially to account for the fixations. In this paper, we propose a set of intersection-over-union features for representing the spatial (dis)similarities, which provide helpful information that cannot be retrieved with traditional metrics for analyzing and evaluating prediction maps. We exemplify the usage of the features by analyzing the performance of different prediction models on a saliency benchmark dataset.
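
The core quantity is the intersection-over-union between binarized maps. A minimal sketch follows; the threshold and the particular split into "missed" and "spurious" regions are illustrative assumptions, not the paper's exact feature set.

```python
import numpy as np

def iou_features(pred: np.ndarray, fixation: np.ndarray, thr: float = 0.5):
    """Compare a predicted saliency map against a ground-truth fixation map."""
    p = pred >= thr                      # binarized prediction
    f = fixation >= thr                  # binarized fixation map
    union = np.logical_or(p, f).sum()
    inter = np.logical_and(p, f).sum()
    return {
        "iou": inter / union if union else 1.0,
        "missed": np.logical_and(f, ~p).sum() / max(f.sum(), 1),    # fixated but not predicted
        "spurious": np.logical_and(p, ~f).sum() / max(p.sum(), 1),  # predicted but not fixated
    }

pred = np.zeros((4, 4)); pred[:2, :2] = 1.0
fix = np.zeros((4, 4)); fix[:2, 1:3] = 1.0
print(iou_features(pred, fix))   # iou = 2/6 ≈ 0.33
```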

Gazealytics: A Unified and Flexible Visual Toolkit for Exploratory and Comparative Gaze Analysis

We present a novel, web-based visual eye-tracking analytics tool called Gazealytics. Our open-source toolkit features a unified combination of gaze analytics features that support flexible exploratory analysis, along with annotation of areas of interest (AOI) and filter options based on multiple criteria to visually analyse eye tracking data across time and space. Gazealytics features coordinated views unifying spatiotemporal exploration of fixations and scanpaths for various analytical tasks. A novel matrix representation allows analysis of relationships between such spatial or temporal features. Data can be grouped across samples, user-defined AOIs or time windows of interest (TWIs) to support aggregate or filtered analysis of gaze activity. This approach exceeds the capabilities of existing systems by supporting flexible comparison between and within subjects, hypothesis generation, data analysis and communication of insights. We demonstrate in a walkthrough that Gazealytics supports multiple types of eye tracking datasets and analytical tasks.

Privacy in Eye Tracking Research with Stable Diffusion

Image-generative models take textual prompts as input and generate almost arbitrary image content based on the underlying training data. This technology is developing rapidly and produces better results with each new generation of trained models. Apart from their application in creating artwork, we see potential in deploying such models for eye-tracking research with respect to anonymizing content in visual stimuli. One feature of such models is the ability to take an image as input and adjust its content according to a prompt. Hence, privacy-preserving visualization of stimuli can be achieved for static images and videos by slightly adjusting content to anonymize persons, text, and other sensitive sources. In this work, we discuss how this process can be applied to the presentation and dissemination of results with respect to privacy issues arising from eye-tracking experiments.
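
As an illustration of the image-to-image adjustment described above, here is a minimal sketch using the Hugging Face diffusers library; the model checkpoint, prompt, and strength value are arbitrary choices for demonstration and not those used in the paper.

```python
# pip install diffusers transformers torch pillow
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

stimulus = Image.open("stimulus_frame.png").convert("RGB")

# A low strength keeps the scene layout (and thus the gaze semantics) intact
# while replacing identifiable details such as faces or readable text.
anonymized = pipe(
    prompt="the same scene with anonymous, unrecognizable faces and unreadable text",
    image=stimulus,
    strength=0.3,
).images[0]
anonymized.save("stimulus_frame_anonymized.png")
```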

SESSION: Eyes4ICU

Reanalyzing Effective Eye-related Information for Developing User’s Intent Detection Systems

Studies on gaze-based interactions have utilized natural eye-related information to detect user intent. Most use a machine learning-based approach to minimize the cost of choosing appropriate eye-related information. While those studies demonstrated the effectiveness of an intent detection system, understanding which eye-related information is useful for interaction remains important. In this paper, we reanalyze how eye-related information affected the detection performance of a previous study in order to develop better intent detection systems in the future. Specifically, we analyzed two aspects: dimensionality reduction and adaptation to different tasks. The results showed that saccade and fixation features are not always useful and that the direction of gaze movement could potentially cause overfitting.

Index Pupil Activity Echoing with Task Difficulty in Fitts’ Law Setting

Research has found that changes in mental workload during both cognitive and motor tasks can be indicated by pupil dilation. To measure cognitive workload, researchers have developed tools such as the wavelet-analysis-based index of cognitive activity (ICA) and the index of pupil activity (IPA). However, it is still unclear whether these tools can accurately measure workload during motor tasks. This study investigates whether the IPA can differentiate task workload during a motor task involving aiming in a tele-operation setting, where the task requirements were quantified using Fitts' index of difficulty (ID) based on target size and distance. The study found that the IPA can differentiate between different levels of motor task workload, provided the proper analysis window is used, namely the period before the tool touches the targets. This finding is significant as it can aid in developing objective methods to evaluate task workload from pupil parameters during goal-directed movements.
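
For reference, the Shannon formulation of the index of difficulty commonly used in Fitts'-law studies is shown below; whether the paper uses this exact variant or the original Fitts formulation is not stated in the abstract.

```latex
\[
  \mathrm{ID} \;=\; \log_2\!\left(\frac{D}{W} + 1\right) \ \text{bits}
\]
% D: distance (amplitude) to the target, W: target width.
% Example: D = 120 mm, W = 15 mm  =>  ID = log2(9) ≈ 3.17 bits.
```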

Towards Developing an Animation Kit for Functional Vision Screening with Eye Trackers

Developing accessible tools that support the identification of functional vision problems through reliable measurements of eye movements is of common interest. Such an open-source tool for eye-tracker data requires automatic processing of eye data, handling of time stamps, dynamic data generation, and parameter adaptation. This paper illustrates the feasibility of such a tool for saccadic, smooth-pursuit, and circular eye movements and discusses future steps.
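
To illustrate the kind of dynamic data generation involved, here is a minimal sketch of parameterized target trajectories for smooth-pursuit and circular stimuli; the amplitudes, frequencies, and sampling rate are placeholder values, not the kit's defaults.

```python
import numpy as np

def pursuit_trajectory(duration_s=4.0, rate_hz=60, amp_deg=10.0, freq_hz=0.4):
    """Horizontal sinusoidal target motion for a smooth-pursuit trial."""
    t = np.arange(0, duration_s, 1.0 / rate_hz)
    return t, amp_deg * np.sin(2 * np.pi * freq_hz * t)      # x position in degrees

def circular_trajectory(duration_s=4.0, rate_hz=60, radius_deg=8.0, freq_hz=0.25):
    """Target moving on a circle, e.g. for circular eye-movement screening."""
    t = np.arange(0, duration_s, 1.0 / rate_hz)
    phase = 2 * np.pi * freq_hz * t
    return t, radius_deg * np.cos(phase), radius_deg * np.sin(phase)

t, x = pursuit_trajectory()
print(len(t), x.min().round(2), x.max().round(2))   # 240 samples, roughly ±10°
```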

Eye Tracking as a Source of Implicit Feedback in Recommender Systems: A Preliminary Analysis

Eye tracking in recommender systems can provide an additional source of implicit feedback, while helping to evaluate other sources of feedback. In this study, we use eye-tracking data to inform a collaborative filtering model for movie recommendation, providing an improvement over click-based implementations. We additionally analyze area of interest (AOI) duration in relation to the known click data and previously seen movies, showing that AOI information consistently coincides with these items of interest.
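
One simple way to fold gaze into an implicit-feedback model is to weight the interaction confidence by normalized AOI dwell time. The sketch below shows such a weighting; the combination rule and constants are illustrative assumptions, not the study's model.

```python
import numpy as np

def implicit_confidence(clicks: np.ndarray, dwell_s: np.ndarray,
                        alpha: float = 40.0, beta: float = 10.0) -> np.ndarray:
    """Confidence matrix for weighted matrix factorization (users x items).

    clicks: binary click matrix; dwell_s: summed AOI dwell time in seconds.
    Longer gaze dwell on an item's poster or description raises the confidence
    that the interaction reflects genuine interest.
    """
    dwell_norm = dwell_s / (dwell_s.max() + 1e-9)
    return 1.0 + alpha * clicks + beta * dwell_norm

clicks = np.array([[1, 0, 0], [0, 1, 0]])
dwell = np.array([[3.2, 0.4, 0.0], [0.1, 5.0, 1.8]])
print(implicit_confidence(clicks, dwell).round(2))
```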

A Review of Eye Tracking in Advanced Driver Assistance Systems: An Adaptive Multi-Modal Eye Tracking Interface Solution

Advanced driving assistance systems are a useful technology that helps improve driving safety. These systems provide the driver with multi-modal information about driving performance; however, the interaction between the driver and these interfaces is complex and requires the systems to adapt to the demands of the driving task. The proposed solution addresses this challenge by presenting an overview of the literature on the subject, with the objective of proposing an adaptive multi-modal interface based on eye-tracking monitoring.

Reviewing the Social Function of Eye Gaze in Social Interaction

The role of gaze in communication is diverse and essential, particularly thanks to its dual function of both encoding and signaling mental and emotional states as well as the direction and object of attention. To date, substantial research on the attentional and perceptual effects of gaze behavior has established four main forms of social eye gaze, and each form plays a role in establishing, grounding, and ensuring communication between interlocutors. However, while the perceptual effects of gaze on attentional and cognitive processes are well elaborated, the social and affective effects of gaze, and particularly of mutual gaze, have attracted little attention. The present review suggests that mutual gaze has a unique role in social interaction, since it has both perceptual and affective effects, and that its unique affective and social role in real-time and naturalistic settings still requires further research.

An irrelevant look of novice tram driver: Visual attention distribution of novice and expert tram drivers

The present study explores differences in the attention distribution of tram drivers with different levels of expertise while watching tram-driving simulations. Forty-seven participants took part in the experiment in two groups (23 experts and 24 novices). The results show between-group differences in attention dynamics. In line with predictions, novices concentrated more than experts on the middle panel of the tram simulator, which is related to speed control. The study is the first step in designing gaze-based training for novice tram drivers.

Using Eye Tracking to detect Faking Intentions

Recent studies of eye movements in faking behaviours are reviewed. The best candidates for revealing deceptive intentions are the number and duration of fixations, while saccade amplitudes and pupil size also show great potential. When studying faking intentions, special consideration should be given to the experimental design and to individual abilities that could affect the physiological responses.

SESSION: EduEye

The effect of intersemiotic relations on L2 learners’ multimodal reading

The paper adopts a mixed-method approach (online and eye-tracking experiments) to investigate which image-text relation in multimodal texts, namely image-subordinate-to-text (IST) and text-subordinate-to-image (TSI), creates a strong visual or verbal mental representation in the case of second language (L2) learners. Thirty-eight Hungarian L2 learners with B1 English language proficiency attended the online experiment to read and respond to the multimodal texts with IST and TSI relations. Furthermore, during the eye-tracking experiment, second language learners’ (N=9) gaze patterns were examined while reading IST and TSI multimodal texts. The initial study results reveal that while the semantic gap between the image and text encourages more intermodal interactions and longer eye fixations, redundancy, involving the duplication of information via image and text, also develops a strong mental model of the meaning. The present research may contribute to the development of a more comprehensive model of L2 multimodal and multimedia learning.

Leveraging Eye Tracking in Digital Classrooms: A Step Towards Multimodal Model for Learning Assistance

Instructors who teach digital literacy skills are increasingly faced with the challenges that come with larger student populations and online courses. We asked an educator how we could support student learning and better assist instructors both online and in the classroom. To address these challenges, we discuss how behavioral signals collected from eye tracking and mouse tracking can be combined to offer predictions of student performance. In our preliminary study, participants completed two image masking tasks in Adobe Photoshop based on real college-level course content. We then trained a machine learning model to predict student performance in each task based on data from other students, as a step towards offering automated student assistance and feedback to instructors. We reflect on the challenges and scalability issues of deploying such a system in the wild and present guidelines for future work.
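
A minimal sketch of the kind of multimodal prediction pipeline described, concatenating per-task gaze and mouse features and fitting an off-the-shelf classifier; the feature names, model choice, and random data are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_students = 40

# Hypothetical per-task features: [mean fixation duration, fixation count,
# scanpath length] and [mouse idle time, mouse path length, click count].
gaze_features = rng.normal(size=(n_students, 3))
mouse_features = rng.normal(size=(n_students, 3))
X = np.hstack([gaze_features, mouse_features])
y = rng.integers(0, 2, size=n_students)            # task completed well: yes/no

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())     # 5-fold CV over students
```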

Gaze-Based Monitoring in the Classroom

In our roles as lecturers, we have to present topics to students by means of PowerPoint slides, videos, animations, and the like. Both the content itself and the presentation speed play crucial roles in making a lecture understandable for students. Since each student has a different experience level and personal mood, or might be distracted, lecturers cannot keep track of every individual student in a course but have to present the content more or less independently of the audience. In this paper, we introduce work in progress that focuses on monitoring students' behavior in a course, including their understanding of the presentation in terms of presentation speed. To reach this goal, we use mobile eye-tracking devices that continuously track the eye movements of students paying visual attention to the lecturer's slides. In the current state of the system, we record two states, namely whether or not an individual student can follow and understand the slides; the aggregated student feedback is displayed to the teacher in real time, providing an overview and avoiding the need to constantly ask whether the slides' contents have been understood. Finally, we discuss challenges and limitations of our gaze-based monitoring system.

Investigating Cognitive Load for Tasks with Mathematics and Chemistry Context through Eye Tracking

Changes in mathematical representation are an essential part of many STEM fields, especially chemistry. In order to better understand these transitions between different representations, we investigated possible eye-tracking indicators of cognitive load during a task that had two experimental conditions: one was purely mathematical, the other one was similar, but had a chemical context. We used pupil diameter, fixation duration, and pupil fluctuations as measured through the IPA as indicators of cognitive load. Our preliminary results indicate that there may not be a significant difference in cognitive load between the two conditions, which in turn suggests that a chemical context does not impose additional cognitive load on participants. The present study may serve as a starting point to investigate this issue further and shed light on the potential influence of a chemical context on cognitive load.

SESSION: EMIP

Applying Machine Learning to Gaze Data in Software Development: a Mapping Study

Eye tracking has been used as part of software engineering and computer science research for a long time, and during this time new techniques for machine learning (ML) have emerged. Some of those techniques are applicable to the analysis of eye-tracking data, and to some extent have been applied. However, there is no structured summary available on which ML techniques are used for analysis in different types of eye-tracking research studies.

In this paper, our objective is to summarize the research literature with respect to the application of ML techniques to gaze data in the field of software engineering. To this end, we have conducted a systematic mapping study, where research articles are identified through a search in academic databases and analyzed qualitatively. After identifying 10 relevant articles, we found that the most common software development activity studied so far with eye-tracking and ML is program comprehension, and Support Vector Machines and Decision Trees are the most commonly used ML techniques. We further report on limitations and challenges reported in the literature and opportunities for future work.

GANDER: a Platform for Exploration of Gaze-driven Assistance in Code Review

Gaze control and gaze assistance in software development tools have so far been explored in the setting of code editing, but other developer activities like code review could also benefit from this kind of tool support. In this paper, we present GANDER, a platform for user studies on gaze-assisted code review. As a proof of concept, we extend the platform with an assistant that highlights name relationships in the code under review based on gaze behavior, and we perform a user study with 7 participants. While participants experienced the interaction as overwhelming and as lacking explicit actions (as also seen in other similar user studies), the study demonstrates the platform's capabilities for mobility, real-time gaze interaction, data logging, replay, and analysis.

Visual Expertise in Code Reviews: Using Holistic Models of Image Perception to Analyze and Interpret Eye Movements

This study uses holistic models of image perception to analyze and interpret eye movements during a code review. 23 participants (15 novices and 8 experts) take part in the experiment. The subjects' task is to review six short code examples in the C programming language and identify possible errors. During the experiment, their eye movements are recorded by an SMI 250 REDmobile. Additional data is collected through questionnaires and retrospective interviews. The results indicate that holistic models of image perception provide a suitable theoretical background for the analysis and interpretation of eye movements during code reviews. The assumptions of these models are particularly evident for expert programmers. Their approach can be divided into different phases with characteristic eye-movement patterns. It is best described as switching between scans of the code example (global viewing) and the detailed examination of errors (focal viewing).

Analysing the API learning process through the use of eye tracking

We conducted an exploratory study in which participants had to acquire knowledge of unfamiliar application programming interfaces (APIs) in order to complete two programming tasks. Eye tracking was used to monitor participants' attention throughout the study. We analysed the learning process using the COIL model, which comprises three stages: Information Collection, Information Organisation, and Solution Testing. Using this model, we describe patterns in the sequences of actions derived from the data. We discuss whether the Solution Testing stage serves two distinct functions simultaneously, constructing the solution while gathering information, which would make it an exploratory Solution Testing stage. In addition, the use of eye tracking provided further insight into the API learning process.

Is Clustering Novice Programmers Possible? Investigating Scanpath Trend Analysis in Programming Tasks

Studies on program comprehension using eye-tracking technology have rarely used Scanpath Trend Analysis (STA) to generate common scanpaths for groups with a specific level of expertise. Understanding the applicability of STA is important to help educators distinguish the reading orders of individuals as they solve programming tasks, in order to develop better educational materials and improve instruction. In this work, we conducted an experiment with 66 undergraduate computer science students using common fundamental programming questions to study the gaze behavior of novices (high and low performing) during program comprehension. We aim to better understand the differences between the common scanpaths of high and low performers generated by STA and whether Hierarchical Cluster Analysis (HCA) can cluster these common scanpaths for high- and low-performing individuals across different stimuli. Findings suggest that the STA algorithm is a technique worth considering for finding common representative scanpaths of a group of individuals; however, HCA with a relative Levenshtein distance metric alone may not be suitable for clustering high- and low-performer groups for varying numbers of AOIs across different stimuli.
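
To make the clustering step concrete, the sketch below clusters AOI-letter scanpath strings with a relative (length-normalized) Levenshtein distance and average-linkage HCA; the example scanpaths and linkage choice are illustrative, not the study's data or exact configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    dp = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return int(dp[-1])

def relative_distance(a: str, b: str) -> float:
    return levenshtein(a, b) / max(len(a), len(b), 1)

# Toy common scanpaths over AOIs labelled A-D (one string per group/stimulus).
scanpaths = ["ABCD", "ABCCD", "DCBA", "DCCBA"]
n = len(scanpaths)
dist = np.array([[relative_distance(scanpaths[i], scanpaths[j])
                  for j in range(n)] for i in range(n)])

Z = linkage(squareform(dist, checks=False), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))   # e.g. [1 1 2 2]
```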

Program Code Navigation Model for Individuals based on LSTM with Co-clustering

To improve program-learning tools with eye-tracking technology, it is crucial to understand fixation points and individual information in order to provide personalized navigation cues for programmers with different levels of expertise. Meanwhile, long short-term memory (LSTM) networks and clustering techniques have revealed important characteristics of eye-movement data with respect to comprehension performance. This paper performs a spatial analysis of gaze locations by co-clustering across programmers with different levels of expertise and then predicts the next fixation point from the eye-movement data using an LSTM. Finally, combining individual background information and eye-movement information, it generates a new indicator, named the 'code comprehension index', which reflects an individual's current level of code understanding based on real-time gaze information and can be applied to improve the effectiveness and efficiency of program navigation tools for programmers with different levels of expertise.
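
A minimal PyTorch sketch of the next-fixation prediction component, an LSTM regressing the next (x, y) fixation from a window of previous fixations; the architecture sizes and the synthetic data are illustrative assumptions, and the paper's co-clustering and comprehension index are not reproduced here.

```python
import torch
import torch.nn as nn

class NextFixationLSTM(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)      # predict next (x, y)

    def forward(self, seq):                   # seq: (batch, time, 2)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])          # use the last time step

# Toy training loop on synthetic fixation sequences (normalized coordinates).
model = NextFixationLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

seqs = torch.rand(64, 10, 2)                  # 64 windows of 10 fixations
targets = torch.rand(64, 2)                   # the fixation that follows each window
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(model(seqs), targets)
    loss.backward()
    opt.step()
print(loss.item())
```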

SESSION: COGAIN 2023

Modelling Attention Levels with Ocular Responses in a Speech-in-Noise Recall Task

We applied a state-space modelling technique to estimate the cognitive workload of a speech-in-noise (SIN) recall task, based on participants' oculo-motor responses to speech signals. We estimated common latent attention levels in 15 time bins and observed temporal changes between pupillary dilations and saccade frequencies, given that both conditions were independent. We also compared two speech-type factors (natural vs. synthetic) and three signal-to-noise levels (-1 dB, -3 dB, and -5 dB) using the estimated parameter distributions. The comparison of experimental factors provided us with insights into differences in participants' processing of spoken information during a SIN recall task.
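
Although the abstract does not spell out the model, a linear-Gaussian state-space formulation of this kind of analysis, with a latent attention level driving both ocular measures, can be sketched as follows; this is purely illustrative notation, not the authors' equations.

```latex
\[
  x_t = a\,x_{t-1} + w_t, \qquad w_t \sim \mathcal{N}(0, \sigma_w^2),
\]
\[
  \begin{pmatrix} y^{\mathrm{pupil}}_t \\ y^{\mathrm{saccade}}_t \end{pmatrix}
  =
  \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} x_t + \mathbf{v}_t,
  \qquad \mathbf{v}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{R}),
\]
% x_t: latent attention level in time bin t; y_t: observed pupillary dilation
% and saccade frequency; R is diagonal when the two observation channels are
% treated as independent.
```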

Universal Design of Gaze Interactive Applications for People with Special Needs

Within the last 20 years, gaze interaction has become a successful communication solution for numerous people with motor challenges. In this paper, we present two new cases of gaze-interactive assistive technology: i) gaze control of an exoskeleton for stroke rehabilitation, and ii) gaze-interactive reading support for people with low vision. By applying a Universal Design approach [Mace 1998], both cases are assessed through an ability analysis to identify application-specific issues with gaze interaction that need to be addressed further. Finally, we suggest how solutions from our applications may be mainstreamed for a broader user group.

Control prediction based on cumulative gaze dwell time while browsing contents

The utilization of gaze behavior has been studied as one of several hands-free control methods. With the recent spread of head-mounted display devices, establishing hands-free control methods has become a vital issue. Previously proposed approaches based on eye-gaze behavior still have several problems to solve before natural manipulation during content browsing is possible. Therefore, in this study, we propose a method that enables accurate control prediction from gaze behavior during content browsing by accumulating gaze dwell time, taking into account cases where the gaze point leaves the decision area for a short period. Through experimentation, we confirmed that the proposed method is effective for accurate prediction of user intention. The limitations of our method and further issues to be addressed are also clarified.
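
A minimal sketch of the cumulative dwell-time idea, in which brief departures from the decision area do not reset the counter; the threshold, sampling rate, and reset rule are illustrative assumptions rather than the paper's parameters.

```python
class CumulativeDwellSelector:
    """Trigger a selection once accumulated dwell inside an AOI exceeds a threshold.

    Unlike a plain dwell timer, short excursions outside the AOI only pause
    (rather than reset) the accumulated time.
    """
    def __init__(self, threshold_s=0.8, reset_after_s=1.5):
        self.threshold_s = threshold_s      # dwell needed to trigger selection
        self.reset_after_s = reset_after_s  # forget dwell after a long absence
        self.dwell = 0.0
        self.outside = 0.0

    def update(self, inside_aoi: bool, dt: float) -> bool:
        if inside_aoi:
            self.dwell += dt
            self.outside = 0.0
        else:
            self.outside += dt
            if self.outside >= self.reset_after_s:
                self.dwell = 0.0            # long absence: treat as abandoned
        if self.dwell >= self.threshold_s:
            self.dwell = 0.0
            return True                     # selection triggered
        return False

selector = CumulativeDwellSelector()
# 60 Hz samples: 0.5 s inside, a 0.2 s glance away, then 0.4 s inside again.
pattern = [True] * 30 + [False] * 12 + [True] * 24
print(any(selector.update(inside, 1 / 60) for inside in pattern))  # True
```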

GazeCast: Using Mobile Devices to Allow Gaze-based Interaction on Public Displays

Gaze is promising for natural and spontaneous interaction with public displays, but current gaze-enabled displays require movement-hindering stationary eye trackers or cumbersome head-mounted eye trackers. We propose and evaluate GazeCast – a novel system that leverages users’ handheld mobile devices to allow gaze-based interaction with surrounding displays. In a user study (N = 20), we compared GazeCast to a standard webcam for gaze-based interaction using Pursuits. We found that while selection using GazeCast requires more time and is more physically demanding, participants value GazeCast’s high accuracy and flexible positioning. We conclude by discussing how mobile computing can facilitate the adoption of gaze interaction with pervasive displays.