To explore sequences of eye movements, simulation and decoding approaches can be used; these intersect in the domain of cognitive architectures. The current research aims to create architectures that generate an internal representation while learning chaotic data. An EyeLink 1000 Plus was used; 17 subjects participated in the research (mean age = 21 years, 3 male). The experiment included three trials: the first posed a dimensionality-reduction problem, the second a clustering problem, and the third a classification problem. Three distinct cognitive styles were identified in addressing the proposed challenges. The proposed association and batch models enable the identification of extreme patterns, which can provide insights into the characteristics of the data. The foundational generative learning algorithm determines when and where to generate new nodes, selects a pattern X, and identifies a subpattern to be encoded by the newly created node, among other decisions. The findings are consistent with the anatomical perspective.
The paper presents a pilot eye-tracking study on how developers choose which issues to work on and how they perform code-reviewing tasks within the GitHub ecosystem. In this study, we recorded the eye movements of thirteen developers to understand what they look at on the GitHub interface to make decisions. They completed four tasks, namely ranking a list of open issues to work on, prioritizing pull requests, estimating the likelihood of pull requests being accepted, and finally evaluating 25 diverse user profiles for pull request acceptance likelihood. Results suggest that the title, description, and labels are the most important information when developers choose the issue to work on and pull requests to review. The quality of the description and reproduction steps also influenced how developers ranked an issue. The contribution heat map and repository language were relevant areas that attracted more attention when they looked at user profiles.
Smooth pursuits are natural eye movements that occur when we track a moving target with our gaze. While they have been explored as a gaze-based input method using external screens to display moving stimuli, we propose BodyPursuits, a novel HCI method that eliminates additional screens. Stimuli are generated by users tracing smooth trajectories with their hand in mid-air while fixating their gaze on their thumb. We conducted a user study to collect eye-tracking and baseline IMU data from 20 participants performing 10 BodyPursuits gestures. Based on 1,800 samples and additional noise data, we trained a TinyHAR classifier, which achieves a macro-average F1 score of 0.772. With UEQ results indicating a positive user experience and RTLX scores showing low subjective workload for all 10 gestures, we successfully demonstrated BodyPursuits’ potential as a viable interaction method. We envision BodyPursuits could be integrated into EOG earphones to detect mid-air hand gestures without external screens or cameras.
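For context, macro-averaging weights every gesture class equally when combining per-class F1 scores, regardless of how many samples each class has. A minimal sketch of the metric itself (not the authors' pipeline), using scikit-learn and illustrative labels:

```python
from sklearn.metrics import f1_score

# y_true / y_pred: integer gesture labels per sample (illustrative arrays only)
y_true = [0, 1, 2, 2, 1, 0, 3, 3]
y_pred = [0, 1, 2, 1, 1, 0, 3, 2]

# macro averaging computes F1 per class and then takes the unweighted mean,
# so rare gestures count as much as frequent ones
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"macro-average F1: {macro_f1:.3f}")
```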
During visual search, individuals’ attention shifts between ambient and focal states in response to task demands and stimuli. The ambient/focal coefficient K is a statistically validated measure of these states, computed offline from fixation duration and saccade amplitude data. While current methods compute K offline, real-time computation could enable applications such as monitoring user attention, creating attention-adaptive user interfaces, and optimizing graphics rendering. However, real-time computation of K requires stable estimates for the parameters of fixation duration and saccade amplitude distributions. Since these distributions are heavy-tailed, the real-time estimates exhibit high variance and slow convergence. To overcome this, we propose a robust parametrization and an alternative estimation method, along with two real-time measures analogous to K. Through a map viewing study involving localization and route planning tasks, we show that our proposed measures exhibit dynamics consistent with offline K.
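For reference, the offline coefficient K cited above is commonly computed as the mean difference between the z-scored fixation duration and the z-scored amplitude of the following saccade (Krejtz et al.). A minimal sketch of that standard offline formulation, not the authors' real-time estimators:

```python
import numpy as np

def coefficient_K(fix_durations, sacc_amplitudes):
    """Offline ambient/focal coefficient K.

    fix_durations[i]  : duration of fixation i
    sacc_amplitudes[i]: amplitude of the saccade following fixation i
    K > 0 suggests focal viewing, K < 0 ambient viewing.
    """
    d = np.asarray(fix_durations, dtype=float)
    a = np.asarray(sacc_amplitudes, dtype=float)
    # z-score each quantity over the trial, then average the differences
    z_d = (d - d.mean()) / d.std(ddof=1)
    z_a = (a - a.mean()) / a.std(ddof=1)
    return float(np.mean(z_d - z_a))
```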
Baseline pupil size and the pupillary light reflex (PLR) have been associated with aging and cardiovascular risk. Therefore, it might be possible to develop a cardiac risk classifier for screening based purely on pupillometric and psychometric measurements. The task-evoked pupillary response (TEPR) and working memory span might also be predictive, because both change with age. We designed a protocol combining PLR measurement with a novel adaptive auditory working memory test, aiming to develop an indicator of cardiovascular fitness. Efficiency constraints required that precise measurements be possible in applied hospital settings within roughly 10 minutes. We compared groups of healthy younger and older adults (N=74; 22 vs. 75 years, respectively). Older adults had a smaller baseline pupil size, a reduced PLR, and a larger TEPR than younger adults. Based on these features we built an age classifier. We will evaluate its potential for cardiovascular risk classification using patient data.
This study examined the effects of expertise on gaze behavior during a goal-directed Multiple Object Tracking (MOT) task. Specifically, we investigated differences between two primary MOT gaze strategies: target-looking and center-looking. The popular Esport title Rocket League is a competitive video game that requires players to track multiple moving objects while maintaining awareness of critical game information. We used Rocket League as a complex, dynamic and goal-directed analog for traditional MOT tasks, and recruited an expert population of players (ranked above Diamond level, representing the top 10% of players globally) and novice counterparts (players with minimal or no Rocket League experience) to play three levels of the game while tracking gaze. Leveraging a You Only Look Once (YOLO) object detection model, we found that expert players utilized a target-looking gaze strategy more frequently than their novice counterparts, who predominantly adopted a center-looking strategy. These findings align with previous research on traditional MOT tasks and highlight the potential of this novel methodology to expand our understanding of the effects of developing expertise on the use of gaze strategies in dynamic, real-world environments.
Eye movement prediction is a promising area of research with the potential to improve the performance and user experience of systems based on eye-tracking technology. In this study, we analyze individual differences in gaze prediction performance. We use three fundamentally different models: the lightweight Long Short-Term Memory network (LSTM), the transformer-based network for multivariate time-series representation learning (TST), and the Oculomotor Plant Mathematical Model wrapped in the Kalman filter framework (OPKF). Each solution was assessed on different eye movement types. We show substantial subject-to-subject variation for all models and eye movement types. We found that signal noise during fixations and high saccade velocities are both associated with poorer gaze prediction. These individual differences are important, and we propose that future research report statistics related to inter-subject variation. We also propose that future models be designed to reduce subject-to-subject variation.
We explore whether SAM2, a vision foundation model, can be used for accurate localization of eye image features that are used in lab-based eye tracking: corneal reflections (CRs), the pupil, and the iris. We prompted SAM2 via a typical hand annotation process that consisted of clicking on the pupil, CR, iris and sclera for only one image per participant. SAM2 was found to support better spatial precision in the resulting gaze signals for the pupil (> 44% lower RMS-S2S), but not the CR and iris, than traditional image-processing methods or two state-of-the-art deep-learning tools. Providing more frames with prompts to initialize SAM2 did not improve performance. We conclude that SAM2’s powerful zero-shot segmentation capabilities provide an interesting new avenue to explore in high-resolution lab-based eye tracking. We provide our adaptation of SAM2’s codebase that allows segmenting videos of arbitrary duration and prepending arbitrary prompting frames.
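For readers unfamiliar with the precision metric above, RMS-S2S is the root mean square of sample-to-sample gaze displacements within a fixation segment; lower is better. A minimal sketch of the standard definition (not the authors' evaluation code):

```python
import numpy as np

def rms_s2s(x_deg, y_deg):
    """RMS sample-to-sample precision of a gaze signal segment.

    x_deg, y_deg: gaze position samples in degrees during a fixation.
    Lower values indicate a less noisy, more precise signal.
    """
    dx = np.diff(np.asarray(x_deg, dtype=float))
    dy = np.diff(np.asarray(y_deg, dtype=float))
    return float(np.sqrt(np.mean(dx**2 + dy**2)))
```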
We present CoLAGaze, the first broad-coverage eye-tracking-while-reading corpus on grammatical and ungrammatical sentences sourced from CoLA — a Natural Language Processing (NLP) benchmark for evaluating the grammatical knowledge of language models (LMs). CoLAGaze provides eye-tracking data from native English speakers in different formats, including the raw eye-tracking signal, gaze event data, and reading measures computed at the character, word, and sentence levels, alongside comprehensive metadata and data quality documentation. CoLAGaze enables psycholinguistic research on the processing of diverse (un)grammatical structures, allows the training of generative models of eye-movements-in-reading capable of generalizing to ungrammatical stimuli, facilitates the alignment of LMs to human language processing, and supports gaze-augmented NLP applications for grammatical error detection. CoLAGaze and the preprocessing code are available on OSF and GitHub. We have also integrated it into the pymovements Python package.
Eye tracking in virtual reality (VR) can improve realism and immersion, for example, with gaze-contingent depth-of-field simulations. For this application, knowing the distance of the fixated object, not just the gaze direction, is crucial. One common approach estimates gaze distance from vergence, the relative angle between the eyes, but the accuracy of this method is limited, particularly for larger distances. Alternatively, the gaze distance in VR can be retrieved directly from the depth map at the point of estimated gaze. However, eye tracking inaccuracies may result in the measured gaze being directed at an incorrect object, leading to a wrong distance estimation. This issue can occur particularly when fixating on small targets or edges of objects. To address this, we introduce a CNN-based method, which combines depth map data with vergence information from eye tracking. Our model successfully learns to combine information from both features and outperforms state-of-the-art methods.
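As background for the vergence baseline mentioned above, fixation distance can be approximated geometrically from the vergence angle and the inter-pupillary distance. This is not the paper's CNN, only the classical baseline it builds on; the parameter values are illustrative:

```python
import math

def distance_from_vergence(vergence_deg, ipd_m=0.063):
    """Approximate fixation distance (metres) from the vergence angle.

    vergence_deg: angle between the left and right gaze rays, in degrees.
    ipd_m       : inter-pupillary distance; 63 mm is a typical adult value.
    Assumes a symmetric fixation; because the angle shrinks with distance,
    vergence-only estimates become unreliable for far targets.
    """
    theta = math.radians(vergence_deg)
    return ipd_m / (2.0 * math.tan(theta / 2.0))

# e.g. a vergence angle of about 3.6 degrees corresponds to roughly 1 m
print(round(distance_from_vergence(3.6), 2))
```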
Measuring pupil diameter is vital for gaining insights into physiological and psychological states — traditionally captured by expensive, specialized equipment like Tobii eye-trackers and Pupil Labs glasses. This paper presents a novel application that enables pupil diameter estimation using standard webcams, making the process accessible in everyday environments without specialized equipment. Our app estimates pupil diameters from videos and offers detailed analysis, including class activation maps, graphs of predicted left and right pupil diameters, and eye aspect ratios during blinks. This tool expands the accessibility of pupil diameter measurement, particularly in everyday settings, benefiting fields like human behavior research and healthcare. Additionally, we present a new open-source dataset for webcam-based pupil diameter estimation, containing cropped eye images and corresponding pupil diameter measurements.
Gaze + pinch interaction—where gaze serves to point, and a hand action triggers selection—is widely adopted in commercial devices. However, target selection failures caused by gaze-hand coordination errors limit its effectiveness. We examine how task complexity impacts gaze-hand coordination errors and propose an algorithm to mitigate misalignments in input between these modalities. Specifically, we studied tasks with varying visual (perceptually cued targets versus search) and manual (thumb-index pinch vs multi-finger pinch) complexity. We find that late finger touches account for 86.57% of the errors. Furthermore, increased manual complexity is associated with elevated error rates. Based on these insights, we developed a classifier capable of detecting late-triggered errors with a mean accuracy of 97.31% (SD 0.18). By defining the gaze point as the most temporally proximate target fixation before a finger tap, our algorithm corrects the majority (94.61%) of eye-hand input alignment errors, thereby improving gaze-based interactions on HMDs.
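A minimal sketch of the correction rule described above (selecting the target of the most temporally proximate fixation before the pinch). The data layout and field names are assumptions for illustration, not the authors' implementation:

```python
def corrected_target(fixations, pinch_time_ms):
    """Pick the target of the fixation closest in time before the pinch.

    fixations: time-ordered list of dicts with 'start', 'end' (ms) and
               'target' (object id or None); names are illustrative.
    Returns the corrected target id, or None if no prior target fixation exists.
    """
    candidates = [f for f in fixations
                  if f["target"] is not None and f["start"] <= pinch_time_ms]
    if not candidates:
        return None
    # most temporally proximate target fixation before the finger tap
    return min(candidates, key=lambda f: abs(pinch_time_ms - f["end"]))["target"]
```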
Unmanned surface vessels (USVs) and other maritime vehicles are often operated using a keyboard and mouse combination or a joystick-style controller, and are commonly used by those within the ocean, environmental, and biological sciences. These controllers keep the user’s hands occupied and require the development of specific motor skills for vehicle control. Furthermore, these approaches are unusable for those with upper-limb mobility challenges, which presents a barrier to participation in these fields. Eye tracking has enabled users to control a device, application, vehicle, or robot using their eye gaze for hands-free operation. Eye gaze control of vehicles and robots has focused primarily on either a single method per platform or multiple methods for a single platform, and the bulk of this research addresses air and ground vehicles rather than maritime vehicles. With this background in mind, the emphasis of this work is the development of a platform-agnostic eye gaze control method for maritime vehicles. Past work introduced and tested this eye gaze control method with an underwater remotely operated vehicle (ROV); this paper therefore demonstrates the transferability of the method to a USV. The same script used for the gaze-controlled ROV was modified to enable control of the USV with the directions ‘forward,’ ‘backwards,’ ‘left yaw,’ ‘right yaw,’ and ‘neutral yaw,’ as well as the ability to arm/disarm the vessel. In-water laboratory testing shows successful eye gaze control of the USV, serving as an effective proof of concept.
Gaze may enhance the robustness of lie detectors but remains under-studied. This study evaluated the efficacy of AI models (using fixations, saccades, blinks, and pupil size) for detecting deception in Concealed Information Tests across two datasets. The first, collected with an EyeLink 1000, contains gaze data from a computerized experiment in which 87 participants revealed, concealed, or faked the value of a previously selected card. The second, collected with Pupil Neon, involved 36 participants performing a similar task but facing an experimenter. XGBoost achieved accuracies of up to 74% in a binary classification task (Revealing vs. Concealing) and 49% in a more challenging three-class task (Revealing vs. Concealing vs. Faking). Feature analysis identified saccade number, duration, amplitude, and maximum pupil size as the most important features for deception prediction. These results demonstrate the feasibility of using gaze and AI to enhance lie detectors and encourage future research to improve on them.
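A minimal sketch of how such an XGBoost classifier over aggregated gaze features might be set up. The feature layout, hyperparameters, and random data below are illustrative assumptions, not the values reported in the paper:

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

# Illustrative feature matrix: one row per trial, columns such as
# saccade count, mean saccade duration, mean amplitude, max pupil size, blink rate
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.integers(0, 2, 200)   # 0 = revealing, 1 = concealing

clf = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                    eval_metric="logloss")
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"cross-validated accuracy: {scores.mean():.2f}")
```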
Amidst the multimodality of biomedical data and the escalating prevalence of depression, multimodal biosignal fusion stands out as a crucial trend in depression research. Our study introduces a novel Multimodal Self-Attention Network (MSNet), which applies the multi-head attention mechanism simultaneously to eye-tracking and EEG data, enhancing depression detection through its ability to model long-range dependencies. MSNet automatically extracts attention distributions from eye-tracking data, using a multi-head attention mechanism with positional coding to capture attention bias globally. For EEG data, MSNet incorporates residual structures into a graph attention network to transmit neighbor-aggregation features through the multi-head attention mechanism, facilitating a dynamic weighted distribution between nodes and their neighbors to enhance feature consistency. With a classification accuracy of 93.62%, MSNet showed a significant enhancement compared to using eye-tracking or EEG data alone, showcasing the benefits of combining both modalities in depression detection and the effectiveness of the self-attention mechanism in processing physiological signals.
We present a real-time gaze-based interaction simulation methodology using an offline dataset to evaluate the eye-tracking signal quality. This study employs three fundamental eye-movement classification algorithms to identify physiological fixations from the eye-tracking data. We introduce the Rank-1 fixation selection approach to identify the most stable fixation period nearest to a target, referred to as the trigger-event. Our evaluation explores how varying constraints impact the definition of trigger-events and evaluates the eye-tracking signal quality of defined trigger-events. Results show that while the dispersion threshold-based algorithm identifies trigger-events more accurately, the Kalman filter-based classification algorithm performs better in eye-tracking signal quality, as demonstrated through a user-centric quality assessment using user- and error-percentile tiers. Despite median user-level performance showing minor differences across algorithms, significant variability in signal quality across participants highlights the importance of algorithm selection to ensure system reliability.
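For context, the dispersion threshold-based algorithm referred to above is typically an I-DT variant, which marks a window as a fixation when its spatial dispersion stays below a threshold for a minimum duration. A minimal sketch of a standard I-DT classifier (thresholds are illustrative; this is not the authors' implementation):

```python
import numpy as np

def idt_fixations(x, y, t, disp_thresh=1.0, min_dur=100):
    """Minimal I-DT fixation classifier.

    x, y : gaze positions (e.g. in degrees); t : timestamps in ms.
    A window is a fixation when (max(x)-min(x)) + (max(y)-min(y)) stays
    below disp_thresh for at least min_dur milliseconds.
    Returns (start_index, end_index) pairs of detected fixations.
    """
    fixations, i, n = [], 0, len(t)
    while i < n:
        j = i
        while j < n and t[j] - t[i] < min_dur:
            j += 1
        if j >= n:
            break
        w = slice(i, j + 1)
        disp = (max(x[w]) - min(x[w])) + (max(y[w]) - min(y[w]))
        if disp <= disp_thresh:
            # grow the window while dispersion stays under the threshold
            while j + 1 < n:
                w2 = slice(i, j + 2)
                if (max(x[w2]) - min(x[w2])) + (max(y[w2]) - min(y[w2])) > disp_thresh:
                    break
                j += 1
            fixations.append((i, j))
            i = j + 1
        else:
            i += 1
    return fixations
```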
Gender differences in visual attention have been well documented across various perceptual and cognitive tasks, yet little is known about how these differences manifest in real-world contexts like aviation. This gap is critical given ongoing gender disparities in the aviation industry and the need to ensure equitable training and performance assessment practices. This study investigated gender differences in gaze behavior, flight performance, and self-reported situation awareness (SA) in low-time pilots (<300 flight hours) using a high-fidelity flight simulator and eye-tracking glasses. Twenty pilots (10 female, 10 male) completed nine landing trials, including an emergency scenario. Results showed that females demonstrated more stable landing approaches, completed tasks faster in the emergency scenario, and had higher SA ratings, though gaze metrics showed no significant gender differences. These preliminary findings suggest that female pilots may manage task demands effectively under pressure and have important implications for addressing gender-based assumptions in training and recruitment.
Detecting alcohol inebriation by tracking eye movements has potential applications in enhancing operational safety and enabling health-monitoring solutions. For wide practical adoption, eye movements need to be tracked without actively inconveniencing the user. We therefore study whether alcohol consumption can be detected using wearable and remote eye tracking. We collect eye-movement data of participants in sober and moderately inebriated states. We develop a machine learning approach that uses features derived from saccadic and fixational eye movements, pupil diameter, and eye closure. We find that while alcohol can be detected with moderate accuracy (ROC AUC > 0.65) in a closed-population setting, the attainable detection accuracy is just above random guessing (ROC AUC = 0.569) for unknown users. The attainable accuracy is higher for blood alcohol concentrations of 0.04%. Our findings highlight that strong individual differences in the response to alcohol render generalization across individuals challenging. Our code and data are available online; participants gave their informed consent and the study was approved by the ethics committee of our university.
As eye-tracking-while-reading data is known to reflect cognitive processes involved in reading comprehension, it is widely used to develop and evaluate psycholinguistic theories. However, these theories often do not specify which cognitive process is reflected in which gaze feature. Similarly, machine learning models that use gaze features for reader inference, such as assessing their reading proficiency, are often hard to interpret. We address this problem by adapting Neural Additive Models (NAMs), an innovative method designed for directly interpreting each feature’s contribution to the model’s prediction, and by applying NAMs to three concrete tasks, namely determining a reader’s level of domain expertise, assessing their background knowledge, and assessing their text comprehension. Our analyses provide insights into which features are important predictors for each task. We find that features reflecting the variability in the gaze patterns, combined with lexical and structural text features, are strong predictors.
Although eye-tracking offers valuable insights into reading processes, its use for training machine learning models is still limited due to scalability issues, as data collection remains resource-intensive. To tackle this challenge, we build on insights from psycholinguistic and psychological reading research that demonstrated the efficacy of proxy measures, specifically mouse-tracking and self-paced reading, as viable substitutes for high-quality eye movement data. We leverage the state-of-the-art model BEyeLSTM to infer general reading and text comprehension and investigate the effectiveness of pre-training it on more accessible proxy data before fine-tuning it on a smaller eye-tracking dataset. Our findings indicate that pre-training on proxy data (i) reduces the amount of eye-tracking data needed to maintain performance levels and (ii) enhances performance when the full eye-tracking dataset is used for fine-tuning. These results not only suggest a promising path toward making eye-tracking-based reading assessments more scalable, cost-effective, and accessible, but also underscore the broader potential of using proxy data for pre-training as a solution to scalability challenges in developing machine learning methods for eye-tracking data. Our code is publicly available: https://github.com/aeye-lab/transfer-learning-from-proxy.
This paper presents a scanpath aggregation method, enhanced with second-order metrics, applied to mobile eye-tracking data. We showcased our approach in an experimental study on the impact of Audio Description (AD) on visual attention toward architectural heritage. During city walks, participants (n = 42) observed architectural landmarks while their eye movements were recorded. We aggregated participants’ data for each of the artifacts and calculated second-order metrics. The statistical analyses indicated that AD significantly enhanced focal attention but did not affect the overall entropy of the scanpaths. Over time, the presence of AD led to a decrease in the length of aggregated scanpaths. The study emphasizes the efficacy of second-order metrics applied to aggregated scanpaths in capturing attentional patterns. We highlight the potential of AD in directing structured visual attention while facilitating natural exploration, thereby emphasizing the role of advanced gaze metrics in enhancing the accessibility of cultural heritage.
The uncanny valley phenomenon, which explains the discomfort people feel when encountering objects that seem almost human, has become more relevant with the increasing exposure to realistic 3D objects. This increase highlights the necessity to understand human perceptions when interacting with human-like characters. An experimental study was conducted using an eye tracker to explore this phenomenon through qualitative and quantitative data collection in two rounds. The first round (exposure) examined participants’ eye-gaze behavior when viewing 14 images of either a male or female portrait. The second round (assessment) explored their perception of the same images. The results showed that participants’ comfort level decreased together with the humanness present in the portraits, but the uncanny valley effect itself did not occur. There was a difference in pupil dilation and eye fixation based on the image's gender. Participants mostly focused on the eyes and nose when assessing the humanness of an image, suggesting that irregularities or distortions in these facial features may contribute to the phenomenon.
A novel second-order entropy-based oculometric, mobile transition matrix entropy (MTME), is presented for use in mobile eye tracking. Our approach calculates the MTME on the spheric gaze transition matrix, which is based on a sphere around each individual divided into equal-size plates (areas of interest). This approach allows gaze location to be corrected for head and body movements. Additionally, we demonstrate that the present approach is suitable for calculating eye movement entropy in real time, which opens possibilities for use in human-computer interaction based on users’ attention characteristics. We provide empirical evidence from a collaborative task focused on climate change mitigation, demonstrating that MTME effectively captures collaboration dynamics by differentiating between interaction phases and video viewing. The paper presents the method’s details and discusses the metric’s sensitivity and analytical utility.
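A minimal sketch of a transition-matrix entropy over an AOI sequence, the core quantity behind a metric like MTME. The spherical binning of gaze into equal-size plates is not shown, and this generic formulation is an assumption rather than the authors' exact definition:

```python
import numpy as np

def transition_entropy(aoi_sequence, n_aois):
    """Entropy (bits) of the AOI-to-AOI gaze transition matrix.

    aoi_sequence: successive AOI indices hit by gaze (spherical plates in
    the paper; how gaze is binned onto the sphere is not shown here).
    """
    T = np.zeros((n_aois, n_aois))
    for a, b in zip(aoi_sequence[:-1], aoi_sequence[1:]):
        T[a, b] += 1
    row_sums = T.sum(axis=1, keepdims=True)
    P = np.divide(T, row_sums, out=np.zeros_like(T), where=row_sums > 0)
    # weight each source AOI by how often it starts a transition
    pi = row_sums.ravel() / max(row_sums.sum(), 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        row_H = -np.sum(np.where(P > 0, P * np.log2(P), 0.0), axis=1)
    return float(np.sum(pi * row_H))
```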
Eye tracking provides a marker of attention. In the educational context, such behavior can be harnessed to understand learning behaviors. However, a technology framework that captures and utilizes such multimodal indicators in educational activities is lacking. This paper presents LA-ReflecT, a platform integrating multimodal data for micro-learning activities. Teachers can author learning tasks and enable tracking of eye fixation behaviors. A web-camera-based eye-tracking function captures gaze data while learners attempt the learning task. Learners can control the settings to stop or pause recording. We present data-driven services such as gaze attention heatmap visualization and genetic algorithm-based group formation. A classroom study with 41 students illustrates the use of the proposed framework in an authentic context. The collected data are analyzed to answer an initial research question regarding the correlation between the heterogeneity of click and gaze patterns in a learning task. The platform is available for demonstration.
Torsion is the rotation of the eye around its visual axis. Under normal conditions, this happens, for example, when a person rotates their head. Stroke, particularly in areas of the brain stem or the cerebellum, can be a significant cause of eye torsion. Therefore, eye torsion can be a highly significant indicator of a stroke, which is especially important when a stroke is not recognized by the person themselves, such as a stroke during sleep. Strokes that are not recognized by a person are usually called silent strokes. Such silent strokes can be the harbinger of another, overt stroke, dementia, or a gait disorder. We propose a deep learning model for eye torsion estimation which runs on a Raspberry Pi 5. This can help in the development of supportive diagnosis systems for patients or an early warning system to prevent serious damage from silent strokes.
Gaze-based human-computer interaction (HCI) via the internet involves temporal delays with unpredictable "jitter," degrading usability, including the sense of agency – the feeling of control over one's actions and their subsequent outcomes. Using a sense-of-agency experimental paradigm, we present a novel method to test jitter's effect on user sensation. Our methodology simulates internet-based delays, obtaining data to predict the critical delay parameters that degrade user sensation. The task was a letter visual search with a biresolution gaze-contingent window (high-resolution foveal, blurred peripheral), manipulating consistent and jittery delays between eye movements and display updates. This paradigm effectively manipulated participants’ sense of agency, eliciting systematic bias in authorship responses through the minimum jittery delay and jitter variability. This suggests that the proposed method provides a powerful tool for understanding the interplay of delays, eye and other body movements, and the sense of agency, with or without neuroimaging, under internet-originated delays in HCI.
Forecasting of fixation targets is an emerging field in eye-tracking research. It allows us to predict the next fixation of a human and has a plethora of possible applications. The main interest in forecasting fixations is to gain insights into cognitive processes, since the sequence of fixations reveals a person's expertise in a task as well as their visual strategies. Forecasting fixations could also improve the interaction between humans and computers, since the computer can more reliably estimate what the user desires. In the context of advertisements on webpages, it opens the possibility to optimize the presentation for humans, especially the placement of advertisements. In this work we propose a forecasting approach for video platforms like YouTube. The approach is based on machine learning algorithms, normalization of the gaze sequence, and the use of tiles as the classification target.
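A minimal sketch of the tile-based classification target described above, assuming gaze coordinates are normalized to the video frame; the grid size is an illustrative assumption:

```python
def gaze_to_tile(x_norm, y_norm, n_cols=8, n_rows=8):
    """Map a normalised gaze point (0..1 on both axes) to a tile index.

    Discretising the display into tiles turns fixation forecasting into a
    classification problem: the model predicts the index of the next tile
    rather than continuous coordinates.
    """
    col = min(int(x_norm * n_cols), n_cols - 1)
    row = min(int(y_norm * n_rows), n_rows - 1)
    return row * n_cols + col

# e.g. a gaze point slightly right of centre on an 8x8 grid
print(gaze_to_tile(0.55, 0.5))   # -> 36
```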
Cognitive flexibility is crucial for adapting behavior to shifting demands, particularly in tasks with varying cognitive requirements. This study examines the relationship between cognitive flexibility levels and eye movement parameters across six difficulty levels. Results show that highly flexible individuals consistently outperform those with lower flexibility, demonstrating more fixations, shorter fixation durations, and reduced blinking. Notable differences in eye movement patterns become apparent at the highest difficulty levels (5-6), emphasizing the importance of switching abilities in complex cognitive tasks. These findings offer new perspectives on psychophysiological measurements of higher-order cognition, with significant implications for cognitive theory.
In this study, we developed a hybrid eye-brain-computer interface (EBCI) combining gaze-based interaction with control via motor imagery (imagined movements, IM) and quasi-movements (QM). We tested it in a custom game. Both types of covert movements were detected through electroencephalography. Previous studies suggest that QM are more detectable than IM; here, they were used for online control for the first time. We aimed to reduce demands on gaze by minimizing its intentional use. The interaction enabled visual exploration without unnatural fixations for command input. We hypothesized that QM would combine with gaze better than IM, but this was not supported. Covert movements were detected with >80% accuracy. User experience questionnaires indicated that the interface was internally consistent and easy to use, at least with IM, while QM-based control was perceived as effortful. These results support our approach to gaze-based interaction, while the integration of QM in EBCIs requires further studies.
Early detection of Alzheimer’s disease (AD) is crucial for effective intervention. Recent studies suggested that eye movements could serve as biomarkers for AD. This study analyzed eye movement data from visuospatial memory tasks and proposed a deep learning-based approach using Convolutional Neural Networks (CNNs) to classify AD patients based on gaze heatmaps. Instead of manual feature extraction, we used a three-channel input: the heatmap from viewing the original image, the heatmap from the manipulated image, and a region-highlighted image. This allowed the CNN to learn relevant patterns automatically. The model was trained on 24 participants (11 AD, 11 non-AD, 2 excluded) and achieved an AUC of 0.79, demonstrating moderate predictive performance.
In this paper, we first describe how emotion classification is currently incorporated into human-computer interaction. We then describe classification methods, including the eye-behavior-based classification method. Next, we show how to design an implicit eye-behavior-based emotion classification system and its potential applications.
This study examined how the emotional dimensions of arousal and valence influence effort-based decision-making. Twenty-eight participants were exposed to either low arousal & high valence (-A/+V) or high arousal & low valence (+A/-V) stimuli in a task that required choosing between a more rewarding but also more effortful option versus a less effortful but also less rewarding one. We recorded eye movements, fixation times, and self-reported arousal and valence ratings. Results showed that participants in the +A/-V condition were more likely to select the effortful option (62% vs. 56%), with a significant interaction between arousal and valence (β = 0.46, 95% CI [0.26, 0.67]; p <.001). Specifically, positive valence reduced preference for effortful choices, but this effect became null under high arousal. Additionally, participants in the -A/+V condition took longer to make decisions and attended more to reward information than to effort information. These findings suggest complex emotional influences on effort-based decisions.
Microsurgical suturing requires advanced skills, and gaze behavior plays a crucial role in skill acquisition. This study explores the relationship between gaze behavior and body ownership by implementing a VR-based training application with a 3D puzzle task. To encourage appropriate gaze behavior, we introduced a planar overlay shield that appears when the user’s gaze deviates from the central task area. A user study evaluated the effects of gaze control during and after training. The results showed that gaze control effectively directed gaze toward the central task in training sessions. However, it did not significantly improve gaze behavior post-training. These findings suggest that while gaze guidance influences gaze direction, it may not directly facilitate gaze behavior acquisition. Future work will explore alternative gaze guidance techniques and their impact on body ownership to develop a more effective training system for microsurgical suturing.
The text reading ability of English-as-a-second-language speakers is predicted from features of their eye movements in comparison with those of native speakers. To estimate the level of language skill development, normalised distance metrics of word-skipping patterns during text reading were measured for the learners and native speakers. The distance between the two groups of readers closed gradually across the three ability levels into which the learners were separated according to their English proficiency test scores. The individual distances also correlate with the English test scores.
This study investigates the feasibility of developing a user authentication system based on individual differences in gaze patterns. We used eye-tracking data from 10 participants who viewed 15 common images extracted from the UEyes dataset for recurrence plot analysis. The analysis shows that visual search strategies differ between participants. Our results suggest that features obtained through recurrence plots can form the basis for developing a more secure and user-friendly authentication system.
In activities that rely on visual information, learning the gaze trajectory of an advanced user is important for gathering information about the surrounding environment. In addition, the eye movements of advanced users are known to be more intense than those of ordinary people, so it would be difficult to imitate their gaze perfectly. However, conventional methods aim to mimic the user's gaze as it is, which does not always result in efficient gaze imitation. In this paper, we propose a method for determining the tempo for each user and gaze task sequence, and a rhythm-game gaze system that improves the efficiency of gaze movement. We show that the rhythm used in our method improves the efficiency of performing the gaze task. This system is expected to allow users to experience an advanced user's gaze and to be applicable to training.
Transcribing text and figures from blackboards and textbooks by hand is common practice in Japanese schools. This activity requires visual cognition skills, such as eye movements and eye–hand coordination. This study aims to clarify how these skills function in transcription. We employed an eye tracker and the Rey–Osterrieth Complex Figure Test (RCFT). Participants completed two transcription tasks: one with the RCFT placed at a distance and another side-by-side. We evaluated four visual cognition skills: saccade control, visual search, short-term memory, and eye–hand coordination. Results show that transcribing the RCFT from a distance depends primarily on saccade control and visual search, whereas side-by-side transcription relies more on short-term memory. These findings suggest that transcription tasks demand both eye movements and memory, highlighting the importance of developing these skills through educational programs.
In this paper, a few-shot adaptive gaze estimation method is proposed for end-to-end personalized gaze direction prediction in the natural driving environment. A gaze zone estimation module is used to generate the images and gaze directions from the left mirror, right mirror, rearview mirror, and frontal windshield for calibration. A pre-trained gaze model is utilized for gaze feature extraction of calibration images and target images. A direction classification module is employed to infer the probability of the direction distribution of the target points with respect to the calibration points. Experimental results demonstrate that the proposed method outperforms the SOTA methods on IVGaze, achieving an improvement of \( 5.8\% \) over the existing techniques and \( 40.3\% \) compared to the baseline model.
The growing availability of consumer-grade devices equipped with eye-tracking optics, including augmented/virtual reality (AR/VR) headsets, has brought eye-tracking technology into wider use than ever before. Understanding users' focus and visual scanning behavior can help optimize their engagement in immersive environments. In this study, we present a framework for measuring visual attention in a gaze-driven VR learning environment using a consumer-grade Meta Quest Pro VR headset. This system generates and presents basic and advanced eye-tracking measures such as fixation duration, saccade amplitude, and the ambient/focal attention coefficient \( \mathcal {K} \) as indicators of visual attention in the VR environment.
This pilot study investigates the processing of negation in Polish and Ukrainian in a Visual World Experiment with a Blank Screen Paradigm. Polish and Ukrainian differ in the composition of meaning in contexts with negation and perfective aspect, which may influence whether listeners mentally simulate illusory scenarios when processing negative statements. Eye-tracking data revealed a preference for factual interpretations over illusory ones during both anticipatory and integrative phases in both Polish and Ukrainian, supporting the one-step model of negation.
Eye tracking is an important technology for studying human behavior and cognition, measuring attention for autonomous driving, and many other research and application areas. Capturing high-quality eye-tracking data is, however, a challenging task. There are many challenges in the real world, such as reflections on eyeglasses, make-up, nearly closed eyes, eye-tracker shifts, or highly off-axial camera angles. This means that large quantities of the data acquired during an eye-tracking study cannot be used due to an invalid gaze signal or no gaze signal at all because the pupil detection failed. We therefore propose an approach based on neural radiance fields (NeRF), which allows the reconstruction of an eye-tracking recording as well as the generation of novel eye-tracking data. This can help researchers and application developers in their work by repairing bad recordings or generating data for machine learning models.
In this paper, we investigate the relationship between real-time nocturnal blood glucose values received via a continuous glucose monitoring device and eye movements recorded with an electrooculogram sensor during hypoglycemic events. Our proof-of-concept analysis reveals distinct differences in saccade velocity patterns between hypoglycemic (blood glucose < 70 mg/dL) and normoglycemic (blood glucose ≥ 70 mg/dL) states. Specifically, hypoglycemic episodes exhibit lower saccade velocities with reduced variability, whereas normoglycemic episodes show greater fluctuations and occasional high-velocity events. These findings suggest that eye movement behavior may be influenced by nocturnal glucose fluctuations, potentially offering a non-invasive methodology for detecting hypoglycemic episodes.
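A minimal sketch of how saccade peak velocities might be extracted from such a signal with a simple velocity threshold; the threshold value and signal layout are assumptions, not the authors' processing pipeline:

```python
import numpy as np

def saccade_peak_velocities(gaze_deg, t_ms, vel_thresh=30.0):
    """Peak velocities of saccades found with a velocity threshold (I-VT style).

    gaze_deg : 1-D horizontal gaze position (e.g. EOG-derived) in degrees.
    t_ms     : sample timestamps in milliseconds.
    Samples whose absolute velocity exceeds vel_thresh (deg/s) are treated as
    saccadic; contiguous runs are grouped into individual saccades.
    """
    v = np.abs(np.gradient(np.asarray(gaze_deg, float),
                           np.asarray(t_ms, float) / 1000.0))  # deg/s
    is_sacc = v > vel_thresh
    peaks, start = [], None
    for i, flag in enumerate(is_sacc):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            peaks.append(float(v[start:i].max()))
            start = None
    if start is not None:
        peaks.append(float(v[start:].max()))
    return peaks
```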
This paper focuses on estimating visual cognition—the cognitive process involved in visually recognizing specific objects—by analyzing participants’ gaze behaviors at the moment they recognize a target. The experiment investigates the effects of visual cognition difficulty (VCD) and target movement direction on recognition timing and minimum angular distance. The results show that participants recognized targets with lower VCD (larger fonts) and targets moving outward more quickly.
To tackle the cross-domain model performance degradation challenge, a gaze estimation method based on proxy tuning, called PTGaze, is proposed. In PTGaze, a base model is used to learn the gaze representation of the baseline model in the source domain, and an adapt model is used to learn the gaze representation of the baseline model in the target domain. The gaze difference between the base and adapt models is utilized to guide the final output, ensuring the method’s accuracy in the target domain. Experimental results show that the proposed method achieves higher cross-domain gaze estimation accuracy on five public datasets, using RT-Gene, Full-face, and Gaze360 as baseline models.
During gaze-based interaction, gaze provides both control and visual input. Although in simple tasks, like eye typing, these functions are separated, more complex scenarios can lead to misinterpretation of user intent. In our study, we explored whether machine learning (ML) can aid in solving this problem. Fifteen participants played a gaze-controlled game in which they could freely select screen objects with a 500 ms dwell time. By applying ML to gaze features and contextual information, we achieved a threefold reduction in false positives. This study is the first to show how ML can enhance gaze-based interaction in visually demanding environments.
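A minimal sketch of a dwell-time trigger with an ML veto on likely false positives, in the spirit of the approach above; the field names, feature layout, and the 0.5 decision threshold are assumptions, not the study's implementation:

```python
def dwell_select(fixation, intent_model, dwell_ms=500):
    """Dwell-time trigger with an ML veto on likely false positives.

    fixation     : dict with 'duration_ms', 'object_id' and a feature vector
                   of gaze/context features (illustrative layout).
    intent_model : any classifier with predict_proba, trained to distinguish
                   intentional selections from mere visual inspection.
    Returns the object id to select, or None to suppress the dwell event.
    """
    if fixation["duration_ms"] < dwell_ms or fixation["object_id"] is None:
        return None
    p_intent = intent_model.predict_proba([fixation["features"]])[0][1]
    return fixation["object_id"] if p_intent >= 0.5 else None
```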
Eye movement research is often obstructed by the time-consuming challenges of accessing and preprocessing datasets, diverting efforts from scientific discovery. Researchers often struggle with non-standardized data formats, incomplete metadata, and scattered dataset repositories. Moreover, visibility of tediously collected and curated datasets is hindered without central aggregators that reference these valuable works. pymovements addresses these shortcomings by providing researchers with a seamless way to announce, discover, download, and process published eye movement datasets. We encourage researchers to contribute their eye-tracking datasets to our library to increase their visibility and impact.
Dyslexia affects reading fluency and comprehension, requiring early and accessible detection. Traditional detection methods are costly and time-intensive, limiting scalability. We propose a novel dyslexia detection framework that encodes eye-tracking data into a structured time-series representation, preserving temporal dependencies while compressing high-dimensional gaze behavior into an interpretable format. This enables AI models to identify key reading patterns, such as fixations, regressions, and erratic jumps, without relying on computationally expensive image-based methods. Our approach achieves an F1-score of 0.862, demonstrating competitive performance compared to deep learning models while enhancing interpretability.
Various remote eye-gaze monitoring technologies have been investigated to estimate drivers’ eye gaze with high user-friendliness. However, current methods require a tedious calibration process or specific hardware and perform poorly on uncalibrated drivers. In this paper, an implicit gaze calibration method is presented for in-vehicle driver gaze prediction systems under real-world driving scenarios. Considering topological position and head motion characteristics, the method locates representative gaze zones by clustering and progressively selects the head dynamics between the target coarse gaze zones. Benefiting from linear interpolation, the gaze points of the candidate head dynamics are refined to their vertical projection trajectory. Both the representative gaze zones and the clustered gaze points on the head dynamics can be utilized as calibration points. Extensive experimental results demonstrate that the proposed method is effective and comparable with manual gaze calibration.
This paper introduces UnityEyes 2, an open source and customizable synthetic image generator for training machine learning methods for eye tracking. Training models with synthesized images and ground truth is far more convenient than training from real images, which require manual annotation, but if viewing angles, distances, and lighting conditions do not match real conditions, model performance will degrade (the sim-to-real gap problem). UnityEyes 2 supports customization of distributions of eye pose, camera intrinsic and extrinsic parameters, multi-camera setups, and varied lighting conditions. A graphical user interface enables rapid prototyping of the data distribution and model evaluation. Experiments show that training convolutional neural network models from camera-specific synthetic datasets leads to better transfer to the real world compared to training from generic-viewpoint synthetic datasets.
Radial authentication interfaces offer privacy-preserving, calibration-free eye-movement authentication on smartphones. Shorter passwords with fewer indicators improve speed and accuracy but compromise security, while longer configurations enhance security at the expense of usability. The ideal radial interface that balances these trade-offs remains unknown. Using the iPhone 13, this study examines seven radial authentication interfaces in combination with varied password lengths. We conducted controlled eye-tracking experiments with 27 participants, evaluating authentication accuracy, security, and entry time. Additionally, participants ranked their priorities among these factors to assess optimal authentication design. Our findings show that a configuration of four indicators and a four-digit password balances these trade-offs based on performance and user prioritization.
Eye tracking is applied in very heterogeneous settings, from reading to immersive virtual reality (VR), using devices ranging from stationary high-precision to mobile low-power systems. Classification algorithms, on the other hand, are developed and tested on specific devices in specific settings. A central question is how algorithms that have been optimized for a particular scenario (device + setting) would perform when applied to a different scenario. In this paper, we develop the idea of a distance metric describing the similarity between two scenarios. If the distance between two scenarios is low, algorithms can be safely adopted between them; the higher the distance, the weaker the expected performance of the transferred algorithm. We illustrate our ideas with a transfer from a 2D screen-based eye-tracking scenario to an immersive VR scenario.
Cognitive control is important for multitasking performance, which is essential for safety and efficiency in human-machine interaction domains such as aviation. Previous research has shown that the stability-flexibility dilemma of cognitive control can be manipulated through task prioritization in low-fidelity flight environments, and that eye-tracking metrics can effectively classify this dilemma using a machine learning approach. However, it is unclear whether these results extend to higher-fidelity environments. This study examines the applicability of this approach to eye-tracking metrics assessed in a VR flight simulator. Linear mixed-effects models reveal significant differences in fixation duration, relative number of fixations, coefficient K, stationary entropy, and explore-exploit ratio across flight scenarios with varying task prioritization. Additionally, these metrics were used as input features in a machine learning model to classify the mission scenarios. The findings have important implications for designing adaptive assistance systems aimed at supporting multitasking performance in safety-critical environments.
Gaze-based interaction provides an intuitive way to control robotic systems, but unintended selections remain a challenge. Conventional approaches mitigate this problem by requiring additional confirmation actions, yet incorrect decisions still occur. In this study, we propose an anomaly detection approach to improve selection accuracy. We conducted a virtual reality experiment with a visual search task and tested different methods for finding anomalies in the gaze pattern. These methods are trained on correct selections to learn their features. By exploiting the gaze patterns, the methods effectively discriminate between correct and incorrect selections, since unintended inputs are not well represented in the learned gaze patterns and therefore their features differ from the normal case. The results show that our approach maintains over 90% accuracy for correct selections while successfully identifying over 60% of incorrect selections, thereby reducing false activations. To further validate our method, future work should investigate its effectiveness in real-time scenarios.
While adaptive reading interfaces are capable of providing flexible typographical adjustments in real time, readers are challenged to keep track of the context. This paper contributes by introducing context preservation, which uses eye tracking to enable readers to resume reading faster after typographical adjustments are applied. Typography adjustments are applied through so-called interventions, and the reading application currently has four intervention designs: Popup, Undo, Notification, and Gradual. To explore how much text is required to resume reading quickly, context-preservation functionality was applied and evaluated with 22 participants in a within-subjects experiment design.
Our findings reveal significant differences in reading-resume time (RRT) between interventions. Furthermore, context preservation in the Gradual intervention mode was the fastest and the intervention design most liked by participants.
Understanding how novice programmers allocate their visual attention during programming tasks can improve education. Although eye-tracking provides rich quantitative data, visualizing gaze behavior effectively remains challenging. This study introduces a methodology that integrates DBSCAN clustering and Sankey diagrams to visualize attentional distribution among novice programmers based on their expertise and accuracy. We collected eye-tracking data from undergraduate students as they solved multiple-choice programming questions. We clustered fixation data to identify Areas of Interest (AoIs) and used bar charts and Sankey diagrams to visualize attentional differences across expertise and individual levels. The visualization results reveal distinct cognitive processing strategies between high- and low-performing students, providing deeper insights into novice comprehension behaviors. These findings support the development of personalized feedback mechanisms in programming education. Future research will expand the dataset, explore varying task complexities, and incorporate real-world programming environments to refine comprehension modeling and adaptive learning strategies.
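A minimal sketch of the DBSCAN step for deriving data-driven AoIs from fixation coordinates; the eps and min_samples values and the sample points are illustrative, not the study's settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# fixations: one row per fixation, screen coordinates in pixels (illustrative)
fixations = np.array([[512, 300], [515, 305], [520, 298],
                      [900, 640], [905, 655], [120, 80]])

# eps is the neighbourhood radius in pixels, min_samples the minimum cluster
# size; fixations labelled -1 are treated as noise, the remaining clusters
# form data-driven Areas of Interest
labels = DBSCAN(eps=40, min_samples=2).fit_predict(fixations)
print(labels)   # -> [ 0  0  0  1  1 -1]
```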
As large language models (LLMs) become more integrated into software engineering and computer science education, it is crucial to understand their impact on student learning. While recent research has explored student perceptions of generative AI, little is known about how these tools influence students’ cognitive processes during programming tasks, such as code comprehension, a valuable skill in software development and maintenance. This paper presents the design of a study that aims to investigate how computer science students interact with LLMs, such as Google’s Gemini, in the context of code summarization using eye-tracking. This study will examine differences in visual attention, fixation behaviors, and performance of students engaged in code summarization with and without AI assistance across varying experience levels.
The work presents a small pilot study of developers reading Python programs in the iTrace eye-tracking infrastructure. The main objective is to understand how adding new languages to srcML impacts the usage of iTrace. iTrace uses srcML to extract syntactic information from the source code being examined. This allows iTrace to automatically determine regions of interest (ROIs) and drastically reduce the amount of time and effort required to process the collected eye-tracking data. The srcML infrastructure has a beta version that supports Python, which allows iTrace to fully support the analysis of eye-tracking studies on Python source code. This work demonstrates the viability of supporting new languages in iTrace.
In the digital era, election maps play a vital role in visualizing complex electoral data and shaping the public’s understanding of democratic outcomes. However, interpreting these maps can be challenging, particularly for non-experts, as comprehension is influenced by both cognitive styles and educational background. This study utilized eye-tracking technology to examine how users engage with interactive election maps. We conducted experiments with 20 high school students, categorizing them as either analytical or holistic thinkers based on post-session assessments. This classification provided insights into how different cognitive styles, along with prior education in geography and statistics, influence the perception and interpretation of election maps. Our findings highlight the value of integrating eye-tracking data into election map design to enhance usability and accessibility for a diverse audience. Moreover, the study underscores the need for stronger educational foundations in cartography and data literacy to improve users’ ability to critically assess election visualizations. By improving cartographic visualization and integrating better educational strategies, we strive to create more inclusive and empowering tools that support informed democratic participation.
Understanding and classifying cognitive workload is a critical challenge in educational technology and human-computer interaction (HCI). While cognitive load is often treated as a single, generalized concept, its nuanced components—such as working memory load and visual attention load—play distinct roles in learning environments. To investigate these differences, we conducted a controlled experiment, collecting a comprehensive eye-tracking dataset comprising 528,017 data points across varied cognitive tasks. Leveraging machine learning, we demonstrate that these cognitive states can be classified, revealing measurable distinctions between load types. Our findings pave the way for adaptive learning systems that dynamically tailor instructional content based on cognitive state assessments. This research contributes to the development of personalized, AI-enhanced educational tools, advancing both theoretical understanding and practical applications of eye-tracking in education, cognitive assessment, and HCI.
Gaze entropy quantifies the randomness in eye movements and serves as a proxy for visual attention and cognitive engagement. This study investigates its relationship with player focus and learning within a pattern matching memory game designed to challenge attention and working memory. Using an iPad-based gaze-tracking system, we analyzed gaze entropy trends over 21 days. The results suggest that frequent players exhibit lower entropy, indicating structured gaze behavior and improved focus, whereas infrequent players display higher entropy linked to erratic attention. Although promising, these findings are currently limited to pattern matching games and may not be generalized to other genres of games. The study highlights the potential of gaze entropy to inform adaptive game mechanics and real-time cognitive feedback.
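For context, gaze entropy of this kind is typically the Shannon entropy of the distribution of fixations over areas of interest; the exact formulation used in the study is not specified, so the following is a minimal sketch of the common definition:

```python
import numpy as np

def stationary_gaze_entropy(aoi_labels, n_aois):
    """Shannon entropy (bits) of the fixation distribution over AOIs.

    aoi_labels: AOI index of each fixation (non-negative integers).
    Higher values indicate more dispersed, less structured gaze;
    lower values indicate focused, repetitive viewing.
    """
    counts = np.bincount(aoi_labels, minlength=n_aois).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```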
Qualitative and quantitative eye tracking studies are prominent in many fields to understand behavior related to visual attention, in particular, for visualization research. The design, setup, and execution of a study, as well as the analysis of the acquired eye tracking data, can be difficult. This work proposes guidelines for eye tracking studies in visualization. We differentiate three major phases, focusing on before, during, and after a study. These guidelines are based on our experiences from conducting more than 100 eye tracking studies and additional literature research for each phase. As a result, we present a structured plan and chronological order of general requirements as a checklist for conducting eye tracking studies. A living document of the checklist can be found at: https://github.com/mibu1976/etvis2025
The visual theme of a dashboard, whether light or dark, is a prominent design choice with potential implications for user experience. This research investigates the effect of visual theme on user performance and workload during decision-making tasks on dashboards. In a within-subjects experiment, we measured the effect of dark and light themes and task complexity (easy, medium, and hard) on task completion time, accuracy, confidence, fixation counts, pupil dilation, and workload. The dark mode improves accuracy, confidence, and average fixation count at the medium task complexity level, suggesting its utility in specific scenarios. In dark mode, the relative pupil dilation was higher, but the perceived workload was lower than in light mode. These findings highlight the need to study the interrelation between objective workload measurements and subjective questionnaires. This study advances the empirical foundation for theme selection in data-driven interfaces of varying complexity.
This study explores the use of mobile eye-tracking technology to evaluate visitor engagement and behavior within the geographic exhibit at the Science Museum of Palacký University Olomouc. Participants from three age groups navigated the exhibition independently while their gaze data were recorded and analyzed. Results revealed that interactive and visually dominant elements, such as the 3D animal models in the "River Bed" exhibit, garnered the most attention, with visitors spending 47.75% of their observation time on such features. Conversely, poorly positioned elements, like the spider model, attracted minimal attention, highlighting the importance of exhibit accessibility and placement. Textual information accounted for 14.95% of observation time, demonstrating its value when effectively integrated with interactive features. This study underscores the utility of eye-tracking in identifying "hotspots" of attention, optimizing exhibit design, and enhancing visitor experiences. Insights provided to museum staff informed practical adjustments, including relocating underperforming exhibits for better visibility.
Foveated rendering methods usually reduce spatial resolution in the periphery of the users’ view. However, using foveated rendering to reduce temporal resolution, i.e., rendering frame rate, seems less explored. In this work, we present the results of a user study investigating the perceptual effects of foveated temporal resolution reduction, where only the temporal resolution (frame rate) is reduced in the periphery without affecting spatial quality (pixel density). In particular, we investigated the perception of temporal resolution artifacts caused by reducing the frame rate dependent on the eccentricity of the user’s gaze. Our user study with 15 participants was conducted in a virtual reality setting using a head-mounted display. Our results indicate that it was possible to reduce average rendering costs, i.e., the number of rendered pixels, to a large degree before participants consistently reported perceiving temporal artifacts.
Information Visualization (InfoVis) systems utilize visual representations to enhance data interpretation. Understanding how visual attention is allocated is essential for optimizing interface design. However, collecting Eye-tracking (ET) data presents challenges related to cost, privacy, and scalability. Computational models provide alternatives for predicting gaze patterns, thereby advancing InfoVis research. In our study, we conducted an ET experiment with 40 participants who analyzed graphs while responding to questions of varying complexity within the context of digital forensics. We compared human scanpaths with synthetic ones generated by models such as DeepGaze, UMSS, and Gazeformer. Our research evaluates the accuracy of these models and examines how question complexity and number of nodes influence performance. This work contributes to the development of predictive modeling in visual analytics, offering insights that can enhance the design and effectiveness of InfoVis systems.
In most remote webcam-based eye-tracking solutions, dynamic heatmaps (also known as dynamic attention maps) are commonly generated as experimental outputs in video format. These videos consist of screen recordings of participants’ sessions, with hotspots marked on each frame based on estimated eye-gaze points. Despite the convenience of these off-the-shelf solutions, we find that the videos are incomplete without specific gaze point coordinates, which are required for any data analytics, e.g., cognitive load estimation. In this work, we propose a novel deep learning-based method for finding the most probable gaze point in each frame of the heatmap videos. Specifically, a position coordinate (x, y) on the user’s screen is estimated as the gaze point by feeding the temporal sequence of hotspots into a deep learning model of the kind used for object detection problems in computer vision. As a proof of concept, we demonstrate that deep neural networks detect the latest changing hotspots in the frame sequence of a dynamic attention map video, thus capturing the temporal coherence. We implement hotspot detection using two Convolutional Neural Network (CNN) architectures and one transformer-based architecture, of which the latter performs better in our experiments.
Multiple challenges emerge when analyzing eye-tracking data with areas of interest (AOIs) because recordings are subject to different sources of uncertainty. Previous work often presents gaze data without accounting for these inaccuracies. To address this issue, we developed uncertainty-aware scarf plot visualizations that aim to make analysts aware of uncertainties with respect to the position-based mapping of gaze to AOIs and depth dependency in 3D scenes. Additionally, we also consider uncertainties in automatic AOI annotation. We showcase our approach in comparison to standard scarf plots in an augmented reality scenario.
The eyes play an important role in human collaboration. Mutual and shared gaze help communicate visual attention to each other or to a specific object of interest. Shared gaze has typically been investigated for pair collaborations in remote settings and with people in virtual and augmented reality. With our work, we expand this line of research with a new technique to communicate gaze between groups in tabletop workshop scenarios. To achieve this communication, we use an approach based on projection mapping to unify gaze data from multiple participants into a common visualization space on a tabletop. We showcase our approach with a collaborative puzzle-solving task that displays shared visual attention on individual pieces and provides hints to solve the problem at hand.
Foveated rendering uses lower resolution and image quality in the region of peripheral vision to reduce rendering costs and speed up image generation. In theory, the reduced work for the graphics card can be leveraged to improve rendering performance or reduce energy consumption. However, no attempts have been made so far to quantify such energy savings in the context of scientific visualization. To address this gap, we investigate the energy consumption of foveated rendering, using direct volume visualization as an example. In particular, we test two acceleration techniques: one utilizes Variable Rate Shading as supported by GPU vendors, and the other is a custom implementation of sparse sampling outside the fovea based on Linde-Buzo-Gray stippling. We compare these techniques to a full-resolution volume visualization and quantify the energy reduction of the whole rendering system, including the eye tracking device.
In eye-tracking data analysis, researchers examine where subjects focus their attention by viewing temporal gaze position data. This paper introduces “Gaze Tiling”, a novel visualization tool that combines three timelines: video progression, tiled gaze point images, and 2D color maps of gaze positions. Traditional visualization methods have several limitations. Cropped gaze point images do not clarify whether changes result from gaze movement or scene changes, while thumbnail timelines miss changes between frames. We employed the TimeSpaceSlice method, which preserves both temporal and spatial continuity. Preliminary tests showed that Gaze Tiling works effectively with videos having relatively fixed camera positions and limited object movement, such as online lectures. This approach helps researchers better understand the causes of changes in gaze position and their context within the overall scene.
Understanding how humans interact with information visualizations is crucial for improving user experience and designing effective visualization systems. While previous studies have focused on task-agnostic visual attention, the relationship between attention patterns and visual analytical tasks remains underexplored. This paper investigates how attention data on charts can be used to classify question types, providing insights into question-driven gaze behaviors. We propose ChartQC, a question classification model leveraging spatial feature alignment in chart images and visual attention data. By aligning spatial features, our approach strengthens the integration of visual and attentional cues, improving classification accuracy. These findings help deepen the understanding of user perception in charts and provide a basis for future research on interactive visual analysis.
With the rapid adoption of smartphones among young people, digital environments such as the Metaverse are increasingly used for cultural engagement in the creative industries. However, there is limited understanding of how digital artwork exhibited in virtual galleries influences visitors’ visual attention and engagement within a Metaverse art gallery. This study used a multi-method approach, triangulating smartphone-based eye tracking experiments, AOI mapping, and user journeys developed from visual attention patterns. The user journey map framework helped to evaluate behavior patterns in the Metaverse Digital Art Gallery, with implications for industry and academia. This is one of the few studies that utilize the user journey approach and map gaze data to understand the behavior of digital visitors within a Metaverse art gallery. The findings provide useful guidance for further application in art gallery design and research and inform better ways to engage a wider audience of digital visitors in valuable cultural experiences.
Gaze behavior provides key insights into cognitive processes such as attention, perception, and decision-making. Traditional eye-tracking metrics—such as fixation durations—capture meaningful patterns but overlook re-fixation dynamics, which signal shifts in gaze strategies. Recurrence Quantification Analysis (RQA) detects such patterns but aggregates data across entire tasks, obscuring temporal variations. Sliding Window RQA (SWRQA) tracks recurrence over time, yet fixed window sizes struggle with short fixation sequences and long-span recurrences, limiting suitability for gaze data. To address these limitations, we present Running RQA (RRQA). This proof-of-concept computes standard RQA metrics at each fixation, enabling fine-grained temporal tracking of Recurrence Rate (REC), Determinism (DET), Laminarity (LAM), and Center of Recurrence Mass (CORM). We introduce three visualization techniques to map changes in RRQA metrics for per-fixation, multi-participant comparisons. An open-source web application enables researchers to upload, analyze, and explore gaze data with RRQA visualizations interactively, facilitating detailed investigations of re-examination behavior: https://osf.io/ptfe7/.
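To illustrate the computation that RRQA repeats at each fixation, the sketch below builds a recurrence matrix from fixation coordinates and reports recurrence rate (REC) and determinism (DET) up to each fixation index; the radius and minimum line length are placeholder parameters, and the published tool additionally covers LAM and CORM.

```python
import numpy as np

def recurrence_matrix(fix, radius):
    """Binary recurrence matrix over an (N, 2) array of fixation positions."""
    d = np.linalg.norm(fix[:, None, :] - fix[None, :, :], axis=-1)
    return (d <= radius).astype(int)

def rec_det(R, min_line=2):
    """Recurrence rate (REC) and determinism (DET) from the upper triangle."""
    n = R.shape[0]
    iu = np.triu_indices(n, k=1)
    recurrent = int(R[iu].sum())
    rec = recurrent / len(iu[0]) if len(iu[0]) else 0.0
    det_points = 0
    for k in range(1, n):                    # scan each superdiagonal
        run = 0
        for v in list(np.diag(R, k)) + [0]:  # count runs of consecutive 1s
            if v:
                run += 1
            else:
                if run >= min_line:
                    det_points += run
                run = 0
    det = det_points / recurrent if recurrent else 0.0
    return rec, det

def running_rqa(fix, radius, min_line=2):
    """Recompute REC and DET at every fixation index (proof of concept)."""
    return [rec_det(recurrence_matrix(fix[:i], radius), min_line)
            for i in range(2, len(fix) + 1)]

# Synthetic fixation positions in pixels
fix = np.array([[100, 100], [105, 102], [400, 300], [103, 99], [101, 101]])
print(running_rqa(fix, radius=30.0))
```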
Wearable eye-tracking in field studies presents challenges in synchronising gaze data with dynamic stimuli and integrating observational notes from multiple observers. Existing tools often struggle to visualise eye-tracking patterns in complex, real-world environments with frequently changing areas of interest (AOIs). To address this, we propose a streamlined workflow that simplifies analysis preparation by integrating real-time observer notes with eye-tracking data through enhanced timestamp-based synchronisation, improved data mapping, and automated AOI detection, demonstrated in an energy control room use case. This workflow makes eye-tracking tools like Gazealytics more practical for complex field studies. By streamlining data preparation and automation, our method enhances the scalability and usability of eye-tracking analysis in complex environments, enabling more efficient and accurate visual analysis of real-world decision-making.
Area of interest (AOI)-based analytics have been demonstrated to be effective measures of gaze dispersion and cognitive demands, but they require extensive pre-processing, limiting real-time applications. We introduce non-AOI-based metrics of visual dispersion (% Field of View: %FOV) and cognitive tunneling behaviours (i.e., prolonged fixation toward a small region in the environment) that depend on the average distance from fixations to their mean center, eliminating spatial segmentation and AOI-label validation processes. The utility of these novel metrics is examined in an immersive aviation simulation task in which vision is degraded (e.g., reduced visual acuity). When visual acuity was reduced, %FOV significantly decreased, reflecting a more restricted distribution of fixations, while cognitive tunneling frequency and total duration increased. In line with previous AOI-based work, the proposed non-AOI metrics effectively captured changes in visual attention allocation suggestive of increased demand on other cognitive processes due to reduced visual information availability, demonstrating potential for real-time applications in complex environments.
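To sketch how such centroid-based measures could be computed from a fixation list, consider the following illustrative implementation; the normalization by a field-of-view radius and the tunneling thresholds are our assumptions rather than the paper's exact definitions.

```python
import numpy as np

def percent_fov(fix_x, fix_y, fov_radius):
    """Mean distance of fixations from their centroid, expressed as a
    percentage of the field-of-view radius (illustrative normalization)."""
    fx, fy = np.asarray(fix_x, float), np.asarray(fix_y, float)
    d = np.hypot(fx - fx.mean(), fy - fy.mean())
    return 100.0 * d.mean() / fov_radius

def tunneling_episodes(fix_x, fix_y, fix_dur, radius, min_duration):
    """Runs of consecutive fixations confined within `radius` of the run's
    first fixation whose total duration exceeds `min_duration` (a simple
    proxy for cognitive tunneling; thresholds are placeholders)."""
    episodes, start = [], 0
    n = len(fix_x)
    for i in range(1, n + 1):
        ended = (i == n or
                 np.hypot(fix_x[i] - fix_x[start],
                          fix_y[i] - fix_y[start]) > radius)
        if ended:
            dur = sum(fix_dur[start:i])
            if i - start > 1 and dur >= min_duration:
                episodes.append((start, i - 1, dur))
            start = i
    return episodes
```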
This paper presents a novel approach to visual attention training for novice tram drivers by combining a tram driving simulator with eye-tracking-enabled Augmented Reality (AR) glasses. Based on previous eye-tracking studies and interviews with tram driving instructors, we identified issues in the attention distribution of novice drivers: areas of the driver’s field of view that are critical to tram-driving safety, such as the right-side mirror that should be checked before turning right, are often neglected by novices. The training system proposed here aims to direct novice visual attention to critical areas by highlighting overlooked regions, displaying directional arrows, or providing auditory cues from the left or right side. Integration of real-time eye tracking into AR glasses supports continuous monitoring of trainee gaze patterns, offering instructors valuable information on trainee visual behavior. Such real-time data allow for dynamic adaptation of training cues, ensuring that visual attention guidance is both relevant and personalized. The proposed system has the potential to enhance situational awareness among tram drivers.
Computer-vision-based eye tracking is poised to accelerate educational personalization through accessible quantification of learning and neurodevelopment. Recently, gaze patterns related to social communication were found to be more strongly associated in monozygotic (identical) than in dizygotic (fraternal) twins, suggesting genetic contributions. However, the intrinsic characteristic of blinking, which also reflects development and cognition, remains underexplored. This study uses computer-vision-based blink detection to characterize blinking in twins enrolled in a remote tablet-based study of infant and toddler attention. DBSCAN face-fingerprint clustering and Caffe-model-based age prediction separated parent and child faces. Facial landmarks and head orientation were extracted using RetinaFace. Blink detection relied on a fine-tuned EfficientNet-B4 refined with XGBoost. Strong associations in blink probabilities were found across twins. Blink rates were correlated in monozygotic but not dizygotic twins. This work suggests that computer-vision-derived blink measures may reflect genetic influences and expand a toolbox for attention-related quantification of intrinsic human variation.
Individuals with neurodevelopmental conditions rely on gaze-based nonverbal communication to facilitate expression recognition and conversational turn-taking. By standardizing Areas of Interest (AOI) on rigged Virtual Human (VH) faces, gaze patterns can be more accurately analyzed, providing precise and responsive feedback in digital communication. To achieve this, a standardization technique was developed for facial AOI placement on VHs. By building upon anthropological landmarks and existing facial detection heuristics, this work presents a comprehensive guideline for AOI demarcation. Validation testing was conducted using Microsoft Rocketbox Avatars and the Microsoft Hololens 2 (HL2). The project provides a scalable and reproducible framework enabling broader adoption for eXtended Reality (XR) environments.
Climate change data is complex and difficult to interpret, which may hinder public participation in mitigation efforts. This study examines whether gaze cueing can enhance the accessibility of this information by directing visual attention in a climate-information-based collaboration task. Twenty participants collaborated in groups of four on a climate-related task while wearing mobile eye trackers. In the experimental condition, participants received gaze cues in an instructional video before performing the task, which required choosing the best strategy to mitigate climate change. Statistical analysis using the Levenshtein distance on scanpaths showed that gaze cues fostered attention synchronization. The findings suggest that gaze cues effectively guide groups’ attention through complex climate information, which is the first step toward its comprehension.
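As an illustration of the scanpath comparison involved, the sketch below computes the Levenshtein (edit) distance between two AOI-label sequences, where a smaller distance indicates more similar, and hence more synchronized, attention sequences; the AOI labels are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two AOI-label sequences (scanpath strings)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Two participants' scanpaths over AOIs A-D (hypothetical labels)
print(levenshtein("ABCCD", "ABDCD"))  # -> 1 (one substitution)
```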
Conventional voice-activated wake words (e.g., “Hey Siri”) inherently exclude Deaf and Hard of Hearing (DHH) users, emphasizing the need for an inclusive method to initiate interactions with AI assistants. We propose Look to Wake, a gaze-based activation strategy that benefits both DHH and hearing individuals by aligning with Deaf communication norms, where eye contact—rather than explicit signing—commonly initiates conversations. To ensure safety and usability in dynamic settings like driving or walking, we introduce brief repeated glances (“flicker gaze”) augmented by subtle peripheral visual feedback. Drawing on recent HCI research, we identify key gaps—longitudinal evaluation, cross-modal comparisons, and technical robustness—and call for inclusive, participatory design to accommodate the diverse needs of the DHH community. Ultimately, our approach reimagines wake-word interactions by shifting from spoken commands to more intuitive, visually oriented methods, paving the way for accessible, multimodal assistant technologies.
This paper presents a sign language conversation system based on the See-Through Face Display to address the challenge of maintaining eye contact in remote sign language interactions. A camera positioned behind a transparent display allows users to look at the face of their conversation partner while appearing to maintain direct eye contact. Unlike conventional methods that rely on software-based gaze correction or large-scale half-mirror setups, this design reduces visual distortions and simplifies installation. We implemented and evaluated a videoconferencing system that integrates See-Through Face Display, comparing it to traditional videoconferencing methods. We explore its potential applications for Deaf and Hard of Hearing (DHH), including multi-party sign language conversations, corpus collection, remote interpretation, and AI-driven sign language avatars. Collaboration with DHH communities will be key to refining the system for real-world use and ensuring its practical deployment.
Eye-tracking analysis plays a vital role in medical imaging, providing key insights into how radiologists visually interpret and diagnose clinical cases. In this work, we first analyze radiologists’ attention and agreement by measuring the distribution of various eye-movement patterns, including saccade direction, saccade amplitude, and their joint distribution. These metrics help uncover patterns in attention allocation and diagnostic strategies. Furthermore, we investigate whether and how doctors’ gaze behavior shifts when viewing authentic (Real) versus deep-learning-generated (Fake) images. To achieve this, we examine fixation bias maps, focusing on the first, last, shortest, and longest fixations independently, along with detailed saccade patterns, to quantify differences in gaze distribution and visual saliency between authentic and synthetic images.
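For concreteness, the following sketch shows one way to derive per-saccade amplitude and direction from a fixation sequence and bin them into a joint distribution; the bin counts and example coordinates are illustrative.

```python
import numpy as np

def saccade_amplitude_direction(fix_x, fix_y):
    """Amplitude (same units as input) and direction (degrees, 0-360)
    of saccades between consecutive fixations."""
    dx, dy = np.diff(fix_x), np.diff(fix_y)
    amplitude = np.hypot(dx, dy)
    direction = np.degrees(np.arctan2(dy, dx)) % 360.0
    return amplitude, direction

# Joint distribution: 8 direction bins x 5 amplitude bins (illustrative)
fx = np.array([100.0, 300.0, 320.0, 150.0, 155.0])
fy = np.array([200.0, 210.0, 400.0, 390.0, 180.0])
amp, ang = saccade_amplitude_direction(fx, fy)
joint, _, _ = np.histogram2d(ang, amp, bins=[8, 5],
                             range=[[0, 360], [0, amp.max()]])
print(joint.shape)  # (8, 5)
```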
Early detection of learning disorders is essential for timely intervention, fostering improved academic performance and overall well-being. This research explores the potential of Large Language Models (LLMs) combined with cost-effective eye-tracking data for detecting dyslexia, one of the most prevalent learning disorders. The LLMs evaluated are DeepSeek-V3, Llama3.3-70B, GPT-4o, GPT-o3-mini, and GPT-o1-mini. Leveraging data from 70 participants across three distinct tasks, our findings reveal that LLMs can outperform traditional Machine Learning (ML) models. Notably, few-shot prompting significantly enhances accuracy, demonstrating the adaptability and efficiency of LLM-driven approaches. In summary, this study presents a novel approach to dyslexia detection by integrating eye-tracking data with LLMs. By outperforming specialised ML models, this scalable approach optimises resources and expands early detection, making dyslexia assessment more accessible. It enables timely support, enhancing academic performance and overall well-being for affected individuals.
Advancements in eye-tracking technology have greatly expanded our ability to study human visual and cognitive processes in naturalistic settings. This paper introduces a new pipeline for optimizing eye-tracking video data, aimed at enhancing computational efficiency and enabling deeper contextual analysis through the integration of Large Language Models (LLMs). We propose a method that selectively captures keyframes during moments of sustained attention, significantly reducing data volume while preserving essential information. This optimization is complemented by the use of LLMs for object identification and contextual interpretation of the visual data within these keyframes. Our approach addresses significant challenges such as the high storage demands and computational overhead associated with processing large video recordings. We demonstrate the feasibility of our method through a software prototype and preliminary testing on two 60-minute recordings. Our results suggest that meaningful information about a user’s gaze can be extracted and inferred by selectively capturing video based on fixation events, resulting in a data reduction equivalent to approximately 4-8 frames per minute on average. We argue that this approach has the potential to enable more effective and scalable gaze-tracking applications in real-world settings. Finally, we outline potential improvements to enhance reliability in dynamic scenes with moving objects.
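As a rough illustration of the keyframe-selection step, the sketch below keeps one frame per sufficiently long fixation; the field names, the 300 ms threshold, and the frame-rate conversion are assumptions rather than the prototype's actual interface.

```python
def select_keyframes(fixations, video_fps=30, min_fix_ms=300):
    """Return one frame index per sustained fixation.

    `fixations` is a list of dicts with 'start_ms' and 'duration_ms'
    (hypothetical field names). A frame at the midpoint of each fixation
    longer than `min_fix_ms` is selected.
    """
    frames = []
    for f in fixations:
        if f["duration_ms"] >= min_fix_ms:
            midpoint_ms = f["start_ms"] + f["duration_ms"] / 2
            frames.append(int(midpoint_ms / 1000 * video_fps))
    return frames

# Example: three fixations, two of them sustained
fixations = [{"start_ms": 0, "duration_ms": 450},
             {"start_ms": 500, "duration_ms": 120},
             {"start_ms": 700, "duration_ms": 900}]
print(select_keyframes(fixations))  # -> [6, 34]
```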
Eye-tracking technology is increasingly integrated into smart glasses and wearable devices, becoming more prevalent in daily life. Meanwhile, generative artificial intelligence (GenAI) has the potential to transform user experiences through personalization and adaptive interactions. The integration of these technologies offers a novel opportunity to refine GenAI models by leveraging human gaze data in adaptive interfaces, personalized content generation, and human-computer interaction. However, gaze data is highly sensitive and can reveal several user attributes, such as cognitive states, emotions, and even medical conditions. Therefore, the use of gaze data to inform GenAI models raises significant privacy concerns. In this paper, we highlight the privacy and societal implications of using eye gaze information in GenAI models and discuss strategies to mitigate potential privacy violations. By addressing these issues, we can strike a balance between technological advancement and privacy protection.
The goal of the presented work is to propose and evaluate a machine learning model that transforms a high-quality eye-tracking (ET) signal to exhibit a lower level of signal quality, mimicking the output of an ET device of interest. The practical utility of this work lies in increasing the amount of available ET data so that the potential performance of a specific ET pipeline on various tasks can be studied without requiring additional data collection.
Eye-tracking-while-reading data provide valuable insights across multiple disciplines, including psychology, linguistics, natural language processing, education, and human-computer interaction. Despite its potential, the availability of large, high-quality, multilingual datasets remains limited, hindering both foundational reading research and advancements in applications. The MultiplEYE project addresses this gap by establishing a large-scale, international eye-tracking data collection initiative. It aims to create a multilingual dataset of eye movements recorded during natural reading, balancing linguistic diversity, while ensuring methodological consistency for reliable cross-linguistic comparisons. The dataset spans numerous languages and follows strict procedural, documentation, and data pre-processing standards to enhance eye-tracking data transparency and reproducibility. A novel data-sharing framework, integrated with data quality reports, allows for selective data filtering based on research needs. Researchers and labs worldwide are invited to join the initiative. By establishing and promoting standardized practices and open data sharing, MultiplEYE facilitates interdisciplinary research and advances reading research and gaze-augmented applications.
The invisible boundary paradigm (gaze contingency) was used to investigate the availability and retrieval of phonological information when the letters of a word are retrieved by parafoveal vision. For this purpose, the target stimuli were masked by swapping or replacing the first two letters. The results show that fixation times (first fixation duration and dwell time) are shorter in the transposition condition than in the substitution condition, indicating that the identity of the phonemes is already retrieved parafoveally, regardless of their order in the word. Croatian was used for the experiment, a language with a transparent and fine-grained orthography with a grapheme-phoneme correspondence of almost 1:1, which allows the experimental exclusion of any confounding effects of orthography on phonology.
While Generative AI models like Large Language Models (LLMs) are capable of generating extensive text, their efficacy in producing readable content for human participants in experimental settings remains to be evaluated. Further, eye-tracking technology is increasingly utilized to study cognition and behavior, yet its application to readers’ cognitive processes when exposed to AI-generated versus human-authored texts remains unexplored. This study investigates how text generated by LLMs influences reading by analyzing gaze patterns.
The study collects gaze data from 13 participants as they read AI-generated and human-authored passages. Fixations are identified with the identification by two-means clustering (I2MC) algorithm, which is robust to noise, and a within-subjects comparative analysis assesses gaze patterns between authors and between text types. In addition, pupil dilation and reading speed are examined.
Our findings reveal significant differences in fixation characteristics not only between authors but also between AI-generated and human-authored texts.
Reading is a complex process that can be affected by alcohol consumption. In this paper, we investigate whether eye-tracking measures can be utilised to recognise the state of alcohol intoxication in readers. We analysed natural reading data in which sober and intoxicated participants read instructions in Swedish prior to performing experimental tasks. Our analysis revealed that fixation and saccade duration, fixation and saccade rate, saccade amplitude, and saccade peak velocity were all significantly affected by alcohol. Furthermore, a logistic regression classifier was able to correctly detect the intoxication state with 87% accuracy.
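A minimal sketch of such a classifier is shown below, assuming a pre-computed feature matrix with one row per reading trial; the file names and cross-validation setup are illustrative and not the study's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features per trial: mean fixation duration, mean saccade duration,
# fixation rate, saccade rate, saccade amplitude, saccade peak velocity.
X = np.load("eye_movement_features.npy")   # hypothetical file
y = np.load("intoxication_labels.npy")     # 0 = sober, 1 = intoxicated

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```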
High-quality eye-tracking equipment plays an important role in understanding reading behaviors and linguistic processes, such as part-of-speech (PoS) tagging and syntactic dependency parsing. However, this technology is expensive, and data are not always available for underrepresented languages, such as Albanian. The expense and lack of appropriate equipment make it difficult to conduct reading-behavior research in Albanian and other low-resource languages. To address this limitation, we explore the use of low-cost alternatives to study reading behaviors. Specifically, we leverage mouse tracking data to improve the accuracy of computational linguistic models that predict PoS and syntactic dependencies. The preliminary results from our study suggest that the addition of mouse tracking data, particularly reading time, significantly improves the performance of various linguistic prediction models. These findings emphasize that mouse tracking data combined with annotated texts provide a viable solution for enhancing existing models for low-resource languages like Albanian.
Early identification of children at risk of reading difficulties is paramount for promoting educational success and equity, as earlier interventions are more effective. Traditional assessment methods of early reading abilities, however, are resource-intensive and often require basic reading abilities. Predicting reading acquisition from non-reading tasks, particularly during the early stages of formal reading instruction, overcomes these limitations. In this paper, we investigate to what extent eye movements recorded during a visual search task, which is hypothesized to correlate with reading ability, can predict children’s reading comprehension scores at the time of recording as well as one year later. Using machine learning methods that allow for an evaluation of feature importance, namely Neural Additive Models and Random Forests, we explore which eye movement features obtained from a visual search task are predictive of reading comprehension, thus laying the groundwork for future research in early assessment systems based on eye movements.
We propose a camera-free eye tracking solution for smart eyewear, leveraging a constellation of infrared photodetectors (PDs) discreetly integrated along lens edges to maintain aesthetics and increase robustness. This mechanically miniaturized design offers 16 signals per lens and significantly reduces power consumption, while retaining adequate resolution for extended reality scenarios. In fact, each lens is coupled with four PDs excited by four LEDs managed by a miniaturized and wireless processing board hosting a low-power microcontroller. A compact neural network, running in real time and trained on artificial eyes mounted on a motorized two-axis gimbal, processes 20 differential signals and achieves ∼4° gaze accuracy at 47 mW power consumption and a 70 Hz sampling rate. Preliminary human-eye tests confirm reliable blink detection and potential for classification of gaze direction into 5 quadrants of the visual field.
The number of devices embedding eye tracking (ET) capabilities, such as portable webcam-based consumer devices and wearables like headsets and smart eyeglasses, is rapidly increasing, making this technology truly pervasive. Despite the large number of papers and reviews discussing data quality and benchmarking of trackers, none of them addresses the trade-off between power consumption, speed, and accuracy. Power dissipation is typically dominated by the signal processing needed to extract gaze information from the sensors embedded in the glasses. This compromise is crucial for smart glasses, which are powered by miniature batteries offering a typical power budget of a few tens of mW for ET. Here we propose a simple benchmarking flow for wearable trackers, focused on power consumption as well as accuracy, precision, and sampling rate, and based on three complementary test setups. We report the preliminary results of the experimental characterization of six commercial trackers in the first (static) setup and compare their performance using a single figure of merit.
Advanced multimodal AI agents can now collaborate with users to solve challenges in the world. Yet, these emerging contextual AI systems rely on explicit communication channels between the user and system. We hypothesize that implicit communication of the user’s interests and intent would reduce friction and improve user experience when collaborating with AI agents. In this work, we explore the potential of wearable eye tracking to convey signals about user attention. We measure the eye tracking signal quality requirements to effectively map gaze traces to physical objects, then conduct experiments that provide visual scanpath history as additional context when querying vision language models. Our results show that eye tracking provides high value as a user attention signal and can convey important context about the user’s current task and interests, improving understanding of contextual AI agents.
CUPIDO (Circuit for Unobtrusive Palpebral Interpretation and Detection Optimization) is an ultra-low-power electrostatic sensor able to convert eye blinks into digital events with a detection sensitivity up to 90.5%. It can be easily integrated into the rims of smart glasses allowing for contactless interaction without compromising comfort and privacy (since no camera is used). Thanks to its extremely low power consumption (385.1 µW at peak during the blink), CUPIDO can extend battery life in smart glasses, allowing for continuous and real-time (detection latency of approximately 1 ms) monitoring applications like hands-free glasses control, assistive technologies, augmented reality, and drowsiness monitoring while driving.
Pervasive eye-tracking technology for eyewear devices represents a major advancement in wearable computing, enabling intuitive interaction and improving accessibility. However, the low-power constraints of these devices present a significant challenge in balancing accuracy with limited computational capacity. This study focuses on developing and evaluating algorithms for a low-power wearable infrared eye-tracking system conceived to work 24/7. The system includes a custom-built prototype that integrates infrared LEDs and photodiodes, strategically positioned on smart eyewear to estimate gaze direction. A humanoid robot, Ami Desktop, was utilized to create a controlled and robust dataset. Two deep learning architectures were investigated: a Multi-Layer Perceptron (MLP) and a tailored Hierarchical Neural Network (HNN). Variants of these models incorporating dimensionality reduction techniques were implemented to optimize performance and efficiency for low-power microcontrollers. The results demonstrate the superior accuracy and reasonable computational demands of the HNN models, highlighting their potential for continuous, real-time and portable eye-tracking applications.
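As a sketch of the simpler of the two model families, the snippet below defines a small MLP that maps one frame of photodiode readings to a two-dimensional gaze direction; the layer sizes and 16-channel input are illustrative assumptions rather than the prototype's actual configuration.

```python
import torch
import torch.nn as nn

class GazeMLP(nn.Module):
    """Small MLP regressing gaze direction from photodiode readings."""
    def __init__(self, n_channels=16, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_channels, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),        # (horizontal, vertical) gaze angle
        )

    def forward(self, x):
        return self.net(x)

model = GazeMLP()
frame = torch.randn(1, 16)               # one frame of IR sensor signals
print(model(frame).shape)                # torch.Size([1, 2])
```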
This paper explores radioactive watermarking as a technique for embedding invisible information in eye-tracking data, ensuring that any model trained on the modified samples retains an identifiable mark. Large-scale datasets have enabled robust deep learning models for appearance-based gaze estimation, but no reliable methods currently exist to detect unauthorized use of datasets. To address this, we evaluate radioactive watermarking, which embeds a watermark into eye data using pre-trained convolutional neural networks commonly used in gaze estimation models. We assess watermark robustness through gaze classification experiments, testing multiple neural architectures in different embedding and detection setups. Results demonstrate that training with watermarked data can be detected with high confidence, depending on the proportion of watermarked samples and the training setup. Detection is reliable with at least 10% watermarked data, while exceeding 15% degrades performance without significantly improving detection. Watermarks that retain high image quality preserve network performance and enable consistent detection.
Gaze behavior is a critical nonverbal cue for assessing attention and social engagement in children with Autism Spectrum Disorder (ASD). Gaze patterns of ten children (aged 5–10), including eight diagnosed with ASD, were examined during robot-assisted therapy sessions recorded with a fisheye camera in a naturalistic setting. The resulting EMBOA-Gaze dataset was manually annotated at both the gaze target coordinate level and the semantic region level, with regions categorized as “Robot,” “Therapist,” or “Other.” Statistical analysis showed that children with ASD looked significantly more at the robot than at the therapist (p = .002) and the other region (p = .030). A modular deep learning-based system was developed to predict gaze targets: head detection, region detection, and a Customized Spatio-Temporal Gaze Detection module (C-STGD). The model was trained on EMBOA-Gaze, achieving an AUC of 0.85 for point-level coordinate prediction and 75% accuracy in classifying gaze targets into semantic regions.