This paper introduces Sketched Reality, an approach that combines AR sketching and actuated tangible user interfaces (TUIs) for bi-directional sketching interaction. Bi-directional sketching enables virtual sketches and physical objects to “affect” each other through physical actuation and digital computation. In existing AR sketching, the relationship between the virtual and physical worlds is only one-directional: while physical interaction can affect virtual sketches, virtual sketches have no return effect on physical objects or the environment. In contrast, bi-directional sketching interaction allows seamless coupling between sketches and actuated TUIs. In this paper, we employ small tabletop robots (Sony Toio) and an iPad-based AR sketching tool to demonstrate the concept. In our system, virtual sketches drawn and simulated on an iPad (e.g., lines, walls, pendulums, and springs) can move, actuate, collide with, and constrain physical Toio robots, as if the virtual sketches and physical objects existed in the same space through seamless coupling between AR and robot motion. This paper contributes a set of novel interactions and a design space of bi-directional AR sketching. We demonstrate a series of potential applications, such as tangible physics education, explorable mechanisms, tangible gaming for children, and in-situ robot programming via sketching.
We present PassengXR, an open-source toolkit for creating passenger eXtended Reality (XR) experiences in Unity. XR allows travellers to move beyond the physical limitations of in-vehicle displays, rendering immersive virtual content based on - or ignoring - vehicle motion. There are considerable technical challenges to using headsets in moving environments: maintaining the forward bearing of IMU-based headsets; conflicts between optical and inertial tracking of inside-out headsets; obtaining vehicle telemetry; and the high cost of design given the necessity of testing in-car. As a consequence, existing vehicular XR research typically relies on controlled, simple routes to compensate. PassengXR is a cost-effective open-source in-car passenger XR solution. We provide a reference set of COTS hardware that enables the broadcasting of vehicle telemetry to multiple headsets. Our software toolkit then provides support to correct vehicle-headset alignment, and then create a variety of passenger XR experiences, including: vehicle-locked content; motion- and location-based content; and co-located multi-passenger applications. PassengXR also supports the recording and playback of vehicle telemetry, assisting offline design without resorting to costly in-car testing. Through an evaluation-by-demonstration, we show how our platform can assist practitioners in producing novel, multi-user passenger XR experiences.
Gaze-based target selection suffers from low input precision and target occlusion. In this paper, we explore leveraging continuous eyelid movement to support highly efficient and occlusion-robust dwell-based gaze pointing in virtual reality. We first conducted two user studies to examine users’ eyelid movement patterns in both unintentional and intentional conditions. The results showed the feasibility of using intentional eyelid movement, which was distinguishable from natural movement, for input. We also tested the participants’ dwelling patterns for targets with different sizes and locations. Based on these results, we propose DEEP, a novel technique that enables users to see through occlusions by controlling the aperture angle of their eyelids and to dwell to select targets with the help of a probabilistic input prediction model. Evaluation results showed that DEEP with dynamic depth and location selection incorporation significantly outperformed its static variants, as well as a naive dwelling baseline technique. Even for 100% occluded targets, it achieved an average selection time of 2.5 s with an error rate of 2.3%.
Freehand interactions with augmented and virtual reality are growing in popularity, but they lack reliability and robustness. Implicit behavior from users, such as hand or gaze movements, might provide additional signals to improve the reliability of input. In this paper, the primary goal is to improve the detection of a selection gesture in VR during point-and-click interaction. Thus, we propose and investigate the use of information contained within the hand motion dynamics that precede a selection gesture. We built two models that classified if a user is likely to perform a selection gesture at the current moment in time. We collected data during a pointing-and-selection task from 15 participants and trained two models with different architectures, i.e., a logistic regression classifier was trained using predefined hand motion features and a temporal convolutional network (TCN) classifier was trained using raw hand motion data. Leave-one-subject-out cross-validation PR-AUCs of 0.36 and 0.90 were obtained for each model respectively, demonstrating that the models performed well above chance (=0.13). The TCN model was found to improve the precision of a noisy selection gesture by 11.2% without sacrificing recall performance. An initial analysis of the generalizability of the models demonstrated above-chance performance, suggesting that this approach could be scaled to other interaction tasks in the future.
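The chance level quoted in the abstract above (0.13) reflects a general property of precision-recall analysis: a scorer that ranks samples at random has an expected average precision close to the fraction of positive samples. The following is a minimal sketch on synthetic data, assuming that the 0.13 figure corresponds to a positive-class prevalence of about 13%; the function names and data are hypothetical and are not taken from the paper.

```python
import random


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_random_ap(n_samples=1000, prevalence=0.13, trials=200, seed=0):
    """Average precision of a random scorer, averaged over several trials.

    The prevalence value is an assumption used for illustration only.
    """
    rng = random.Random(seed)
    n_pos = round(n_samples * prevalence)
    labels = [1] * n_pos + [0] * (n_samples - n_pos)
    total = 0.0
    for _ in range(trials):
        scores = [rng.random() for _ in labels]  # scores carry no information
        total += average_precision(labels, scores)
    return total / trials


if __name__ == "__main__":
    print(f"random scorer AP ~ {mean_random_ap():.3f}")
```

Under this reading, a PR-AUC of 0.36 or 0.90 is compared against a baseline near the positive rate rather than against 0.5, which is why both models count as above chance.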
With the proliferation of consumer-level virtual reality (VR) devices, users have started experiencing VR in less controlled environments, such as social gatherings and public areas. While current VR hardware provides an increasingly immersive experience, it ignores stimuli originating from the physical surroundings that distract users from the VR experience. To block distractions from the outside world, many users wear noise-canceling headphones. However, this is insufficient to block loud or transient sounds (e.g., drilling or hammering) and, especially, multi-modal distractions (e.g., air drafts, temperature shifts from an A/C, construction vibrations, or food smells). To tackle this, we explore a new concept, where we directly integrate the distracting stimuli from the user's physical surroundings into their virtual reality experience to enhance presence. Using our approach, an otherwise distracting wind gust can be directly mapped to the sway of trees in a VR experience that already contains trees. We demonstrate how to integrate a range of distracting stimuli into the VR experience, such as haptics (temperature, vibrations, touch), sounds, and smells. To validate our approach, we conducted three user studies and a technical evaluation. First, to validate our key principle, we conducted a controlled study where participants were exposed to distractions while playing a VR game. We found that our approach improved users’ sense of presence, compared to wearing noise-canceling headphones. From these results, we engineered a sensing module that detects a set of simple distracting signals (e.g., sounds, winds, and temperature shifts). We validated our hardware in a technical evaluation and in an out-of-lab study where participants played VR games in an uncontrolled environment. Moreover, to gather the perspective of VR content creators who might one day utilize a system inspired by our findings, we invited game designers to use our approach and collected their feedback and VR designs. Finally, we present design considerations for mapping distracting external stimuli and discuss ethical considerations of integrating real-world stimuli into virtual reality.
Intelligent suggestion techniques can enable low-friction selection-based input within virtual or augmented reality (VR/AR) systems. Such techniques leverage probability estimates from a target prediction model to provide users with an easy-to-use method to select the most probable target in an environment. For example, a system could highlight the predicted target and enable a user to select it with a simple click. However, as the probability estimates can be made at any time, it is unclear when an intelligent suggestion should be presented. Earlier suggestions could save a user time and effort but be less accurate. Later suggestions, on the other hand, could be more accurate but save less time and effort. This paper thus proposes a computational framework that can be used to determine the optimal timing of intelligent suggestions based on user-centric costs and benefits. A series of studies demonstrated the value of the framework for minimizing task completion time and maximizing suggestion usage and showed that it was both theoretically and empirically effective at determining the optimal timing for intelligent suggestions.
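The trade-off described in the abstract above (earlier suggestions save more time but are less accurate) can be illustrated with a toy cost-benefit calculation. This is a minimal sketch under stated assumptions and not the framework proposed in the paper: the utility function, the linear accuracy model, the error penalty, and all names below are hypothetical.

```python
def expected_time_saved(t, task_time, accuracy_at, error_penalty):
    """Expected net time saved by showing a suggestion at time t.

    A correct suggestion saves the remaining task time; a wrong one costs
    a fixed penalty (both are illustrative assumptions).
    """
    p_correct = accuracy_at(t)
    benefit = p_correct * (task_time - t)
    cost = (1.0 - p_correct) * error_penalty
    return benefit - cost


def best_suggestion_time(candidate_times, task_time, accuracy_at, error_penalty):
    """Pick the candidate time with the highest expected net saving."""
    return max(
        candidate_times,
        key=lambda t: expected_time_saved(t, task_time, accuracy_at, error_penalty),
    )


def linear_accuracy(t, task_time=2.0):
    """Hypothetical predictor whose accuracy grows linearly from 0.2 to 1.0."""
    return min(1.0, 0.2 + 0.8 * t / task_time)


# Candidate suggestion times from 0.0 s to 2.0 s in 0.1 s steps.
CANDIDATES = [i / 10 for i in range(21)]

if __name__ == "__main__":
    best = best_suggestion_time(CANDIDATES, 2.0, linear_accuracy, 0.5)
    print(f"best suggestion time: {best:.1f} s")
```

With these assumed numbers the utility peaks partway through the task: suggesting at the very start is too error-prone, and suggesting at the very end saves no time.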
We present Flaticulation, a method to laser cut joints that clutch two cut-in-place flat boards at designated articulated angles. We discover that special T-patterns added to the shared edge of two pieces allow them to be clutched at a bending angle. We analyze the structure and propose a parametric model of the T-pattern under laser cutting to predict the articulated angle of the joint. We validate our proposed model by measuring real prototypes and conducting stress-strain analysis to understand their structural strength. Finally, we provide a user interface for our example applications, including fast assembly of unfolded 3D polygonal models and adding detent mechanisms to functional objects such as a mouse and reconfigurable objects such as headphones.
In this paper, we present Mixels, programmable magnetic pixels that can be rapidly fabricated using an electromagnetic printhead mounted on an off-the-shelf 3-axis CNC machine. The ability to program magnetic material pixel-wise with varying magnetic force enables Mixels to create new tangible, tactile, and haptic interfaces. To facilitate the creation of interactive objects with Mixels, we provide a user interface that lets users specify the high-level magnetic behavior and then computes the underlying magnetic pixel assignments and fabrication instructions to program the magnetic surface. Our custom hardware add-on, based on an electromagnetic printhead and a Hall effect sensor, clips onto a standard 3-axis CNC machine and can both write and read magnetic pixel values from magnetic material. Our evaluation shows that our system can reliably program and read magnetic pixels of various strengths, that we can predict the behavior of two interacting magnetic surfaces before programming them, that our electromagnet is strong enough to create pixels that utilize the maximum magnetic strength of the material being programmed, and that this material remains magnetized when removed from the magnetic plotter.
Researchers have developed various tools and techniques towards the vision of on-demand fabrication of custom, interactive devices. Recent work has 3D-printed artefacts like speakers, electromagnetic actuators, and hydraulic robots. However, these are non-trivial to instantiate as they require post-fabrication mechanical or electronic assembly. We introduce AirLogic: a technique to create electronics-free, interactive objects by embedding pneumatic input, logic processing, and output widgets in 3D-printable models. AirLogic devices can perform basic computation on user inputs and create visible, audible, or haptic feedback; yet they do not require electronic circuits, physical assembly, or resetting between uses. Our library of 13 exemplar widgets can embed AirLogic-style computational capabilities in existing 3D models. We evaluate our widgets’ performance by quantifying the loss of airflow (1) in each widget type, (2) based on printing orientation, and (3) from internal object geometry. Finally, we present five applications that illustrate AirLogic’s potential.
We present HingeCore, a novel type of laser-cut 3D structure made from sandwich materials, such as foamcore. The key design element behind HingeCore is what we call a finger hinge, which we produce by laser-cutting foamcore “half-way”. The primary benefit of finger hinges is that they allow for very fast assembly: models are assembled by folding, and folded hinges stay put at the intended angle based on the friction between fingers alone, which eliminates the need for glue or tabs. Finger hinges are also highly robust, with some 5 mm foamcore models withstanding 62 kg. We present HingeCoreMaker, a stand-alone software tool that automatically converts 3D models to HingeCore layouts, as well as an integration into a 3D modeling tool for laser cutting (Kyub). We have used HingeCoreMaker to fabricate design objects, including speakers, lamps, and a life-size bust, as well as structural objects, such as functional furniture. In our user study, participants assembled HingeCore layouts 2.9x faster than layouts generated using the state-of-the-art for plate-based assembly (Roadkill).
iWood is interactive plywood that can sense vibration based on the triboelectric effect. As a material, iWood survives common woodworking operations, such as sawing, screwing, and nailing, and can be used to create furniture and artifacts. Things created using iWood inherit its sensing capability and can detect a variety of user inputs and activities based on their unique vibration patterns. Through a series of experiments and machine simulations, we carefully chose the size of the sensor electrodes, the type of triboelectric materials, and the bonding method of the sensor layers to optimize the sensitivity and fabrication complexity. The sensing performance of iWood was evaluated with 4 gestures and 12 daily activities carried out on a table, nightstand, and cutting board, all created using iWood. Our results suggest over 90% accuracy for activity and gesture recognition.
Prototyping compact devices with unique form factors often requires the PCB manufacturing process to be outsourced, which can be expensive and time-consuming. In this paper, we present Fibercuit, a set of rapid prototyping techniques to fabricate high-resolution, flexible circuits on-demand using a fiber laser engraver. We showcase techniques that can laser cut copper-based composites to form fine-pitch conductive traces, laser fold copper substrates to form kirigami structures, and laser solder surface-mount electrical components using off-the-shelf soldering pastes. Combined with our software pipeline, an end user can design and fabricate flexible circuits that are dual-layer and three-dimensional, thereby exhibiting a wide range of form factors. We demonstrate Fibercuit by showcasing a set of examples, including a custom die, flex cables, custom end-stop switches, electromagnetic coils, LED earrings, and a circuit in the form of a kirigami crane.
Recent work demonstrated how we can design and use coding strips, a form of comic strips with corresponding code, to enhance teaching and learning in programming. However, creating coding strips is a creative, time-consuming process. Creators have to generate stories from code (code↦story) and design comics from stories (story↦comic). We contribute CodeToon, a comic authoring tool that facilitates this code-driven storytelling process with two mechanisms: (1) story ideation from code using metaphor and (2) automatic comic generation from the story. We conducted a two-part user study that evaluates the tool and the comics generated by participants to test whether CodeToon facilitates the authoring process and helps generate quality comics. Our results show that CodeToon helps users create accurate, informative, and useful coding strips in a significantly shorter time. Overall, this work contributes methods and design guidelines for code-driven storytelling and opens up opportunities for using art to support computer science education.
Following the prevalence of short-form video, short-form voice content has emerged on social media platforms like Twitter and Facebook. A challenge that creators face is hard constraints on the content length. If the initial recording is not short enough, they need to re-record or edit their content. Both are time-consuming, and the latter, if supported, can have a learning curve. Moreover, creators need to manually create multiple versions to publish content on platforms with different length constraints. To simplify this process, we present ROPE (Record Once, Post Everywhere). Creators can record voice content once, and our system will automatically shorten it to all length limits by removing parts of the recording for each target. We formulate this as a combinatorial optimization problem and propose a novel algorithm that automatically selects optimal sentence combinations from the original content to comply with each length constraint. Creators can customize the algorithmically shortened content by specifying sentences to include or exclude. Our system can also use the user-specified constraints to recompute and provide a new version. We conducted a user study comparing ROPE with a sentence-based manual editing baseline. The results show that ROPE can generate high-quality edits, alleviating the cognitive load of creators for shortening content. While our system and user study address short-form voice content specifically, we believe that the same concept can also be applied to other media such as video with narration and dialog.
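Selecting sentences under a length limit, with user-specified sentences to include or exclude, resembles a 0/1 knapsack problem. The sketch below uses a standard knapsack dynamic program as a stand-in; it is not the algorithm proposed in the paper, and the per-sentence importance scores, the whole-second durations, and the function name `shorten` are all hypothetical assumptions made for illustration.

```python
def shorten(sentences, limit, include=frozenset(), exclude=frozenset()):
    """Choose sentence indices that maximize total importance within `limit`.

    sentences: list of (duration_in_whole_seconds, importance) pairs.
    include / exclude: indices the creator forces in or out (assumed disjoint).
    Returns the chosen indices in their original order.
    """
    budget = limit - sum(sentences[i][0] for i in include)
    if budget < 0:
        raise ValueError("forced sentences already exceed the length limit")

    optional = [i for i in range(len(sentences))
                if i not in include and i not in exclude]

    # best[b] = (score, chosen indices) achievable with at most b seconds.
    best = [(0.0, ())] * (budget + 1)
    for i in optional:
        duration, importance = sentences[i]
        # Iterate capacities downward so each sentence is used at most once.
        for b in range(budget, duration - 1, -1):
            score, chosen = best[b - duration]
            if score + importance > best[b][0]:
                best[b] = (score + importance, chosen + (i,))

    return sorted(set(include) | set(best[budget][1]))


# Hypothetical recording: (duration in seconds, importance score).
EXAMPLE = [(8, 5.0), (12, 9.0), (6, 2.0), (10, 7.0), (5, 4.5)]

if __name__ == "__main__":
    for limit in (30, 20):
        print(limit, shorten(EXAMPLE, limit))
    print("forced include:", shorten(EXAMPLE, 30, include={2}))
```

Running the same function once per platform limit mirrors the "record once, post everywhere" idea: each target length gets its own selection, and changing `include` or `exclude` simply triggers a recomputation.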
Blind people typically access videos via audio descriptions (AD) crafted by sighted describers who comprehend, select, and describe crucial visual content in the videos. 360° video is an emerging storytelling medium that enables immersive experiences that people may not possibly reach in everyday life. However, the omnidirectional nature of 360° videos makes it challenging for describers to perceive the holistic visual content and interpret spatial information that is essential to create immersive ADs for blind people. Through a formative study with a professional describer, we identified key challenges in describing 360° videos and iteratively designed OmniScribe, a system that supports the authoring of immersive ADs for 360° videos. OmniScribe uses AI-generated content-awareness overlays for describers to better grasp 360° video content. Furthermore, OmniScribe enables describers to author spatial AD and immersive labels for blind users to consume the videos immersively with our mobile prototype. In a study with 11 professional and novice describers, we demonstrated the value of OmniScribe in the authoring workflow; and a study with 8 blind participants revealed the promise of immersive AD over standard AD for 360° videos. Finally, we discuss the implications of promoting 360° video accessibility.
Video productions commonly start with a script, especially for talking head videos that feature a speaker narrating to the camera. When the source material comes from a written document, such as a web tutorial, it takes several iterations to refine the content from a text article into a spoken dialogue while considering the visual composition of each scene. We propose Doc2Video, a video prototyping approach that converts a document to interactive scripting with a preview of synthetic talking head videos. Our pipeline decomposes a source document into a series of scenes, automatically creating for each a synthesized video of a virtual instructor. Designed for a specific domain, programming cookbooks, our approach applies visual elements from the source document, such as a keyword, a code snippet, or a screenshot, in suitable layouts. Users edit narration sentences, break or combine sections, and modify visuals to prototype a video in our Editing UI. We evaluated our pipeline with public programming cookbooks. Feedback from professional creators shows that our method provided a reasonable starting point to engage them in interactive scripting for a narrated instructional video.
We present RealityTalk, a system that augments real-time live presentations with speech-driven interactive virtual elements. Augmented presentations leverage embedded visuals and animation for engaging and expressive storytelling. However, existing tools for live presentations often lack interactivity and improvisation, while creating such effects in video editing tools requires significant time and expertise. RealityTalk enables users to create live augmented presentations with real-time speech-driven interactions. The user can interactively prompt, move, and manipulate graphical elements through real-time speech and supporting modalities. Based on our analysis of 177 existing video-edited augmented presentations, we propose a novel set of interaction techniques and incorporate them into RealityTalk. We evaluate our tool from a presenter’s perspective to demonstrate the effectiveness of our system.
To facilitate engaging and nuanced conversations around data, we contribute a touchless approach to interacting directly with visualization in remote presentations. We combine dynamic charts overlaid on a presenter’s webcam feed with continuous bimanual hand tracking, demonstrating interactions that highlight and manipulate chart elements appearing in the foreground. These interactions are simultaneously functional and deictic, and some allow for the addition of “rhetorical flourish”, or expressive movement used when speaking about quantities, categories, and time intervals. We evaluated our approach in two studies with professionals who routinely deliver and attend presentations about data. The first study considered the presenter perspective, where 12 participants delivered presentations to a remote audience using a presentation environment incorporating our approach. The second study considered the audience experience of 17 participants who attended presentations supported by our environment. Finally, we reflect on observations from these studies and discuss related implications for engaging remote audiences in conversations about data.
Designers and makers are increasingly interested in leveraging bio-based and bio-degradable ‘do-it-yourself’ (DIY) materials for sustainable prototyping. Their self-produced bioplastics possess compelling properties such as self-adhesion but have so far not been functionalized to create soft interactive devices, due to a lack of DIY techniques for the fabrication of functional electronic circuits and sensors. In this paper, we contribute a DIY approach for creating Interactive Bioplastics that is accessible to a wide audience, making use of easy-to-obtain bio-based raw materials and familiar tools. We present three types of conductive bioplastic materials and their formulation: sheets, pastes, and foams. Our materials enable additive and subtractive fabrication of soft circuits and sensors. Furthermore, we demonstrate how these materials can substitute conventional prototyping materials, be combined with off-the-shelf electronics, and be fed into a sustainable material ‘life-cycle’ including disassembly, re-use, and re-melting of materials. A formal characterization of our conductors shows that they are on par with commercially available carbon-based conductive pastes.
Bridges are unique structures that appear in fused deposition modeling (FDM) and can make rigid prints flexible, but they have not been fully explored. This paper presents X-Bridges, an end-to-end workflow that allows novice users to design tunable bridges that can enrich the deformable and physical properties of 3D printed objects. Specifically, we first provide a series of deformation primitives (e.g., bend, twist, coil, compress, and stretch) with three levels of stiffness (loose, elastic, stable) based on parametrized bridging experiments. We then develop a design tool that embeds the printing parameters and can modify the imported 3D model, evaluate optimized printing parameters for bridges, preview the shape-changing process, and generate the G-code file for 3D printing. Finally, we demonstrate the design space of X-Bridges through a set of applications that enable foldable, resilient, and interactive shape-changing objects.
Foundation paper piecing is a widely used quilt-making technique in which fabric pieces are sewn onto a paper guide to facilitate construction. But, designing paper pieceable quilt patterns is challenging because the sewing process imposes constraints on both the geometry and sewing order of the fabric pieces. Based on a formative study with expert quilt designers, we develop a novel sketch-based tool for designing such quilt patterns. Our tool lets designers sketch a partial design as a set of edges, which may intersect but do not have to form closed polygons, and our tool automatically completes it into a fully paper pieceable pattern. We contribute a new sketch-completion algorithm that extends the input sketched edges into a planar mesh composed of closed polygonal faces representing fabric pieces, determines a paper pieceable sewing order for the faces, and breaks complicated sketches into independently paper pieceable sections when necessary. A partial input design often admits multiple visually different completions. Thus, our tool lets designers specify completion heuristics, which are based on current quilt design practices, to control the appearance of the completed quilt. Initial user evaluations with novice and expert quilt designers suggest that our tool fits within current design workflows and greatly facilitates designing foundation paper pieceable quilts by allowing users to focus on the visual design rather than tedious constraint checks.
Recent advances in smart materials have enabled displays to move beyond planar surfaces into the fabric of everyday life. We propose reflective light-diffuser modules for non-emissive flexible display systems. Our system leverages reflective-backed polymer-dispersed liquid crystal (PDLC), an electroactive material commonly used in smart window applications. This low-power non-emissive material can be cut to any shape, and dynamically diffuses light. We present the design & fabrication of two exemplar artifacts, a canvas and a handbag, that use the reflective light-diffuser modules. We also describe our content authoring pipeline and interaction modalities. We hope this work inspires future designers of flexible displays.
Garments with the ability to provide kinesthetic force-feedback on-demand can augment human capabilities in a non-obtrusive way, enabling numerous applications in VR haptics, motion assistance, and robotic control. However, designing such garments is a complex, and often manual task, particularly when the goal is to resist multiple motions with a single design. In this work, we propose a computational pipeline for designing connecting structures between active components—one of the central challenges in this context. We focus on electrostatic (ES) clutches that are compliant in their passive state while strongly resisting elongation when activated. Our method automatically computes optimized connecting structures that efficiently resist a range of pre-defined body motions on demand. We propose a novel dual-objective optimization approach to simultaneously maximize the resistance to motion when clutches are active, while minimizing resistance when inactive. We demonstrate our method on a set of problems involving different body sites and a range of motions. We further fabricate and evaluate a subset of our automatically created designs against manually created baselines using mechanical testing and in a VR pointing study.
Human environments are physically supported by floors, which hold people and furniture up against gravity. Since our body motions continuously generate vibrations and loads that propagate into the ground, measuring these expressive signals enables unobtrusive activity sensing. In this study, we present Flexel, a modular floor interface for room-scale tactile sensing. By paving a room with floor interfaces, our system can immediately begin to infer touch locations, track user locations, recognize foot gestures, and detect object locations. Through a series of exploratory studies, we determined the preferable hardware design that adheres to construction conventions, as well as the optimal sensor density that mediates the trade-off between cost and performance. We summarize our findings into design guidelines that are generalizable to other floor interfaces. Finally, we provide example applications for room-scale tactile sensing enabled by our Flexel system.
Force sensing has been a key enabling technology for a wide range of interfaces such as digitally enhanced body and world surfaces for touch interactions. Additionally, force often contains rich contextual information about user activities and can be used to enhance machine perception for improved user and environment awareness. To sense force, conventional approaches rely on contact sensors made of pressure-sensitive materials such as piezo films/discs or force-sensitive resistors. We present ForceSight, a non-contact force sensing approach using laser speckle imaging. Our key observation is that object surfaces deform in the presence of force. This deformation, though very minute, manifests as observable and discernible laser speckle shifts, which we leverage to sense the applied force. This non-contact force-sensing capability opens up new opportunities for rich interactions and can be used to power user-/environment-aware interfaces. We first built and verified a model of laser speckle shift under surface deformation. To investigate the feasibility of our approach, we conducted studies on metal, plastic, and wood, as well as a wide variety of other materials. Additionally, we included supplementary tests to further characterize the performance of our approach. Finally, we demonstrated the applicability of ForceSight with several example applications.
In this paper, we propose NFCStack, a physical building block system that supports stacking and frictionless interaction and is based on near-field communication (NFC). This system consists of a portable station that can support and resolve the order of three types of passive, identifiable stackables: bricks, boxes, and adapters. The bricks support stable and sturdy physical construction, whereas the boxes support frictionless tangible interactions. The adapters provide an interface between the aforementioned two types of stackables and convert the top of a stack into a terminal for detecting interactions between NFC-tagged objects. In contrast to existing systems based on NFC or radio-frequency identification technologies, NFCStack is portable, supports simultaneous interactions, and resolves stacking and interaction events responsively, even when objects are not strictly aligned. Evaluation results indicate that the proposed system effectively supports 12 layers of rich-ID stacking with the three types of building blocks, even if every box is stacked with a 6-mm offset. The results also indicate possible generalized applications of the proposed system, including 2.5-dimensional construction. The interaction styles are described using several educational application examples, and the design implications of this research are explained.
Interactive Machine Teaching (IMT) systems allow non-experts to easily create Machine Learning (ML) models. However, existing vision-based IMT systems either ignore annotations on the objects of interest or require users to annotate in a post-hoc manner. Without the annotations on objects, the model may misinterpret the objects using unrelated features. Post-hoc annotations cause additional workload, which diminishes the usability of the overall model building process. In this paper, we develop LookHere, which integrates in-situ object annotations into vision-based IMT. LookHere exploits users’ deictic gestures to segment the objects of interest in real time. This segmentation information can be additionally used for training. To achieve the reliable performance of this object segmentation, we utilize our custom dataset called HuTics, including 2040 front-facing images of deictic gestures toward various objects by 170 people. The quantitative results of our user study showed that participants were 16.3 times faster in creating a model with our system compared to a standard IMT system with a post-hoc annotation process while demonstrating comparable accuracies. Additionally, models created by our system showed a significant accuracy improvement (ΔmIoU = 0.466) in segmenting the objects of interest compared to those without annotations.
Researchers have been exploring how incorporating care-based interactions can change the user's attitude & relationship towards an interactive device. This is typically achieved through virtual care where users care for digital entities. In this paper, we explore this concept further by investigating how physical care for a living organism, embedded as a functional component of an interactive device, also changes user-device relationships. Living organisms differ as they require an environment conducive to life, which in our concept, the user is responsible for providing by caring for the organism (e.g., feeding it). We instantiated our concept by engineering a smartwatch that includes a slime mold that physically conducts power to a heart rate sensor inside the device, acting as a living wire. In this smartwatch, the availability of heart-rate sensing depends on the health of the slime mold—with the user's care, the slime mold becomes conductive and enables the sensor; conversely, without care, the slime mold dries and disables the sensor (resuming care resuscitates the slime mold). To explore how our living device was perceived by users, we conducted a study where participants wore our slime mold-integrated smartwatch for 9-14 days. We found that participants felt a sense of responsibility, developed a reciprocal relationship, and experienced the organism's growth as a source of affect. Finally, to allow engineers and designers to expand on our work, we abstract our findings into a set of technical and design recommendations when engineering an interactive device that incorporates this type of care-based relationship.
We propose WaddleWalls, a room-scale interactive partitioning system using a swarm of robotic partitions that allows occupants to interactively reconfigure workspace partitions to satisfy their privacy and interaction needs. The system can automatically arrange the partitions into the layout designed by the user on demand. The user specifies the target partition’s position, orientation, and height using the controller’s 3D manipulations. In this work, we discuss the design considerations of the interactive partition system and implement WaddleWalls’ proof-of-concept prototype assembled with off-the-shelf materials. We demonstrate the functionalities of WaddleWalls through several application scenarios in an open-plan office environment. We also conduct an initial user evaluation that compares WaddleWalls with conventional wheeled partitions, finding that WaddleWalls allows effective workspace partitioning and mitigates the physical and temporal efforts needed to fulfill ad hoc social and privacy requirements. Finally, we clarify the feasibility, potential, and future challenges of WaddleWalls through an interview with experts.
Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge. Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algorithms potentially disrupting or driving interpretive scholarship. We take a human-centered design approach to addressing concerns around machine-assisted interpretive research to build Scholastic, which incorporates a machine-in-the-loop clustering algorithm to scaffold interpretive text analysis. As a scholar applies codes to documents and refines them, the resulting coding schema serves as structured metadata which constrains hierarchical document and word clusters inferred from the corpus. Interactive visualizations of these clusters can help scholars strategically sample documents further toward insights. Scholastic demonstrates how human-centered algorithm design and visualizations employing familiar metaphors can support inductive and interpretive research methodologies through interactive topic modeling and document clustering.
Wikidata is a companion to Wikipedia that captures a substantial part of the information about most Wikipedia entities in machine-readable structured form. In addition to directly representing information from Wikipedia itself, Wikidata also cross-references how additional information about these entities can be accessed through APIs on hundreds of other websites.
This trove of valuable information has become a source of numerous domain-specific information presentations on the web, such as art galleries or directories of actors. Developers have created a number of such tools that present Wikidata data, sometimes combined with data accessed through Wikidata’s cross-referenced web APIs. However, the creation of these presentations requires significant programming effort and is often impossible for non-programmers.
Consumers conducting comparison shopping, researchers making sense of competitive space, and developers looking for code snippets online all face the challenge of capturing the information they find for later use without interrupting their current flow. In addition, during many learning and exploration tasks, people need to externalize their mental context, such as estimating how urgent a topic is to follow up on, or rating a piece of evidence as a “pro” or “con,” which helps scaffold subsequent deeper exploration. However, current approaches incur a high cost, often requiring users to select, copy, context switch, paste, and annotate information in a separate document without offering specific affordances that capture their mental context. In this work, we explore a new interaction technique called “wiggling,” which can be used to fluidly collect, organize, and rate information during early sensemaking stages with a single gesture. Wiggling involves rapid back-and-forth movements of a pointer or up-and-down scrolling on a smartphone, which can indicate the information to be collected and its valence, using a single, light-weight gesture that does not interfere with other interactions that are already available. Through implementation and user evaluation, we found that wiggling helped participants accurately collect information and encode their mental context with a 58% reduction in operational cost while being 24% faster compared to a common baseline.
Unsupervised physical rehabilitation traditionally has used motion tracking to determine correct exercise execution. However, motion tracking is not representative of the assessment of physical therapists, who focus on muscle engagement. In this paper, we investigate whether monitoring and visualizing muscle engagement during unsupervised physical rehabilitation improves the execution accuracy of therapeutic exercises by showing users whether they target the right muscle groups. To accomplish this, we use wearable electrical impedance tomography (EIT) to monitor muscle engagement and visualize the current state on a virtual muscle-skeleton avatar. We use additional optical motion tracking to also monitor the user’s movement. We conducted a user study with 10 participants that compares exercise execution while seeing muscle + motion data vs. motion data only, and also presented the recorded data to a group of physical therapists for post-rehabilitation analysis. The results indicate that monitoring and visualizing muscle engagement can improve both the therapeutic exercise accuracy during rehabilitation and the post-rehabilitation evaluation by physical therapists.
People spend a significant amount of time trying to make sense of the internet, collecting content from a variety of sources and organizing it to make decisions and achieve their goals. While humans are able to fluidly iterate on collecting and organizing information in their minds, existing tools and approaches introduce significant friction into the process. We introduce Fuse, a browser extension that externalizes users’ working memory by combining low-cost collection with lightweight organization of content in a compact card-based sidebar that is always available. Fuse helps users simultaneously extract key web content and structure it in a lightweight and visual way. We discuss how these affordances help users externalize more of their mental model into the system (e.g., saving, annotating, and structuring items) and support fast reviewing and resumption of task contexts. Our 22-month public deployment and follow-up interviews provide longitudinal insights into the structuring behaviors of real-world users conducting information foraging tasks.
Visual slide-based presentations are ubiquitous, yet slide authoring tools are largely inaccessible to people who are blind or visually impaired (BVI). When authoring presentations, the 9 BVI presenters in our formative study usually work with sighted collaborators to produce visual slides based on the text content they produce. While BVI presenters valued collaborators’ visual design skill, the presenters often felt they could not fully review and provide feedback on the visual changes that were made. We present Diffscriber, a system that identifies and describes changes to a slide’s content, layout, and style for presentation authoring. Using our system, BVI presentation authors can efficiently review changes to their presentation by navigating either a summary of high-level changes or individual slide elements. To learn more about changes of interest, presenters can use a generated change hierarchy to navigate to lower-level change details and element styles. BVI presenters using Diffscriber were able to identify slide design changes and provide feedback more easily than when using the slides alone. More broadly, Diffscriber illustrates how advances in detecting and describing visual differences can improve mixed-ability collaboration.
We present ReCapture, a system that leverages AR-based guidance to help users capture time-lapse data with hand-held mobile devices. ReCapture works by repeatedly guiding users back to the precise location of previously captured images so they can record time-lapse videos one frame at a time without leaving their camera in the scene. Building on previous work in computational re-photography, we combine three different guidance modes to enable parallel hand-held time-lapse capture in general settings. We demonstrate the versatility of our system on a wide variety of subjects and scenes captured over a year of development and regular use, and explore different visualizations of unstructured hand-held time-lapse data.
Debugging printed circuit boards (PCBs) can be a time-consuming process, requiring frequent context switching between PCB design files (schematic and layout) and the physical PCB. To assist electrical engineers in debugging PCBs, we present ARDW, an augmented reality workbench consisting of a monitor interface featuring PCB design files, a projector-augmented workspace for PCBs, tracked test probes for selection and measurement, and a connected test instrument. The system supports common debugging workflows for augmented visualization on the physical PCB as well as augmented interaction with the tracked probes. We quantitatively and qualitatively evaluate the system with 10 electrical engineers from industry and academia, finding that ARDW speeds up board navigation and provides engineers with greater confidence in debugging. We discuss practical design considerations and paths for improvement to future systems. A video demo of the system may be accessed here: https://youtu.be/RbENbf5WIfc .
Gesture-based recognition systems are susceptible to input recognition errors and user errors, both of which negatively affect user experiences and can be frustrating to correct. Prior work has suggested that user gaze patterns following an input event could be used to detect input recognition errors and subsequently improve interaction. However, to be useful, error detection systems would need to detect various types of high-cost errors. Furthermore, to build a reliable detection model for errors, gaze behaviour following these errors must be manifested consistently across different tasks. Using data analysis and machine learning models, this research examined gaze dynamics following input events in virtual reality (VR). Across three distinct point-and-select tasks, we found differences in user gaze patterns following three input events: correctly recognized input actions, input recognition errors, and user errors. These differences were consistent across tasks, selection versus deselection actions, and naturally occurring versus experimentally injected input recognition errors. A multi-class deep neural network successfully discriminated between these three input events using only gaze dynamics, achieving an AUC-ROC-OVR score of 0.78. Together, these results demonstrate the utility of gaze in detecting interaction errors and have implications for the design of intelligent systems that can assist with adaptive error recovery.
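For readers unfamiliar with the reported metric, the sketch below shows one common way to compute a one-vs-rest (OVR) AUC-ROC score for a three-class problem. It assumes macro-averaging over classes and the rank-based (Mann-Whitney) formulation of AUC; the function names are hypothetical, and the study's exact averaging scheme is not stated here.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_ovr(probs, labels, n_classes):
    """Macro-averaged one-vs-rest AUC: each class in turn is treated as 'positive'."""
    per_class = []
    for c in range(n_classes):
        class_scores = [row[c] for row in probs]
        per_class.append(binary_auc(class_scores, [y == c for y in labels]))
    return sum(per_class) / n_classes


if __name__ == "__main__":
    # Hypothetical class order: 0 = correct input, 1 = recognition error, 2 = user error.
    probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    print(auc_ovr(probs, [0, 1, 2], n_classes=3))
```

A score of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the 0.78 reported above.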
We propose a new technical approach to implement untethered VR haptic devices that contain no battery, yet can render on-demand haptic feedback. The key is that via our approach, a haptic device charges itself by harvesting the user's kinetic energy (i.e., movement), even without the user needing to realize this. This is achieved by integrating the energy-harvesting with the virtual experience, in a responsive manner. Whenever our batteryless haptic device is about to lose power, it switches to harvesting mode (by engaging its clutch to a generator) and, simultaneously, the VR headset renders an alternative version of the current experience that depicts resistive forces (e.g., rowing a boat in VR). As a result, the user feels realistic haptics that correspond to what they should be feeling in VR, while unknowingly charging the device via their movements. Once the haptic device's supercapacitors are charged, they wake up its microcontroller to communicate with the VR headset. The VR experience can now use the recently harvested power for on-demand haptics, including vibration, electrical or mechanical force-feedback; this process can be repeated, ad infinitum. We instantiated a version of our concept by implementing an exoskeleton (with vibration, electrical & mechanical force-feedback) that harvests the user's arm movements. We validated it via a user study, in which participants, even without knowing the device was harvesting, rated its VR experience as more realistic & engaging than with a baseline VR setup. Finally, we believe our approach enables haptics for prolonged uses, especially useful in untethered VR setups, since devices capable of haptic feedback are traditionally only reserved for situations with ample power. Instead, with our approach, a user who engages in hours-long VR and has grown accustomed to finding a battery-dead haptic device that no longer works can simply resurrect the haptic device with their movement.
Remote teleoperation is an important robot control method when robots cannot operate fully autonomously. Yet, teleoperation presents challenges to effective and full robot utilization: controls are cumbersome and inefficient, and the teleoperator needs to actively attend to the robot and its environment. Inspired by end-user programming, we propose a new interaction paradigm to support robot teleoperation for combinations of repetitive and complex movements. We introduce Mimic, a system that allows teleoperators to demonstrate and save robot trajectories as templates, and re-use them to execute the same action in new situations. Templates can be re-used through (1) macros, which are parametrized templates assigned to and activated by buttons on the controller, and (2) programs, which are sequences of parametrized templates that operate autonomously. A user study in a simulated environment showed that after initial setup time, participants completed manipulation tasks faster and more easily compared to traditional direct control.
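The template, macro, and program concepts can be illustrated with a small data-structure sketch. Everything here is an assumption made for illustration: the `Template` class, the choice of start position and scale as the only parameters, and the button mapping are hypothetical and do not describe Mimic's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """A demonstrated trajectory stored as offsets from its first waypoint."""
    name: str
    offsets: tuple

    def instantiate(self, start, scale=1.0):
        """Replay the trajectory from a new start point, optionally scaled."""
        return [tuple(s + scale * o for s, o in zip(start, off))
                for off in self.offsets]


def record_template(name, trajectory):
    """Turn a demonstrated list of (x, y, z) waypoints into a reusable template."""
    origin = trajectory[0]
    offsets = tuple(tuple(p - q for p, q in zip(point, origin))
                    for point in trajectory)
    return Template(name, offsets)


def run_program(steps):
    """A program: a sequence of (template, start, scale) steps run back to back."""
    path = []
    for template, start, scale in steps:
        path.extend(template.instantiate(start, scale))
    return path


if __name__ == "__main__":
    lift = record_template("lift", [(1.0, 1.0, 0.0), (1.0, 1.0, 2.0), (2.0, 1.0, 2.0)])
    # A macro: a template with fixed parameters bound to a controller button.
    macros = {"button_a": (lift, 1.0)}
    template, scale = macros["button_a"]
    print(template.instantiate((0.0, 0.0, 0.0), scale))
    print(run_program([(lift, (0.0, 0.0, 0.0), 1.0), (lift, (5.0, 5.0, 5.0), 2.0)]))
```

Storing offsets rather than absolute waypoints is what lets the same demonstration be re-used in a new situation; a real system would also need to handle orientation, gripper state, and collision checking.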
Vision-based 3D pose estimation has substantial potential in hand-object interaction applications and requires user-specified datasets to achieve robust performance. We propose ARnnotate, an Augmented Reality (AR) interface enabling end-users to create custom data using a hand-tracking-capable AR device. Unlike other dataset collection strategies, ARnnotate first guides a user to manipulate a virtual bounding box and records its poses and the user’s hand joint positions as the labels. By leveraging the spatial awareness of AR, the user then manipulates the corresponding physical object while following the in-situ AR animation of the bounding box and hand model, and ARnnotate captures the user’s first-person view as the images of the dataset. A 12-participant user study was conducted, and the results demonstrated the system’s usability in terms of the spatial accuracy of the labels, the satisfactory performance of the deep neural networks trained with the data collected by ARnnotate, and the users’ subjective feedback.
As the population ages, many will acquire visual impairments. To improve design for these users, it is essential to build awareness of their perspective during everyday routines, especially for design students.
Although several visual impairment simulation toolkits exist both in academia and as commercial products, analog and static visual impairment simulation tools do not simulate effects concerning the user’s eye movements. Meanwhile, VR and video see-through-based AR simulation methods are constrained by smaller fields of view when compared with the natural human visual field and also suffer from vergence-accommodation conflict (VAC), which correlates with visual fatigue, headache, and dizziness.
In this paper, we enable an on-the-go, VAC-free, visually impaired experience by leveraging our optical see-through glasses. The FOV of our glasses is approximately 160 degrees horizontally and 140 degrees vertically, and participants can experience loss of both central and peripheral vision at different severities. Our evaluation (n = 14) indicates that the glasses can significantly and effectively reduce visual acuity and visual field without causing typical motion sickness symptoms such as headaches or visual fatigue. Questionnaires and qualitative feedback also showed how the glasses helped to increase participants’ awareness of visual impairment.
Authors make their videos visually accessible by adding audio descriptions (AD), and auditorily accessible by adding closed captions (CC). However, creating AD and CC is challenging and tedious, especially for non-professional describers and captioners, due to the difficulty of identifying accessibility problems in videos. A video author has to watch the video through and manually check for inaccessible information frame-by-frame, for both visual and auditory modalities. In this paper, we present CrossA11y, a system that helps authors efficiently detect and address visual and auditory accessibility issues in videos. Using cross-modal grounding analysis, CrossA11y automatically measures accessibility of visual and audio segments in a video by checking for modality asymmetries. CrossA11y then displays these segments and surfaces visual and audio accessibility issues in a unified interface, making it intuitive to locate, review, script AD/CC in-place, and preview the described and captioned video immediately. We demonstrate the effectiveness of CrossA11y through a lab study with 11 participants, comparing it to an existing baseline.
Sighted programmers often rely on visual cues (e.g., syntax coloring, keyword highlighting, code formatting) to perform common coding activities in text-based languages (e.g., Python). Unfortunately, blind and low-vision (BLV) programmers hardly benefit from these visual cues because they interact with computers via assistive technologies (e.g., screen readers), which fail to communicate visual semantics meaningfully. Prior work on making text-based programming languages and environments accessible mostly focused on code navigation and, to some extent, code debugging, but not much toward code editing, which is an essential coding activity.
We present Grid-Coding to fill this gap. Grid-Coding renders source code in a structured 2D grid, where each row, column, and cell has consistent, meaningful semantics. Its design is grounded in prior work and was refined by 28 BLV programmers through online participatory sessions over 2 months. We implemented the Grid-Coding prototype as a spreadsheet-like web application for Python and evaluated it in a study with 12 BLV programmers. This study revealed that, compared to a text editor (i.e., the go-to editor for BLV programmers), our prototype enabled BLV programmers to navigate source code quickly, find the context of a statement easily, detect syntax errors in existing code effectively, and write new code with fewer syntax errors. The study also revealed how BLV programmers adopted Grid-Coding and demonstrated novel interaction patterns conducive to increased programming productivity.
We examine accessible interactions for wheelchair users and public displays with three studies. In a first study, we conduct a Systematic Literature Review, from which we report very few scientific papers on this topic and a preponderant focus on touch input. In a second study, we conduct a Systematic Video Review using YouTube as a data source, and unveil accessibility challenges for public displays and several input modalities alternative to direct touch. In a third study, we conduct semi-structured interviews with eleven wheelchair users to understand their experience interacting with public displays and to collect their preferences for more accessible input modalities. Based on our findings, we propose the “assisted interaction” phase to extend Vogel and Balakrishnan’s four-phase interaction model with public displays, and the “ability” dimension for cross-device interaction design to support, via users’ personal mobile devices, independent use of interactive public displays.
We present the first toolkit that equips blind and visually impaired (BVI) developers with the tools to create accessible data displays. Called PSST (Physical computing Streaming Sensor data Toolkit), it enables BVI developers to understand the data generated by sensors from a mouse to a micro:bit physical computing platform. By assuming visual abilities, earlier efforts to make physical computing accessible fail to address the need for BVI developers to access sensor data. PSST enables BVI developers to understand real-time, real-world sensor data by providing control over what should be displayed, as well as when to display and how to display sensor data. PSST supports filtering based on raw or calculated values, highlighting, and transformation of data. Output formats include tonal sonification, nonspeech audio files, speech, and SVGs for laser cutting. We validate PSST through a series of demonstrations and a user study with BVI developers.
We present TangibleGrid, a novel device that allows blind users to understand and design the layout of a web page with real-time tangible feedback. We conducted semi-structured interviews and a series of co-design sessions with blind users to elicit insights that guided the design of TangibleGrid. Our final prototype contains shape-changing brackets representing the web elements and a baseboard representing the web page canvas. Blind users can design a web page layout through creating and editing web elements by snapping or adjusting tangible brackets on top of the baseboard. The baseboard senses the brackets’ type, size, and location, verbalizes the information, and renders the web page on the client browser. Through a formative user study, we found that blind users could understand a web page layout through TangibleGrid. They were also able to design a new web layout from scratch without the help of sighted people.
Adaptive user interfaces can improve experiences in Extended Reality (XR) applications by adapting interface elements according to the user’s context. Although extensive work explores different adaptation policies, XR creators often struggle with their implementation, which involves laborious manual scripting. The few available tools are underdeveloped for realistic XR settings where it is often necessary to consider conflicting aspects that affect an adaptation. We fill this gap by presenting AUIT, a toolkit that facilitates the design of optimization-based adaptation policies. AUIT allows creators to flexibly combine policies that address common objectives in XR applications, such as element reachability, visibility, and consistency. Instead of using rules or scripts, specifying adaptation policies via adaptation objectives simplifies the design process and enables creative exploration of adaptations. After creators decide which adaptation objectives to use, a multi-objective solver finds appropriate adaptations in real-time. A study showed that AUIT allowed creators of XR applications to quickly and easily create high-quality adaptations.
Research has enabled virtual reality (VR) users to interact with the physical world by blending the physical world view into the virtual environment. However, current solutions are designed for specific use cases and hence are not capable of covering users’ varying needs for accessing information about the physical world. This work presents RealityLens, a user interface that allows users to peep into the physical world in VR with the reality lenses they deployed for their needs. For this purpose, we first conducted a preliminary study with experienced VR users to identify users’ needs for interacting with the physical world, which led to a set of features for customizing the scale, placement, and activation method of a reality lens. We evaluated the design in a user study (n=12) and collected the feedback of participants engaged in two VR applications while encountering a range of interventions from the physical world. The results show that users’ VR presence tends to be better preserved when interacting with the physical world with the support of the RealityLens interface.
Augmented Reality (AR), which blends physical and virtual worlds, presents the possibility of enhancing traditional toy design. By leveraging bidirectional virtual-physical interactions between humans and the designed artifact, such AR-enhanced toys can provide more playful and interactive experiences than traditional toys. However, designers are constrained by the complexity and technical difficulties of the current AR content creation processes. We propose MechARspace, an immersive authoring system that helps users create toy-AR interactions through direct manipulation and visual programming. Based on an elicitation study, we propose a bidirectional interaction model which maps both ways: from the toy inputs to reactions of AR content, and also from the AR content to the toy reactions. This model guides the design of our system, which includes a plug-and-play hardware toolkit and an in-situ authoring interface. We present multiple use cases enabled by MechARspace to validate this interaction model. Finally, we evaluate our system with a two-session user study where users first recreated a set of predefined toy-AR interactions and then implemented their own AR-enhanced toy designs.
User studies play a critical role in human subject research, including human-computer interaction. Virtual reality (VR) researchers tend to conduct user studies in-person at their laboratory, where participants experiment with novel equipment to complete tasks in a simulated environment, which is often new to many. However, social distancing requirements in recent years have disrupted VR research by preventing participants from attending in-person laboratory studies. On the other hand, affordable head-mounted displays are becoming common, enabling access to VR experiences and interactions outside traditional research settings. Recent research has shown that unsupervised remote user studies can yield reliable results; however, the setup of experiment software designed for remote studies can be technically complex and convoluted. We present a novel open-source Unity toolkit, RemoteLab, designed to facilitate the preparation of remote experiments by providing a set of tools that synchronize experiment state across multiple computers, record and collect data from various multimedia sources, and replay the accumulated data for analysis. This toolkit helps VR researchers conduct remote experiments when in-person experiments are not feasible, or increase the sampling variety of a target population and reach participants who otherwise would not be able to attend in-person.
Widely-accepted sleep guidelines advise regular bedtimes and sleep hygiene. An individual’s adherence is often viewed as a matter of self-regulation and anti-procrastination. We pose a question from a different perspective: What if one’s social or professional duties mandate an irregular daily life that is incompatible with the premise of standard guidelines? We propose SleepGuru, an individually actionable sleep planning system featuring one’s real-life compatibility and extended forecast. Adopting theories on sleep physiology, SleepGuru builds a personalized predictor of the progression of the user’s sleep pressure over a course of upcoming schedules and past activities sourced from her online calendar and wearable fitness tracker. SleepGuru then provides individually actionable multi-day sleep schedules which respect the user’s inevitable real-life irregularities while regulating her week-long sleep pressure. We elaborate on the underlying physiological principles and mathematical models, followed by a 3-stage study and deployment. We develop a mobile user interface providing individual predictions and adjustability backed by cloud-side optimization. We deploy SleepGuru in-the-wild to 20 users for 8 weeks, where we found positive effects of SleepGuru on sleep quality, compliance rate, sleep efficiency, alertness, and long-term followability, among other measures.
Haptic feedback not only enhances immersion in virtual reality (VR) but also delivers experts’ haptic sensation tips in VR training, e.g., properly clamping a tenon and mortise joint or tightening a screw in the assembly of VR factory training, which could even improve the training performance. However, the required manipulations are varied and complicated across different scenarios. Although haptic feedback of virtual objects’ shape, stiffness or resistive force in pressing or grasping has been achieved by previous research, rotational resistive force when twisting or turning virtual objects is seldom discussed or explored, especially for a wearable device. Therefore, we propose a wearable device, ELAXO, to integrate continuous resistive force and continuous rotational resistive force with or without resilience in grasping and twisting, respectively. ELAXO is an exoskeleton with rings, mechanical brakes and elastic bands. The brakes achieve shape rendering and switch between with and without resilience modes for the resistive force. The detachable and rotatable rings and elastic bands render continuous resistive force in grasping and twisting. We conducted a just noticeable difference (JND) study to understand users’ distinguishability in the four conditions, resistive force and rotational resistive force with and without resilience, separately. A VR study was then performed to verify that the versatile resistive force feedback from ELAXO enhances the VR experiences.
When playing scales on the piano, playing all notes evenly is a basic technique to improve the quality of music. However, it is difficult for beginners to do this because they need to achieve appropriate muscle synergies of the forearm and shoulder muscles, i.e., pressing keys as well as sliding their hands sideways. In this paper, we propose a system using electrical muscle stimulation (EMS) to teach beginners how to improve their muscle synergies while playing scales. We focus on the “thumb-under” method and assist with it by applying EMS to the deltoid muscle. We conducted a user study to investigate whether our EMS-based system can help beginners learn new muscle synergies in playing ascending scales. We divided the participants into two groups: an experimental group that practiced with EMS and a control group that practiced without EMS. The results showed that practicing with EMS was more effective in improving the evenness of scales than practicing without EMS and that the muscle synergies changed after practicing.
We study phrase-gesture typing, a gesture typing method that allows users to type short phrases by swiping through all the letters of the words in a phrase using a single, continuous gesture. Unlike word-gesture typing, where text needs to be entered word by word, phrase-gesture typing enters text phrase by phrase. To demonstrate the usability of phrase-gesture typing, we implemented a prototype called PhraseSwipe. Our system is composed of a frontend interface designed specifically for typing through phrases and a backend phrase-level gesture decoder developed based on a transformer-based neural language model. Our decoder was trained using five million phrases of varying lengths of up to five words, chosen randomly from the Yelp Review Dataset. Through a user study with 12 participants, we demonstrate that participants could type using PhraseSwipe at an average speed of 34.5 WPM with a Word Error Rate of 1.1%.
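The two metrics reported for PhraseSwipe can be illustrated with a minimal sketch. It assumes the conventions common in text-entry research (one "word" is five characters, and Word Error Rate is the word-level edit distance divided by the reference length); the abstract does not state which exact definitions were used, and the function names here are hypothetical.

```python
def words_per_minute(transcribed: str, seconds: float) -> float:
    """Entry speed, assuming the common convention that one word is five characters."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # The first character is conventionally excluded from the count.
    chars = max(len(transcribed) - 1, 0)
    return (chars / 5.0) * (60.0 / seconds)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)
```

For example, one substituted word in a four-word phrase gives a Word Error Rate of 0.25 under this definition.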
Real-time tracking of a user’s hands, arms and environment is valuable in a wide variety of HCI applications, from context awareness to virtual reality. Rather than rely on fixed and external tracking infrastructure, the most flexible and consumer-friendly approaches are mobile, self-contained, and compatible with popular device form factors (e.g., smartwatches). In this vein, we contribute DiscoBand, a thin sensing strap not exceeding 1 cm in thickness. Sensors operating so close to the skin inherently face issues with occlusion. To help overcome this, our strap uses eight distributed depth sensors imaging the hand from different viewpoints, creating a sparse 3D point cloud. An additional eight depth sensors image outwards from the band to track the user’s body and surroundings. In addition to evaluating arm and hand pose tracking, we also describe a series of supplemental applications powered by our band’s data, including held object recognition and environment mapping.
We present DeltaPen, a pen device that operates on passive surfaces without the need for external tracking systems or active sensing surfaces. DeltaPen integrates two adjacent lens-less optical flow sensors at its tip, from which it reconstructs accurate directional motion as well as yaw rotation. DeltaPen also supports tilt interaction using a built-in inertial sensor. A pressure sensor and high-fidelity haptic actuator complement our pen device while retaining a compact form factor that supports mobile use on uninstrumented surfaces. We present a processing pipeline that reliably extracts fine-grained pen translations and rotations from the two optical flow sensors. To assess the accuracy of our translation and angle estimation pipeline, we conducted a technical evaluation in which we compared our approach with ground-truth measurements of participants’ pen movements during typical pen interactions. We conclude with several example applications that leverage our device’s capabilities. Taken together, we demonstrate novel input dimensions with DeltaPen that have so far only existed in systems that require active sensing surfaces or external tracking.
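How two flow sensors can yield both translation and yaw may be easier to see in a small sketch. This is not DeltaPen's actual pipeline: it assumes an idealized planar rigid-body model with the two sensors mounted at (-d/2, 0) and (+d/2, 0) in the pen-tip frame, noise-free per-sample displacements, and small rotations. The function name and parameters are hypothetical.

```python
def pen_motion(flow_a, flow_b, baseline_mm):
    """Estimate (tx, ty, yaw_rad) from two per-sample flow displacements.

    flow_a and flow_b are (dx, dy) displacements in millimetres reported by
    the sensors assumed to sit at x = -baseline/2 and x = +baseline/2.
    """
    if baseline_mm <= 0:
        raise ValueError("baseline_mm must be positive")
    ax, ay = flow_a
    bx, by = flow_b
    # Pure translation moves both sensors equally, so it is the mean flow.
    tx = (ax + bx) / 2.0
    ty = (ay + by) / 2.0
    # A small yaw rotation moves the two sensors in opposite y directions;
    # the difference divided by the sensor spacing approximates the angle.
    yaw_rad = (by - ay) / baseline_mm
    return tx, ty, yaw_rad
```

Under these assumptions, identical readings from both sensors imply zero yaw, while opposite y readings imply rotation about the midpoint between the sensors.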
EtherPose is a continuous hand pose tracking system employing two wrist-worn antennas, from which we measure the real-time dielectric loading resulting from different hand geometries (i.e., poses). Unlike worn camera-based methods, our RF approach is more robust to occlusion from clothing and avoids capturing potentially sensitive imagery. Through a series of simulations and empirical studies, we designed a proof-of-concept, worn implementation built around compact vector network analyzers. Sensor data is then interpreted by a machine learning backend, which outputs a fully-posed 3D hand. In a user study, we show how our system can track hand pose with a mean Euclidean joint error of 11.6 mm, even when covered in fabric. We also studied 2DOF wrist angle and micro-gesture tracking. In the future, our approach could be miniaturized and extended to include more and different types of antennas, operating at different self resonances.
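The accuracy figure reported for EtherPose is a mean Euclidean joint error. A minimal sketch of how that metric is computed for one pose follows, assuming predicted and ground-truth joints are given as matching lists of (x, y, z) positions in millimetres; the function name is hypothetical.

```python
import math


def mean_joint_error(predicted, ground_truth):
    """Average Euclidean distance between corresponding 3D joint positions."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("joint lists must be non-empty and of equal length")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)
```

Averaging this value over all frames and participants would give a single summary number of the kind quoted above.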
We engineered DigituSync, a passive-exoskeleton that physically links two hands together, enabling two users to adaptively transmit finger movements in real-time. It uses multiple four-bar linkages to transfer both motion and force, while still preserving congruent haptic feedback. Moreover, we implemented a variable-length linkage that allows adjusting the force transmission ratio between the two users and regulates the amount of intervention, which enables users to customize their learning experience. DigituSync's benefits emerge from its passive design: unlike existing haptic devices (motor-based exoskeletons or electrical muscle stimulation), DigituSync has virtually no latency and does not require batteries/electronics to transmit or adjust movements, making it useful and safe to deploy in many settings, such as between students and teachers in a classroom. We validated DigituSync by means of technical evaluations and a user study, demonstrating that it instantly transfers finger motions and forces with the ability of adaptive force transmission, which allowed participants to feel more control over their own movements and to feel the teacher's intervention was more responsive. We also conducted two exploratory sessions with a music teacher and deaf-blind users, which allowed us to gather experiential insights from the teacher's side and explore DigituSync in applications.
Acoustic levitation has emerged as a promising approach for mid-air displays, by using multiple levitated particles as 3D voxels, cloth and thread props, or high-speed tracer particles, under the promise of creating 3D displays that users can see, hear and feel with their bare eyes, ears and hands. However, interaction with this mid-air content has always occurred at a distance, since external objects in the display volume (e.g. users’ hands) can disturb the acoustic fields and make the particles fall. This paper proposes TipTrap, a co-located direct manipulation technique for acoustically levitated particles. TipTrap leverages the reflection of ultrasound on the users’ skin and employs a closed-loop system to create functional acoustic traps 2.1 mm below the fingertips, and addresses the technique’s three basic stages: selection, manipulation and deselection. We use Finite-Difference Time-Domain (FDTD) simulations to explain the principles enabling TipTrap, and explore how finger reflections and user strategies influence the quality of the traps (e.g. approaching direction, orientation and tracking errors), and use these results to design our technique. We then implement the technique, characterize its performance with a robotic hand setup, and finish with an exploration of the ability of TipTrap to manipulate different types of levitated content.
Developers spend significant amounts of time finding, relating, navigating, and, more broadly, making sense of code. While sensemaking, developers must keep track of many pieces of information including the objectives of their task, the code locations of interest, their questions and hypotheses about the behavior of the code, and more. Despite this process being such an integral aspect of software development, there is little tooling support for externalizing and keeping track of developers’ information, which led us to develop Catseye – an annotation tool for lightweight notetaking about code. Catseye has advantages over traditional methods of externalizing code-related information, such as commenting, in that the annotations retain the original context of the code while not actually modifying the underlying source code, they can support richer interactions such as lightweight versioning, and they can be used as navigational aids. In our investigation of developers’ notetaking processes using Catseye, we found developers were able to successfully use annotations to support their code sensemaking when completing a debugging task.
We articulate a vision for computer programming that includes pen-based computing, a paradigm we term notational programming. Notational programming blurs contexts: certain typewritten variables can be referenced in handwritten notation and vice-versa. To illustrate this paradigm, we developed an extension, Notate, to computational notebooks which allows users to open drawing canvases within lines of code. As a case study, we explore quantum programming and designed a notation, Qaw, that extends quantum circuit notation with abstraction features, such as variable-sized wire bundles and recursion. Results from a usability study with novices suggest that users find our core interaction of implicit cross-context references intuitive, but also point to needed improvements to debugging infrastructure, interface design, and recognition rates. Throughout, we discuss questions raised by the notational paradigm, including a shift from ‘recognition’ of notations to ‘reconfiguration’ of practices and values around programming, and from ‘sketching’ to writing and drawing, or what we call ‘notating.’
Data scientists, researchers, and clerks often create web automation programs to perform repetitive yet essential tasks, such as data scraping and data entry. However, existing web automation systems lack mechanisms for defining conditional behaviors where the system can intelligently filter candidate content based on semantic filters (e.g., extract texts based on key ideas or images based on entity relationships). We introduce SemanticOn, a system that enables users to specify, refine, and incorporate visual and textual semantic conditions in web automation programs via two methods: natural language description via prompts or information highlighting. Users can coordinate with SemanticOn to refine the conditions as the program continuously executes or reclaim manual control to repair errors. In a user study, participants completed a series of conditional web automation tasks. They reported that SemanticOn helped them effectively express and refine their semantic intent by utilizing visual and textual conditions.
Modern program synthesizers are increasingly delivering on their promise of lightening the burden of programming by automatically generating code, but little research has addressed how we can make such systems learnable to all. In this work, we ask: What aspects of program synthesizers contribute to and detract from their learnability by novice programmers? We conducted a thematic analysis of 22 observations of novice programmers, during which novices worked with existing program synthesizers, then participated in semi-structured interviews. Our findings shed light on how their specific points in the synthesizer design space affect these tools’ learnability by novice programmers, including the type of specification the synthesizer requires, the method of invoking synthesis and receiving feedback, and the size of the specification. We also describe common misconceptions about what constitutes meaningful progress and useful specifications for the synthesizers, as well as participants’ common behaviors and strategies for using these tools. From this analysis, we offer a set of design opportunities to inform the design of future program synthesizers that strive to be learnable by novice programmers. This work serves as a first step toward understanding how we can make program synthesizers more learnable by novices, which opens up the possibility of using program synthesizers in educational settings as well as developer tooling oriented toward novice programmers.
Programmers often rely on online resources, such as code examples, documentation, blogs, and Q&A forums, to compare similar libraries and select the one most suitable for their own tasks and contexts. However, this comparison task is often done in an ad-hoc manner, which may result in suboptimal choices. Inspired by Analogical Learning and Variation Theory, we hypothesize that rendering many concept-annotated code examples from different libraries side-by-side can help programmers (1) develop a more comprehensive understanding of the libraries’ similarities and distinctions and (2) make more robust, appropriate library selections. We designed a novel interactive interface, ParaLib, and used it as a technical probe to explore to what extent many side-by-side concept-annotated examples can facilitate the library comparison and selection process. A within-subjects user study with 20 programmers shows that, when using ParaLib, participants made more consistent, suitable library selections and provided more comprehensive summaries of libraries’ similarities and differences.
We present FLEX-SDK: an open-source software development kit that allows creating a social robot from two simple tablet screens. FLEX-SDK involves tools for designing the robot face and its facial expressions, creating screens for input/output interactions, controlling the robot through a Wizard-of-Oz interface, and scripting autonomous interactions through a simple text-based programming interface. We demonstrate how this system can be used to replicate an interaction study and we present nine case studies involving controlled experiments, observational studies, participatory design sessions, and outreach activities in which our tools were used by researchers and participants to create and interact with social robots. We discuss common observations and lessons learned from these case studies. Our work demonstrates the potential of FLEX-SDK to lower the barrier to entry for Human-Robot Interaction research.
We present a novel design for materials that are reconfigurable by end-users. Conceptually, we propose decomposing such reconfigurable materials into (1) a generic, complex material consisting of engineered microstructures (known as metamaterials) designed to be purchased and (2) a simple configuration geometry that can be fabricated by end-users to fit their individual use cases. Specifically, in this paper we investigate reconfiguring our material’s elasticity, such that it can cover existing objects and thereby augment their material properties. Users can configure their materials by generating the configuration geometry using our interactive editor, 3D printing it using commonly available filaments (e.g., PLA), and pressing it onto the generic material for local coupling. We characterize the mechanical properties of our reconfigurable elastic metamaterial and showcase the material’s applicability as, e.g., augmentation for haptic props in virtual reality, a reconfigurable shoe sole for different activities, or a battleship-like ball game.
With spaceR, we present both design and implementation of a resistive force-sensor based on a spacer fabric knit. Due to its softness and elasticity, our sensor provides an appealing haptic experience. It enables continuous input with high precision due to its innate haptic feedback and can be manufactured ready-made on a regular two-bed weft knitting machine, without requiring further post-processing steps. For our multi-component knit, we add resistive yarn to the filler material, in order to achieve a highly sensitive and responsive pressure sensing textile. Sensor resistance drops by ~90% when actuated with moderate finger pressure of 2 N, making the sensor accessible also for straightforward readout electronics. We discuss related manufacturing parameters and their effect on shape and electrical characteristics and explore design opportunities to harness visual and tactile affordances. Finally, we demonstrate several application scenarios by implementing diverse spaceR variations, including analog rocker- and four-way directional buttons, and show the possibility of mode-switching by tracking temporal data.
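The remark about straightforward readout electronics can be illustrated with a voltage-divider sketch. All component values below are assumptions chosen for illustration (3.3 V supply, 10 kΩ fixed resistor, 100 kΩ resting sensor resistance); only the roughly 90% resistance drop under moderate pressure comes from the abstract, and the function name is hypothetical.

```python
def divider_output(v_supply, r_sensor_ohm, r_fixed_ohm):
    """Voltage across the fixed resistor of a sensor/fixed-resistor divider."""
    if r_sensor_ohm < 0 or r_fixed_ohm <= 0:
        raise ValueError("resistances must be positive")
    return v_supply * r_fixed_ohm / (r_sensor_ohm + r_fixed_ohm)


# Hypothetical values: the sensor idles at 100 kOhm and drops by ~90% when pressed.
V_SUPPLY = 3.3
R_FIXED = 10_000.0
R_REST = 100_000.0
R_PRESSED = R_REST * (1.0 - 0.90)

v_rest = divider_output(V_SUPPLY, R_REST, R_FIXED)        # about 0.30 V
v_pressed = divider_output(V_SUPPLY, R_PRESSED, R_FIXED)  # about 1.65 V
```

With these assumed values the output swings by more than a volt between rest and press, which an ordinary microcontroller ADC can resolve without amplification.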
We present Kinergy—an interactive design tool for creating self-propelled motion by harnessing the energy stored in 3D printable springs. To produce controllable output motions, we introduce 3D printable kinetic units, a set of parameterizable designs that encapsulate 3D printable springs, compliant locks, and transmission mechanisms for three non-periodic motions—instant translation, instant rotation, continuous translation—and four periodic motions—continuous rotation, reciprocation, oscillation, intermittent rotation. Kinergy allows the user to create motion-enabled 3D models by embedding kinetic units, customize output motion characteristics by parameterizing embedded springs and kinematic elements, control energy by operating the specialized lock, and preview the resulting motion in an interactive environment. We demonstrate the potential of our techniques via example applications from spring-loaded cars to kinetic sculptures and close with a discussion of key challenges such as geometric constraints.
As touch interactions become ubiquitous in the field of human-computer interaction, it is critical to enrich haptic feedback to improve efficiency, accuracy, and immersive experiences. This paper presents HapTag, a thin and flexible actuator that supports the integration of push-button tactile renderings into everyday soft surfaces. Specifically, HapTag works on the principle of the hydraulically amplified electroactive actuator (HASEL), optimized by embedding a pressure sensing layer and activated with a dedicated voltage appliance in response to users’ input actions, resulting in fast response times and controllable, expressive push-button tactile rendering capabilities. HapTag has a compact form factor and can be attached to, integrated into, or embedded in various soft surfaces like cloth, leather, and rubber. Three common push-button tactile patterns were adopted and implemented with HapTag. We validated the feasibility and expressiveness of HapTag by demonstrating a series of innovative applications under different circumstances.
Pin-based shape-changing displays can present dynamic shape changes by actuating a number of pins. However, the use of many linear actuators to achieve this makes the electrical structure and mechanical construction of the display complicated. We propose a simple pin-based shape-changing display that outputs shape and motions without any electronic elements. Our display consists of magnetic pins in a pin housing, with a magnetic sheet underneath it. The magnetic sheet has a specific magnetic pattern on its surface, and each magnetic pin has a magnet at its lower end. The repulsive force generated between the magnetic sheet and the magnetic pin levitates the pin vertically, and the height of the pin-top varies depending on the magnetic pattern. This paper introduces the basic structure of the display and compares several fabrication methods for the magnetic pins, to highlight the applicability of this method. We have also demonstrated some applications and discussed future possibilities.
Humans can estimate the properties of wielded objects (e.g., inertia and viscosity) using the force applied to the hand. We focused on this mechanism and aimed to represent the properties of wielded objects by dynamically changing the force applied to the hand. We propose MetamorphX, which uses control moment gyroscopes (CMGs) to generate ungrounded, 3-degree-of-freedom moment feedback. The high-response moments obtained from the CMGs allow the inertia and viscosity of motion to be set to the desired values via impedance control. A technical evaluation indicated that our device can generate a moment with a 60-ms delay. The inertia and viscosity of motion were varied by 0.01 kg·m² and 0.1 Ns, respectively. Additionally, we demonstrated that our device can dynamically change the inertia and viscosity of motion through virtual reality applications.
Advances in multimodal AI have presented people with powerful ways to create images from text. Recent work has shown that text-to-image generations are able to represent a broad range of subjects and artistic styles. However, finding the right visual language for text prompts is difficult. In this paper, we address this challenge with Opal, a system that produces text-to-image generations for news illustration. Given an article, Opal guides users through a structured search for visual concepts and provides a pipeline allowing users to generate illustrations based on an article’s tone, keywords, and related artistic styles. Our evaluation shows that Opal efficiently generates diverse sets of news illustrations, visual assets, and concept ideas. Users with Opal generated two times more usable results than users without. We discuss how structured exploration can help users better understand the capabilities of human AI co-creative systems.
Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer’s description of a community’s design—goal, rules, and member personas—and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of “what if?” scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models’ training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra.
Generative Adversarial Networks (GANs) are widely adopted in numerous application areas, such as data preprocessing, image editing, and creativity support. However, a GAN’s ‘black box’ nature prevents non-expert users from controlling what data a model generates, spawning a plethora of prior work that focused on algorithm-driven approaches to extract editing directions to control GANs. Complementarily, we propose GANzilla, a user-driven tool that empowers a user with the classic scatter/gather technique to iteratively discover directions to meet their editing goals. In a study with 12 participants, GANzilla users were able to discover directions that (i) edited images to match provided examples (closed-ended tasks) and that (ii) met a high-level goal, e.g., making the face happier, while showing diversity across individuals (open-ended tasks).
We present a communication support system, namely We-toon, that can bridge the webtoon writers and artists during sketch revision (i.e., character design and draft revision). In the highly iterative design process between the webtoon writers and artists, writers often have difficulties in precisely articulating their feedback on sketches owing to their lack of drawing proficiency. This drawback makes the writers rely on textual descriptions and reference images found using search engines, leading to indirect and inefficient communications. Inspired by a formative study, we designed We-toon to help writers revise webtoon sketches and effectively communicate with artists. Through a GAN-based image synthesis and manipulation, We-toon can interactively generate diverse reference images and synthesize them locally on any user-provided image. Our user study with 24 professional webtoon authors demonstrated that We-toon outperforms the traditional methods in terms of communication effectiveness and the writers’ satisfaction level related to the revised image.
Many design tasks involve parameter adjustment, and designers often struggle to find desirable parameter value combinations by manipulating sliders back and forth. For such a multi-dimensional search problem, Bayesian optimization (BO) is a promising technique because of its intelligent sampling strategy; in each iteration, BO samples the most effective points considering both exploration (i.e., prioritizing unexplored regions) and exploitation (i.e., prioritizing promising regions), enabling efficient searches. However, existing BO-based design frameworks take the initiative in the design process and thus are not flexible enough for designers to freely explore the design space using their domain knowledge. In this paper, we propose a novel design framework, BO as Assistant, which enables designers to take the initiative in the design process while also benefiting from BO’s sampling strategy. The designer can manipulate sliders as usual; the system monitors the slider manipulation to automatically estimate the design goal on the fly and then asynchronously provides unexplored-yet-promising suggestions using BO’s sampling strategy. The designer can choose to use the suggestions at any time. This framework uses a novel technique to automatically extract the necessary information to run BO by observing slider manipulation without requesting additional inputs. Our framework is domain-agnostic, demonstrated by applying it to photo color enhancement, 3D shape design for personal fabrication, and procedural material design in computer graphics.
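The exploration/exploitation trade-off described for Bayesian optimization can be sketched with an upper-confidence-bound (UCB) rule. UCB is used here only as a familiar stand-in; the abstract does not say which acquisition function BO as Assistant uses. The sketch assumes a surrogate model has already produced a posterior mean and standard deviation for each candidate slider setting, and all names are hypothetical.

```python
def ucb_scores(means, stds, kappa=2.0):
    """Score each candidate as mean (exploitation) plus kappa * std (exploration)."""
    if len(means) != len(stds):
        raise ValueError("means and stds must have the same length")
    return [m + kappa * s for m, s in zip(means, stds)]


def pick_next(candidates, means, stds, kappa=2.0):
    """Return the candidate with the highest UCB score."""
    if not candidates or len(candidates) != len(means):
        raise ValueError("need one mean and std per candidate")
    scores = ucb_scores(means, stds, kappa)
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]


# Hypothetical slider settings with assumed surrogate predictions.
settings = ["warm", "neutral", "cool"]
predicted_quality = [0.8, 0.5, 0.6]
uncertainty = [0.05, 0.4, 0.1]

greedy_choice = pick_next(settings, predicted_quality, uncertainty, kappa=0.0)
balanced_choice = pick_next(settings, predicted_quality, uncertainty, kappa=2.0)
```

With `kappa=0.0` the rule only exploits and returns the best-known setting, while a larger `kappa` favors the uncertain, less explored setting.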
Consensus-building is an essential process for the success of co-design projects. To build consensus, stakeholders need to discuss conflicting needs and viewpoints, converge their ideas toward shared interests, and grow their willingness to commit to group decisions. However, managing group discussions is challenging in large co-design projects with multiple stakeholders. In this paper, we investigate the interaction design of a chatbot that can mediate consensus-building conversationally. By interacting with individual stakeholders, the chatbot collects ideas to satisfy conflicting needs and engages stakeholders to consider others’ viewpoints, without having stakeholders directly interact with each other. Results from an empirical study in an educational setting (N = 12) suggest that the approach can increase stakeholders’ commitment to group decisions and maintain the effect even on the group decisions that conflict with personal interests. We conclude that chatbots can facilitate consensus-building in small-to-medium-sized projects, but more work is needed to scale up to larger projects.
When users use Virtual Reality (VR) in nontraditional postures, such as while reclining or lying in relaxed positions, their views lean upwards and need to be corrected to make sure they see upright contents and perceive the interactions as if they were standing. Such upright redirection is expected to cause visual-vestibular-proprioceptive conflict, affecting users’ internal perceptions (e.g., body ownership, presence, simulator sickness) and external perceptions (e.g., egocentric space perception) in VR. Different body reclining angles may affect vestibular sensitivity and lead to the dynamic weighting of multi-sensory signals in the sensory integration. In this paper, we investigated the impact of upright redirection on users’ perceptions, with users’ physical bodies tilted backward at various angles and their views redirected upright accordingly. The results showed that upright redirection led to simulator sickness, confused self-awareness, weak upright illusion, and increased space perception deviations to various extents when users were at different reclining positions, and the effects were worst in the 45° conditions. Based on these results, we designed illusion-based and sensory-based methods that were shown to be effective in reducing the impact of sensory conflict in preliminary evaluations.
Despite significant improvements to Virtual Reality (VR) technologies, most VR displays are fixed focus and depth perception is still a key issue that limits the user experience and the interaction performance. To supplement humans’ inherent depth cues (e.g., retinal blur, motion parallax), we investigate users’ perceptual mappings of distance to virtual objects’ appearance to generate visual cues aimed to enhance depth perception. As a first step, we explore color-to-depth mappings for virtual objects so that their appearance differs in saturation and value to reflect their distance. Through a series of controlled experiments, we elicit and analyze users’ strategies of mapping a virtual object’s hue, saturation, value and a combination of saturation and value to its depth. Based on the collected data, we implement a computational model that generates color-to-depth mappings fulfilling adjustable requirements on confusion probability, number of depth levels, and consistent saturation/value changing tendency. We demonstrate the effectiveness of color-to-depth mappings in a 3D sketching task, showing that compared to single-colored targets and strokes, with our mappings, the users were more confident in the accuracy without extra cognitive load and reduced the perceived depth error by 60.8%. We also implement four VR applications and demonstrate how our color cues can benefit the user experience and interaction performance in VR.
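A color-to-depth mapping that varies saturation and value with distance can be sketched as follows. This is a plain linear interpolation used for illustration, not the computational model described in the abstract; the direction of the mapping (farther objects become less saturated and darker), the end-point values, and the function name are all assumptions.

```python
import colorsys


def depth_to_rgb(depth, near, far, hue=0.6):
    """Map a depth in [near, far] to an RGB color by varying saturation and value."""
    if far <= near:
        raise ValueError("far must be greater than near")
    # Normalize depth to [0, 1], clamping values outside the range.
    t = min(max((depth - near) / (far - near), 0.0), 1.0)
    saturation = 1.0 - 0.7 * t  # assumed: farther objects are less saturated
    value = 1.0 - 0.6 * t       # assumed: farther objects are darker
    return colorsys.hsv_to_rgb(hue, saturation, value)
```

The computational model in the abstract additionally constrains confusion probability and the number of depth levels, which this sketch does not attempt.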
Augmented Reality has traditionally been used to display digital overlays in real environments. Many AR applications such as remote collaboration, picking tasks, or navigation require highlighting physical objects for selection or guidance. These highlights use graphical cues such as outlines and arrows. Whilst effective, they greatly contribute to visual clutter, possibly occlude scene elements, and can be problematic for long-term use. As a substitute for those overlays, we explore saliency modulation to accentuate objects in the real environment to guide the user’s gaze. Instead of manipulating video streams, as is done in perception and cognition research, we investigate saliency modulation of the real world using optical-see-through head-mounted displays. This is a new challenge, since we do not have full control over the view of the real environment. In this work we provide our specific solution to this challenge, including built prototypes and their evaluation.
Despite the increasing popularity of VR games, one factor hindering the industry’s rapid growth is motion sickness experienced by the users. Symptoms such as fatigue and nausea severely hamper the user experience. Machine Learning methods could be used to automatically detect motion sickness in VR experiences, but generating the extensive labeled dataset needed is a challenging task. It requires either very time-consuming manual labeling by human experts or modification of proprietary VR application source code for label capturing. To overcome these challenges, we developed a novel data collection tool, VRhook, which can collect data from any VR game without needing access to its source code. This is achieved by dynamic hooking, where we can inject custom code into a game’s run-time memory to record each video frame and its associated transformation matrices. Using this, we can automatically extract various useful labels such as rotation, speed, and acceleration. In addition, VRhook can blend a customized screen overlay on top of game contents to collect self-reported comfort scores. In this paper, we describe the technical development of VRhook, demonstrate its utility with an example, and describe directions for future research.
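Deriving labels such as speed and acceleration from per-frame transformation matrices can be sketched with finite differences. The matrix layout is an assumption (4x4, row-major, translation in the last column), as are the fixed frame interval and the function names; the abstract does not describe VRhook's actual matrix format or extraction code.

```python
import math


def translation(matrix):
    """Camera position, assuming the translation sits in the last column."""
    return (matrix[0][3], matrix[1][3], matrix[2][3])


def speeds(matrices, dt):
    """Speed between consecutive frames captured dt seconds apart."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    positions = [translation(m) for m in matrices]
    return [math.dist(a, b) / dt for a, b in zip(positions, positions[1:])]


def accelerations(matrices, dt):
    """Rate of change of speed between consecutive frame pairs."""
    frame_speeds = speeds(matrices, dt)
    return [(b - a) / dt for a, b in zip(frame_speeds, frame_speeds[1:])]
```

Rotation labels could be derived similarly from the upper-left 3x3 block of each matrix, which this sketch leaves out.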
We propose a novel system for low-cost multi-color Fused Filament Fabrication (FFF) 3D printing, allowing for the creation of customizable colored filament using a pre-processing approach. We developed an open-source device to automatically ink filament using permanent markers. Our device can be built using 3D printed parts and off-the-shelf electronics. An accompanying web-based interface allows users to view GCODE toolpaths for a multi-color print and quickly generate filament color profiles. Taking a pre-processing approach makes this system compatible with the majority of desktop 3D printers on the market, as the processed filament behaves no differently from conventional filaments. Furthermore, inked filaments can be produced economically, reducing the need for excessive purchasing of material to expand color options. We demonstrate the efficacy of our system by fabricating monochromatic objects, objects with gradient colors, objects with bi-directional properties, as well as multi-color objects with up to four colors in a single print.
We present interiqr, a method that utilizes the infill parameter in the 3D printing process to embed information inside food that is difficult to recognize with the human eye. Our key idea is to use air space or secondary materials to generate a specific pattern inside the food without changing the model geometry. As a result, our method exploits the patterns that appear as hidden edible tags to store the data and simultaneously adds them to a 3D printing pipeline. Our contribution also includes a framework that connects the user with a data-embedding interface through the food 3D printing process, and a decoding system that allows the user to decode the information inside the 3D printed food through backlight illumination and a simple image processing technique. Finally, we evaluate the usability of our method under different settings and demonstrate it through example application scenarios.
In this research, we used traditional sequin embroidery as a basis and 3D printing to expand the design space of sequin materials and structures. We developed a new 2.5D smart conductive sequin textile that supports multiple sensing and interaction functions, and we provide users with a customization system for automated design and manufacturing.
Through 3D printing, we have developed a variety of 3D sequins. We used each sequin as an individual design unit to realize various circuit designs and sensing functions by adjusting the design primitives such as conductivity, shape, and arrangement. We also designed applications such as motion sensing of body movements, and posture detection of the ankle. In addition, we surveyed user requirements through user testing to optimize the design space.
This paper describes the design space, design software, automation, application, and user study of various smart sequin textiles.
The unique behaviors of thermoplastic polymers enable shape-changing interfaces made of 3D printed objects that do not require complex electronics integration. However, existing techniques rely on an external trigger, such as heat, applied globally to a 3D printed object (e.g., hot water, heat gun, oven), which initiates the shape-changing response all at once and makes independent control of multiple parts of the object nearly impossible. We introduce ShrinkCells, a set of shape-changing actuators that use localized heat to shrink or bend by combining the properties of two materials: conductive PLA generates localized heat, which selectively triggers the shrinking of a Shape Memory Polymer. The unique benefit of ShrinkCells is their capability of triggering simultaneous or sequential shape transformations for different geometries using a single power supply. This results in 3D printed rigid structures that actuate in sequence, avoiding self-collisions when unfolding. We contribute to the body of literature on 4D fabrication through a systematic investigation of selective heating with two different materials, the design and evaluation of the ShrinkCells shape-changing primitives, and applications demonstrating the usage of these actuators.
Bayesian hierarchical models are probabilistic models that have hierarchical structures and use Bayesian methods for inference. In this paper, we extend Fitts’ law to be a Bayesian hierarchical pointing model and compare it with the typical pooled pointing models (i.e., treating all observations as the same pool), and the individual pointing models (i.e., building an individual model for each user separately). The Bayesian hierarchical pointing models outperform pooled and individual pointing models in predicting the distribution and the mean of pointing movement time, especially when the training data are sparse. Our investigation also shows that both noninformative and weakly informative priors are adequate for modeling pointing actions, although the weakly informative prior performs slightly better than the noninformative prior when the training data size is small. Overall, we conclude that the expected advantages of Bayesian hierarchical models hold for the pointing tasks. Bayesian hierarchical modeling should be adopted as a more principled and effective approach to building pointing models than the current common practices in HCI, which use pooled or individual models.
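To make the two baselines concrete, the sketch below fits Fitts’ law by ordinary least squares in the pooled and the individual style. It assumes the Shannon formulation, MT = a + b log2(D/W + 1), which the abstract does not specify, and a hypothetical trial format of (user, distance, width, movement time). It does not implement the Bayesian hierarchical model itself, which would instead place shared population-level priors over each user's a and b.

```python
import math
from collections import defaultdict

def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits (assumed form)."""
    return math.log2(distance / width + 1)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def fit_pooled(trials):
    """Pooled model: one (a, b) for all observations, ignoring who produced them."""
    ids = [index_of_difficulty(d, w) for _, d, w, _ in trials]
    times = [mt for _, _, _, mt in trials]
    return fit_line(ids, times)

def fit_individual(trials):
    """Individual models: a separate (a, b) per user, fitted on that user's trials only."""
    by_user = defaultdict(list)
    for user, d, w, mt in trials:
        by_user[user].append((index_of_difficulty(d, w), mt))
    return {
        user: fit_line([x for x, _ in pts], [y for _, y in pts])
        for user, pts in by_user.items()
    }
```

The pooled fit cannot represent differences between users, while each individual fit relies on that user's data alone and becomes unstable when a user has few trials; this is the sparse-data setting in which the abstract reports the hierarchical model performing best.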
The accurate and personalized estimation of task difficulty provides many opportunities for optimizing user experience. However, user diversity makes such difficulty estimation hard, in that empirical measurements from some user sample do not necessarily generalize to others.
In this paper, we contribute a new approach for personalized difficulty estimation of game levels, borrowing methods from content recommendation. Using factorization machines (FM) on a large dataset from a commercial puzzle game, we are able to predict difficulty as the number of attempts a player requires to pass future game levels, based on observed attempt counts from earlier levels and levels played by others. In addition to performance and scalability, FMs offer the benefit that the learned latent variable model can be used to study the characteristics of both players and game levels that contribute to difficulty. We compare the approach to a simple non-personalized baseline and a personalized prediction using Random Forests. Our results suggest that FMs are a promising tool enabling game designers to both optimize player experience and learn more about their players and the game.
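For readers unfamiliar with factorization machines, the following sketch shows the standard second-order FM prediction, using the usual linear-time rewriting of the pairwise interaction term. The one-hot player/level encoding and the function names are assumptions for illustration; the abstract does not describe the features or training procedure used in the paper.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorization machine prediction.

    x:  feature vector of length n
    w0: global bias
    w:  linear weights, length n
    v:  latent factors, n rows of length k
    Computes w0 + sum_i w[i]*x[i] + sum_{i<j} <v[i], v[j]> * x[i] * x[j].
    """
    n = len(x)
    k = len(v[0])
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} a_i a_j = 0.5 * ((sum_i a_i)^2 - sum_i a_i^2)
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def one_hot_player_level(player, level, n_players, n_levels):
    """Hypothetical encoding: one-hot player index followed by one-hot level index."""
    x = [0.0] * (n_players + n_levels)
    x[player] = 1.0
    x[n_players + level] = 1.0
    return x
```

With this encoding the prediction reduces to a bias, a per-player term, a per-level term, and the dot product of the player's and the level's latent vectors, which is why the learned factors can be inspected to study what makes a level hard for a given player.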
There is a growing interest in adopting Deep Learning (DL) given its superior performance in many domains. However, modern DL frameworks such as TensorFlow often come with a steep learning curve. In this work, we propose INTENT, an interactive system that infers user intent and generates corresponding TensorFlow code on behalf of users. INTENT helps users understand and validate the semantics of generated code by rendering individual tensor transformation steps with intermediate results and element-wise data provenance. Users can further guide INTENT by marking certain TensorFlow operators as desired or undesired, or by directly manipulating the generated code. A within-subjects user study with 18 participants shows that users can finish programming tasks in TensorFlow more successfully and in only half the time, compared with a variant of INTENT that has no interaction or visualization support.
Forward biomechanical simulation in HCI holds great promise as a tool for evaluation, design, and engineering of user interfaces. Although reinforcement learning (RL) has been used to simulate biomechanics in interaction, prior work has relied on unrealistic assumptions about the control problem involved, which limits the plausibility of emerging policies. These assumptions include direct torque actuation as opposed to muscle-based control; direct, privileged access to the external environment, instead of imperfect sensory observations; and lack of interaction with physical input devices. In this paper, we present a new approach for learning muscle-actuated control policies based on perceptual feedback in interaction tasks with physical input devices. This allows modelling of more realistic interaction tasks with cognitively plausible visuomotor control. We show that our simulated user model successfully learns a variety of tasks representing different interaction methods, and that the model exhibits characteristic movement regularities observed in studies of pointing. We provide an open-source implementation which can be extended with further biomechanical models, perception models, and interactive environments.
Interactions based on automatic speech recognition (ASR) have become widely used, with speech input being increasingly utilized to create documents. However, as there is no easy way to distinguish spoken commands from text to be entered, misrecognitions are difficult to identify and correct, meaning that documents need to be manually edited and corrected. The input of symbols and commands is also challenging because these may be misrecognized as text letters. To address these problems, this study proposes a speech interaction method called DualVoice, by which commands can be input in a whispered voice and letters in a normal voice. The proposed method does not require any specialized hardware other than a regular microphone, enabling completely hands-free interaction. The method can be used in a wide range of situations where speech recognition is already available, ranging from text input to mobile/wearable computing. Two neural networks were designed in this study, one for discriminating normal speech from whispered speech, and the second for recognizing whispered speech. A prototype of a text input system was then developed to show how normal and whispered voice can be used in speech text input. Other potential applications using DualVoice are also discussed.
It is important for photographers to have the best possible lighting configuration at the time of shooting; otherwise, they need to post-process images, which may cause artifacts and deterioration. Thus, photographers often struggle to find the best possible lighting configuration by manipulating lighting devices, including light sources and modifiers, in a trial-and-error manner. In this paper, we propose a novel computational framework to support photographers. This framework assumes that every lighting device is programmable; that is, its adjustable parameters (e.g., orientation, intensity, and color temperature) can be set using a program. Using our framework, photographers do not need to learn how the parameter values affect the resulting lighting, and do not even need to determine the strategy of the trial-and-error process; instead, photographers need only concentrate on evaluating which lighting configuration is more desirable among options suggested by the system. The framework is enabled by our novel photographer-in-the-loop Bayesian optimization, which is sample-efficient (i.e., the number of required evaluation steps is small) and which can also be guided by a rough painting of the desired lighting configuration, if one is available. We demonstrate how the framework works in both simulated virtual environments and a physical environment, suggesting that it could find pleasing lighting configurations quickly, in around 10 iterations. Our user study suggests that the framework enables the photographer to concentrate on the look of captured images rather than the parameters, compared with the traditional manual lighting workflow.
Web search is increasingly used to satisfy complex, exploratory information goals. Exploring and synthesizing information into knowledge can be slow and cognitively demanding due to a disconnect between search tools and sensemaking workspaces. Our work explores how we might integrate contextual query suggestions within a person’s sensemaking environment. We developed InterWeave, a prototype that leverages a human wizard to generate contextual search guidance and to place the suggestions within the emergent structure of a searcher’s notes. To investigate how weaving suggestions into the sensemaking workspace affects a user’s search and sensemaking behavior, we ran a between-subjects study (n=34) in which we compared InterWeave’s in-context placement with a conventional list of query suggestions. InterWeave’s approach not only promoted active searching, information gathering, and knowledge discovery, but also helped participants keep track of new suggestions and connect newly discovered information to existing knowledge, in comparison to presenting suggestions as a separate list. These results point to directions for future work to interweave contextual and natural search guidance into everyday work.
Reviewing the literature to understand relevant threads of past work is a critical part of research and a vehicle for learning. However, as the scientific literature grows, the challenges for users to find and make sense of the many different threads of research grow as well. Previous work has helped scholars to find and group papers with citation information or textual similarity using standalone tools or overview visualizations. Instead, in this work we explore a tool integrated into users’ reading process that helps them leverage authors’ existing summarization of threads, typically in introduction or related work sections, in order to situate their own work’s contributions. To explore this, we developed a prototype that supports efficient extraction and organization of threads along with supporting evidence as scientists read research articles. The system then recommends further relevant articles based on user-created threads. We evaluate the system in a lab study and find that it helps scientists to follow and curate research threads without breaking out of their flow of reading, collect relevant papers and clips, and discover interesting new articles to further grow threads.
Personal cloud storage systems increasingly offer recommendations to help users retrieve or manage files of interest. For example, Google Drive’s Quick Access predicts and surfaces files likely to be accessed. However, when multiple, related recommendations are made, interfaces typically present recommended files and any accompanying explanations individually, burdening users. To improve the usability of ML-driven personal information management systems, we propose a new method for summarizing related file-management recommendations. We generate succinct summaries of groups of related files being recommended. Summaries reference the files’ shared characteristics. Through a within-subjects online study in which participants received recommendations for groups of files in their own Google Drive, we compare our summaries to baselines like visualizing a decision tree model or simply listing the files in a group. Compared to the baselines, participants expressed greater understanding and confidence in accepting recommendations when shown our novel recommendation summaries.
Modern knowledge workers typically need to use multiple resources, such as documents, web pages, and applications, at the same time. This complexity in their computing environments forces workers to restore various resources in the course of their work. However, conventional curation methods like bookmarks, recent document histories, and file systems place limitations on effective retrieval. Such features typically work only for resources of one type within one application, ignoring the interdependency between resources needed for a single task. In addition, text-based handles do not provide rich cues for users to recognize their associated resources. Hence, the need to locate and reopen relevant resources can significantly hinder knowledge workers’ productivity. To address these issues, we designed and developed Scrapbook, a novel application for digital resource curation across applications that uses screenshot-based bookmarks. Scrapbook extracts and stores all the metadata (URL, file location, and application name) of windows visible in a captured screenshot to facilitate restoring them later. A week-long field study indicated that screenshot-based bookmarks helped participants curate digital resources. Additionally, participants reported that multimodal—visual and textual—data helped them recall past computer activities and reconstruct working contexts efficiently.
The vast scale and open-ended nature of knowledge graphs (KGs) make exploratory search over them cognitively demanding for users. We introduce a new technique, polymorphic lenses, that improves exploratory search over a KG by obtaining new leverage from the existing preference models that KG-based systems maintain for recommending content. The approach is based on a simple but powerful observation: in a KG, preference models can be re-targeted to recommend not only entities of a single base entity type (e.g., papers in the scientific literature KG, products in an e-commerce KG), but also all other types (e.g., authors, conferences, institutions; sellers, buyers). We implement our technique in a novel system, FeedLens, which is built over Semantic Scholar, a production system for navigating the scientific literature KG. FeedLens reuses the existing preference models on Semantic Scholar—people’s curated research feeds—as lenses for exploratory search. Semantic Scholar users can curate multiple feeds/lenses for different topics of interest, e.g., one for human-centered AI and another for document embeddings. Although these lenses are defined in terms of papers, FeedLens re-purposes them to also guide search over authors, institutions, venues, etc. Our system design is based on feedback from intended users via two pilot surveys (n = 17 and n = 13, respectively). We compare FeedLens and Semantic Scholar via a third (within-subjects) user study (n = 15) and find that FeedLens increases user engagement while reducing the cognitive effort required to complete a short literature review task. Our qualitative results also highlight people’s preference for this more effective exploratory search experience enabled by FeedLens.
We propose a text editor to help users plan, structure and reflect on their writing process. It provides continuously updated paragraph-wise summaries as margin annotations, using automatic text summarization. Summary levels range from full text, to selected (central) sentences, down to a collection of keywords. To understand how users interact with this system during writing, we conducted two user studies (N=4 and N=8) in which people wrote analytic essays about a given topic and article. As a key finding, the summaries gave users an external perspective on their writing and helped them to revise the content and scope of their drafted paragraphs. People further used the tool to quickly gain an overview of the text and developed strategies to integrate insights from the automated summaries. More broadly, this work explores and highlights the value of designing AI tools for writers, with Natural Language Processing (NLP) capabilities that go beyond direct text generation and correction.