Anatomical Considerations in Facial Motion Capture
Elizabeth A. Rega
Department of Anatomy
Western University of Health Sciences
Stuart S. Sumida
Department of Biology
California State University San Bernardino
Facial performance capture has unquestionably been enabled by significant technological advances in three-dimensional motion capture and modeling. Yet from a critical artistic perspective, the question remains: why do performances so generated frequently read as lifeless or “zombie-like”? While improvement in the density of collection and integration of data will undoubtedly refine the quality of the motion capture product, the anatomical answer to this artistic question goes beyond refinement of quasistatic simulation schemes and finite element methods to address several basic principles of human facial anatomy, expression and ontogeny (development from juvenile to adult). The latter consideration is especially important in issues involving inter-individual scaling, because translation of facial performance onto characters of differing proportions presents one of the greatest challenges to the utility of motion capture.
Evolution and Anatomy of Facial Expression
As is true of most other primates, human social interaction is driven by dynamic and vanishingly subtle visual cues provided by facial expression. Darwin theorized that these expressions were largely innate, but subsequent researchers consider both innate and cultural factors. Therefore, the average human (or even chimpanzee) audience is the end result of selection from ancestors whose very evolutionary success was dependent upon successful decoding of facial signals. Unlike other body areas, the face will be subject to an intense unconscious scrutiny deeply embedded in our biology.
And unlike every other body area, the movements in the face are not principally the movements of skeletal elements around joints. Rather, most facial movements are the result of highly variable thin sheets of muscle which attach not to bone, but from skin to skin.
FIGURE 1 Illustration of facial musculature from J.M. Bourgery and N.H. Jacob, Atlas of Human Anatomy and Surgery, volume 2 plate 93 (1854). Image in public domain. Note the (unintentionally) misleading depiction of the muscles as if they were attached to the underlying skull bones, when in fact the majority represent skin to skin connections.
This results in movement that does not proceed along a vector from a clear anchor point, but rather a surface deformation approximating various skin attachment points. Muscles in this category include not only the familiar linear muscles of facial expression, but also the circular “sphincter” muscles surrounding the mouth (orbicularis oris) and the eyes (orbicularis oculi). Contractions of the latter two are amongst the most important of readable expressions.
However, a very important linear muscle of facial expression not attached skin to skin is the buccinator.
FIGURE 2 Two primates illustrating the action of the buccinator muscle. Authors’ photograph.
This muscle forms the tense area on both central cheeks when we (in the immortal words of Bart Simpson) both suck and blow. It is a developmentally essential muscle for mammals, as it is the muscle driving the suckling of milk as babies. It is correspondingly relatively larger in juveniles, and is anchored robustly to the corners of the lips (and therefore into skin and muscle, like other muscles of facial expression). However, it originates from a thick band of connective tissue spanning the back of the upper to lower jaw on each side. Therefore its movement is markedly more unidirectional than most, with a posterior anchor point. In adult human facial expression, its asymmetric contraction during speech results in the lop-sided mouth posture of Burgess Meredith’s Penguin familiar to fans of the 1960s ABC series Batman (and also characteristic of the most recent U.S. ex-vice-president).
Asymmetry of human faces and facial expression is the rule, not the exception, and therefore asymmetrical models and movements are crucial in creating believable and sympathetic human characters.
Jaw movement, of course, is the result of the more familiar skeletal motion around pivot points constrained by joint surfaces. These are the kinds of movements most easily captured and animated in the rest of the body. Regular speech -- as opposed to singing or shouting -- results, however, in a surprisingly small amount of simple jaw adduction (or closing) around a transverse horizontal axis defined by two condylar joints immediately in front of the ear canals.
FIGURE 3 Illustration of mandibular (lower jaw) joint and surrounding muscles. From J.M. Bourgery and N.H. Jacob, Atlas of Human Anatomy and Surgery, volume 2 plate 97 (1854). Image in public domain.
Those familiar with the range of motion typified by the “Locust Valley Lockjaw” or the speech habits of Gilligan’s Island’s Thurston Howell III can instantly see that significant jaw movement is not essential to intelligible human speech. The characteristic and readable movements of speech and the generation of phonemes depend on generally minute changes in the configuration of the lips and on interactions of the tongue and lips with the oral cavity and teeth.
The jaws also have a particular quirk not found in any other human joint system. In chewing, the side of the jaw processing the bolus of food (the working side) slides forward on the temporo-mandibular joint (TMJ) surface, out of its socket onto the root of the cheekbone, while the opposite (balancing) side stays securely in its socket. Thus there are both anterior-posterior glide and hinge movements at the TMJ. This results in a net pivot of the jaw around a vertical axis, and an asymmetric movement along x, y and z axes describing an ellipse, as a result of the turning moment generated along the body and ramus of the lower jaw (mandible). It is also the cause of pain for those who experience temporo-mandibular joint syndrome. As this movement is considerably more complex than simply opening and closing, animated scenes depicting characters eating – though admittedly rare – lack convincing jaw movement.
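The combination of hinge and one-sided glide described above can be sketched in a few lines. This is a minimal rigid-body illustration only, not a model of the TMJ: the function name jaw_pose, the default intercondylar width and the sample values are all hypothetical placeholders, and the mandible is treated as a rigid bar between the two condyles.

```python
import math

def jaw_pose(open_deg, glide_mm, intercondylar_mm=100.0):
    """Return (hinge_deg, yaw_deg, midline_forward_mm) for a simplified jaw.

    open_deg         -- hinge rotation about the transverse axis through both condyles
    glide_mm         -- forward slide of the one condyle that leaves its socket
    intercondylar_mm -- distance between the condyles (placeholder value, an assumption)
    """
    if abs(glide_mm) > intercondylar_mm:
        raise ValueError("glide cannot exceed the distance between the condyles")
    # One condyle moving forward while the other stays in its socket rotates
    # the rigid mandible about a vertical axis through the stationary condyle.
    yaw_rad = math.asin(glide_mm / intercondylar_mm)
    # The midpoint between the condyles is carried forward by half the glide.
    midline_forward_mm = glide_mm / 2.0
    return open_deg, math.degrees(yaw_rad), midline_forward_mm

# Symmetric opening (speech-like): pure hinge, no pivot about the vertical axis.
print(jaw_pose(open_deg=8.0, glide_mm=0.0))
# Chewing-like pose: the same hinge angle plus a one-sided glide adds yaw.
print(jaw_pose(open_deg=8.0, glide_mm=5.0))
```

Even this crude version shows why a single open/close channel cannot reproduce chewing: the one-sided glide introduces a second rotation that a pure hinge rig has no way to express.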
A Blink is not a Wink
Remarkable refinements in performance capture and translation of the movement of facial muscles have nevertheless been accomplished, through both sparse and complex motion capture marker data and markerless techniques. However, the eyes continue to remain the weak link and contribute disproportionally to the “zombie-like” appearance of many recent high-profile high-investment performance capture/animation efforts.
The first consideration is that of blinking. Blinking is the voluntary though largely unconscious closure of the eyes to clear and moisten the cornea. It is not accomplished by contraction of any muscle (contraction of the orbicularis oculi muscle would constitute a wink, not a blink). Rather, eyelid closure results from the brief interval of relaxation of the levator palpebrae superioris muscle, anchored in the fibrous skeleton of the tarsal plate of the upper eyelid along the eyelash margins. Not only is this movement extremely rapid, on average 300-400 milliseconds, but the interval between blinks ranges from approximately 2-10 seconds, averaging 10-12 blinks per minute, dependent on situation (lighting, moisture), emotional state, fatigue and age. Lying, deception and anxiety have been shown to generate greater blink frequency. Close concentrated work such as reading and computer usage results in far fewer blinks per minute, creating issues of eye dryness (blink now, please). Infants and children blink less frequently than adults, for unknown reasons.
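The timing figures above are enough to sketch a simple blink scheduler for a character. The following is only an illustration under stated assumptions: the function name blink_schedule is hypothetical, and drawing intervals and durations from uniform distributions is a convenience chosen here, not something the figures above establish. A production system would also modulate the rate by emotional state, task and age, as described.

```python
import random

def blink_schedule(total_s, seed=0, interval_s=(2.0, 10.0), duration_s=(0.3, 0.4)):
    """Return a list of (onset_s, duration_s) blinks covering total_s seconds.

    interval_s -- range of the pause between blinks (2-10 s, from the text)
    duration_s -- range of a single blink (300-400 ms, from the text)
    Uniform sampling within both ranges is an assumption of this sketch.
    """
    rng = random.Random(seed)  # seeded so a shot can be regenerated identically
    blinks = []
    t = rng.uniform(*interval_s)
    # Only schedule blinks that can finish before the end of the shot.
    while t + duration_s[1] <= total_s:
        d = rng.uniform(*duration_s)
        blinks.append((t, d))
        # The next pause is measured from the end of this blink.
        t += d + rng.uniform(*interval_s)
    return blinks

for onset, duration in blink_schedule(60.0):
    print(f"blink at {onset:6.2f} s lasting {duration * 1000:.0f} ms")
```

With a mean pause of about six seconds, this lands near the lower end of the 10-12 blinks per minute quoted above; narrowing interval_s is the obvious knob for an anxious character, widening it for one absorbed in reading.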
The second -- perhaps more important -- consideration is that of the movement of the eyes themselves. In addition to coordinated directional movement of the eyes in gaze, even more vital for the “spark of life” is depiction of the brief flitting simultaneous movements of both eyes in the same direction, movements known as saccades. These movements represent the fastest muscle contractions in the human body and serve to improve focus and object resolution and to scan relevant aspects of the scene.
Saccades can occur in a horizontal, vertical or other linear orientation. Their velocity is related to their amplitude (arc), where a velocity of 300°/sec is achieved in a movement of 10° amplitude, and a remarkable 500°/sec is achieved in a movement of 30° amplitude. In saccades larger than 60°, the velocity plateaus.
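For animation purposes, the relationship between amplitude and velocity can be turned into a lookup curve. The sketch below assumes a saturating exponential of the form v = v_max * (1 - exp(-A / c)); that functional form is an assumption introduced here for illustration, chosen because it rises steeply and then levels off. Only the two data points (10° at 300°/sec, 30° at 500°/sec) and the plateau come from the text. The function names are hypothetical.

```python
import math

# (amplitude in degrees, peak velocity in degrees/second) pairs quoted in the text.
POINTS = ((10.0, 300.0), (30.0, 500.0))

def _velocity_ratio(c, a1, a2):
    """Ratio v(a2) / v(a1) for the assumed curve; v_max cancels out."""
    return (1.0 - math.exp(-a2 / c)) / (1.0 - math.exp(-a1 / c))

def fit_saturating_curve(points=POINTS):
    """Find (v_max, c) so the assumed curve passes through both points."""
    (a1, v1), (a2, v2) = points
    target = v2 / v1
    lo, hi = 0.1, 1000.0  # search bracket for c, in degrees
    for _ in range(200):  # bisection: the ratio grows steadily with c
        mid = 0.5 * (lo + hi)
        if _velocity_ratio(mid, a1, a2) < target:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    v_max = v1 / (1.0 - math.exp(-a1 / c))
    return v_max, c

def peak_velocity(amplitude_deg, v_max, c):
    """Peak saccade velocity (degrees/second) under the assumed curve."""
    return v_max * (1.0 - math.exp(-amplitude_deg / c))

v_max, c = fit_saturating_curve()
for amplitude in (2, 5, 10, 30, 60, 90):
    print(f"{amplitude:3d} deg -> {peak_velocity(amplitude, v_max, c):6.1f} deg/sec")
```

The fitted curve passes through both quoted points and is nearly flat beyond 60°, with a ceiling of roughly 550°/sec. The values it gives for very small saccades are extrapolations of the assumed curve and should be checked against measured data before use.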
These movements are perceptibly related to distance of an object from the viewer. Therefore, a character staring into the distance will have less frequent, slower, lower-amplitude saccades than one looking at an object or person close up. It is this failure to capture or animate saccades based on character interactions in 3-D space which gives most unmodified performances based on motion capture a vacant and dead stare. Even with cutting-edge facial performance-capture technology, recent feature-length productions continue to show zombie-like performances where character interaction is unconvincing, due to inadequate blink rates and low-amplitude, low-velocity saccades which resemble those of characters staring into the distance rather than conversing animatedly at arm’s length. In contrast and paradoxically, many purely animated characters read as more alive, due to proper attention to these vital movements. Although intentionally stylized, Pixar’s The Incredibles can nevertheless be singled out as a recent production where both focal distance and emotions were communicated convincingly via careful animation of eye movement.
In addition, reactions to social dominance, mental visualization and other aspects of acting can be communicated through these movements. Even with use of contact lenses or optical systems to track eye movement, successful direct performance capture of eye movement has been elusive, yet it is vital not only to communicate emotion, but also to give the sense of distance between characters.
The Child is NOT the Man
Finally, developmental stage and scale of character are critical. Several differences (blink rate, buccinator) between children and adults have already been mentioned. In addition, the proportions of the facial skeleton and the amount of this facial skeleton affected by facial muscle movements are different in children. Facial expression differs between adults and children and indeed changes throughout ontogeny (individual development).
FIGURE 4 Smile on a two-year-old child. Note the relative dominance of the buccinator in mouth deformation, pulling the smile laterally. The more oblique angle of the cheekbones relative to the mouth and the change in pull and activation of the zygomaticus major/minor muscles result in an open mouth shape and upper lip profile very different from that of the adult smile. Authors' photo and child.
The relative growth trajectory of the human skull and superimposed facial muscles and tissue is key to understanding these difficulties. A child’s skull has a proportionally larger braincase (neurocranium), relatively shorter facial skeleton, larger and relatively lower set eye sockets, and relatively more lightly built processes for muscular attachments. The proportion of the overlying facial muscle, and therefore the portion of the face affected by muscle movement, also differ from those of an adult.
Through growth, the facial skeleton grows at a greater rate (positively allometric growth) than does the neurocranium as it approaches adult proportions. Cranial bone deposition (and therefore skull growth) terminates as an individual finishes puberty. As females generally reach puberty at an earlier age than do males, females tend to have skull proportions somewhat more similar to those of children than do males. As males continue to grow to a later age, they develop relatively more heavily built skulls with heavier bony processes, smaller and proportionally higher eye sockets, more robust jaws and cheek bones, and a shallower forehead region. The consequence of these proportional differences is that both the distances and angles between homologous points in men, women, and children are all different.
A Biologically Driven Component of Capturing the Complexity of Facial Expression (or Not)
Key to this discussion is the biological concept of homology. Evolutionarily speaking, homologous structures are structures that may be traced to a common structure in a common ancestor -- just as the upper arm bone (humerus) of a human is homologous to the upper wing bone of the human’s caged budgie. Each may be traced evolutionarily to a more primitive basal animal hundreds of millions of years ago. But functionally speaking, homologous points between individuals are points which are precisely the same anatomically. Homologous points are therefore not necessarily metrically determined. They are not arbitrary proportional distance measures, but rather identical anatomical structures.
When motion capture data are gathered from an original subject, proportional differences are not problematic when the data are transferred to a character that is isometrically proportional, that is, one that differs only by uniform scale. The differences – and therefore problems – can be profound when attempting to impose motion capture data from an individual of one size and age onto another of different size and age, regardless of whether data are applied to homologous positions in the destination subject. In films where the destination character of lesser or greater size is intended to be grotesque and/or non-human (such as Gollum or Grendel), the effect can nevertheless be highly satisfactory. However, for ingénues or children intended to be sympathetic and realistic, proportional scaling from adult performance remains an issue.
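A small numerical sketch makes the difference between isometric and non-isometric transfer concrete. The landmark names, coordinates and scale factors below are hypothetical placeholders, not anatomical measurements; the point is only that uniform scaling preserves the angle between homologous points, while scaling width and height by different amounts (as happens between an adult and a child face) does not.

```python
import math

def angle_deg(vertex, p, q):
    """Angle at `vertex` between the directions to p and to q, in degrees."""
    ax, ay = p[0] - vertex[0], p[1] - vertex[1]
    bx, by = q[0] - vertex[0], q[1] - vertex[1]
    cos_angle = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def scale(points, sx, sy):
    """Scale every landmark by sx horizontally and sy vertically."""
    return {name: (x * sx, y * sy) for name, (x, y) in points.items()}

def pull_angle(points):
    """Angle at the mouth corner between the mouth midline and the cheekbone."""
    return angle_deg(points["mouth_corner"], points["mouth_center"], points["cheekbone"])

# Placeholder 2-D landmarks (millimetres, front view) for an adult half-face.
adult = {
    "mouth_center": (0.0, 0.0),
    "mouth_corner": (25.0, 0.0),
    "cheekbone": (55.0, 45.0),
}

isometric = scale(adult, 0.6, 0.6)      # same proportions, smaller character
non_isometric = scale(adult, 0.8, 0.5)  # relatively wider, shorter face

print(f"adult:         {pull_angle(adult):.1f} deg")
print(f"isometric:     {pull_angle(isometric):.1f} deg")
print(f"non-isometric: {pull_angle(non_isometric):.1f} deg")
```

A displacement captured along the adult line of pull therefore lands in the right direction on the isometric character, but points the wrong way on the non-isometric one even though the marker sits on the homologous landmark.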
It is clear that despite the enormous utility and future potential of facial and eye performance capture, distinct limitations currently exist. Asymmetry, isometry and ontogeny merit greater attention. A blink is not a wink. Eye movement in particular remains problematic – current performance capture has not satisfactorily dealt with this most critical issue.
Is the situation hopeless? No. Is it limited? Yes. Can the gap be narrowed? Certainly. Thorough study of eye movements, including saccades, already takes place in the biological and psychological arms of academia and in the military. Detailed anatomical training and immersion for animators would help them to manipulate the motion capture data provided to them, or provide a viable and affordable alternative in the animation of facial expressions. A straightforward solution is to start, and then continually build, a larger and larger database of motion capture data for individuals of both sexes and as broad a range of ages as possible. This latter strategy has important implications not only for data applied to the entertainment industry, but also for those made available to the biomedical community. In addressing the difficulties of motion capture in general and facial motion capture in particular, the pediatric and adult benefit of data collection driven by the entertainment community could have profound positive effects.
Although there are later editions, Gray’s Anatomy edited by Peter Williams, 38th edition (1995, reprinted 1998, Churchill Livingstone) remains the most detailed anatomical account and our personal favorite. Most facial expression research is of surprisingly limited applicability to facial performance capture and animation, as psychologists are largely concerned with reliability and utility of coding systems. A useful though overtly psychological start for online resources on facial expression is http://www.face-and-emotion.com/dataface/misctext/iwafgr.html. The Artist's Complete Guide to Facial Expression by Gary Faigin (Watson-Guptill Publications 1990) remains useful.
For detail on psychological methods of quantification, see Hager, Joseph C. A comparison of units for visually measuring facial action. Behavior Research Methods, Instruments, & Computers, 1985, 17: 450-468. A highly technical but very relevant paper is “Automatic determination of facial muscle activations” by Eftychios Sifakis, Igor Neverov and Ronald Fedkiw (2005) in Volume 24, Issue 3, Proceedings of ACM SIGGRAPH 2005: 417-425.
Reminger, Sheryl L.; Kaszniak, Alfred W.; Dalby, Patricia R. "Age-Invariance in the Asymmetry of Stimulus-Evoked Emotional Facial Muscle Activity." Aging, Neuropsychology, and Cognition, Volume 7, Issue 3, September 2000: 156-168.
Facial Expressions Babies to Teens: A Visual Reference for Artists by Mark Simon (Watson-Guptill Publications 2008) is an inexpensive and useful reference, although photographic quality is a bit disappointing. For an academic approach, see also Charlesworth, W. R., & Kreutzer, M. A. Facial expression of infants and children. In P. Ekman (Ed.), Darwin and facial expression: A century of research in review. New York: Academic, 1973. Chapters 5 & 7 of Measuring Emotions in Infants and Children Vol. 1 ed. by Carroll Ellis Izard (Cambridge University Press, 1982) are worth consideration but unfortunately of limited utility to the artist.
About the authors:
Dr. Elizabeth Rega
received her Ph.D. from the University of Chicago and is an associate professor of anatomy at Western University of Health Sciences in Pomona, California. As a research scientist, she has published numerous scientific articles and has conducted fieldwork on three different continents. Her specialization in human and nonhuman primate anatomy has led her to be a frequent consultant to the animation community.
With Walt Disney Feature Animation, she helped to develop the facial construction of major characters in Pocahontas, and facial anatomy, body anatomy, and locomotion of humans in Mulan and Brother Bear. She was the lead anatomist consulting on apes and the title character in Tarzan. She helped to develop the structure of the titans in Hercules, and was the chief anatomical consultant for the Disney short film John Henry. Her work has not been restricted to traditional animation, as she aided in the development of virtually all of the human anatomical material with SONY Pictures Imageworks for the film Hollow Man from modeling to animation, and has been a frequent consultant to Walt Disney Imagineering on the morphology and history of ethnic diversity in film, animation, and entertainment.
Dr. Stuart Sumida
is a Professor of Biology at California State University San Bernardino. He received his Ph.D. from UCLA in 1987, having previously taught in the Department of Anatomy and the Pritzker School of Medicine at the University of Chicago. He is a comparative anatomist and paleontologist who specializes in the biomechanics and evolution of locomotion. He is the author of three books and numerous scientific papers. His palaeontological research has taken him throughout North America and Europe with the support of many museums, the National Geographic Society and NATO. His teaching focus is primarily on comparative vertebrate anatomy and human anatomy. He has been an anatomical consultant to special effect artists and animators on over thirty feature length films for such groups as Walt Disney Feature Animation, DreamWorks and Pacific Data Images, Warner Brothers, Rhythm and Hues, SONY Pictures ImageWorks, Pixar, Walt Disney Imagineering, and others.
His work in the animation and special effects industries began with Beauty and the Beast and Lion King, and has included Mulan, George of the Jungle, the live action versions of 101 and 102 Dalmatians, Tarzan, Dinosaur, Cats and Dogs, Harry Potter, Lilo and Stitch, Spirit: Stallion of the Cimarron, Scooby Doo I and II, Stuart Little I and II, Reign of Fire, Brother Bear, Shrek II, Chronicles of Riddick, Madagascar I and II, Chronicles of Narnia, Surf’s Up, Ratatouille, Kung Fu Panda, and most recently Bolt for Walt Disney Feature Animation.