An Interactive, Multi-Modal Workspace for Physically Based Sound
Author: Benjamin Schroeder - Ohio State University
Physically based sound synthesis holds great promise for creating instruments that produce realistic sound but which can go beyond what is possible in the physical world.
All sound in the real world is produced by the periodic vibration of physical objects. Physically based synthesis techniques model how real objects vibrate, then simulate the vibrations in order to produce real sound. Interactions between objects may be simulated as well, and this is where much of the power of the technique lies.
The same guitar or piano can sound wildly different depending on how it is being played. It is difficult to reproduce such variations in playing technique using recorded samples, whether by recording large libraries of samples or by transforming a single sample to reflect a different style. However, by modeling the way a player applies force to (say) a guitar's strings or a piano's keys, physically based techniques can reproduce the subtle realism and connectedness of actual performance.
Similarly, physically based synthesis allows for instruments to be modeled in a modular fashion which matches well with real-world intuition. For example, a simple guitar model might be made up of several strings, each attached to a sounding plate via a springy bridge, along with a model of a player's fingers and their action.
In our research, we are investigating how to use physical models to produce instruments that have realistic sounds, but which are native to the computer and have the strengths of the virtual. There are many existing computer music languages, but we believe that truly expressive physical modeling work requires a new approach. Computer music languages have proved very successful in working with sound as sound, using a vocabulary of samples, waveforms, frequencies, and filters. However, the vocabulary of physical modeling is different, having to do with sound but also with physical quantities like positions, velocities, and forces, and with descriptions in terms of things like shape and material.
One early result of our research is an experimental interface for interacting with models such as
sounding strings and plates. Our interface is multi-modal: we have found it useful to interact with models
in different ways at different times and for different purposes, and important to have both audio and visual
feedback. The interface supports four major kinds of interaction:
- direct manipulation, using concrete touch and visuals;
- the use of tangible controllers such as sliders or microphones;
- interaction in a shared space with procedurally animated elements;
- and the use of textual code for symbolic, algorithmic interaction.
Users of the system can go back and forth between the levels or mix and match them as needed.
Figure 1: Plates and strings in the direct-manipulation environment. (video)
Figure 1 shows several strings and sounding plates in our prototype system. Users may interact with these elements using straightforward gestural controls: to pluck a string, drag the mouse across it. To tap a plate, simply click on it. The strings and plates react immediately with both sound and animation. Both sound and animation are results of an underlying simulation, and vary realistically depending on where one interacts with the element or (in the case of the strings) how strong the “pluck” is.
Pulling on a string's endpoints changes its length. Just as in the real world, longer strings have a
lower pitch (all other things being equal) and shorter strings a higher one. In our virtual system, strings
may have their length and pitch changed interactively and at any time, even if they are already sounding.
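The familiar relationship between length and pitch can be made concrete with Mersenne's law for an ideal string, which relates the fundamental frequency to length, tension, and linear density. The function and numbers below are illustrative, not taken from our models:

```python
import math

def string_frequency(length_m, tension_n, density_kg_per_m):
    """Fundamental frequency of an ideal string (Mersenne's law):
    f = (1 / 2L) * sqrt(T / mu)."""
    return math.sqrt(tension_n / density_kg_per_m) / (2.0 * length_m)

# Halving the length doubles the pitch (up one octave),
# all other things being equal.
f_long = string_frequency(0.65, 60.0, 0.005)    # roughly guitar-like values
f_short = string_frequency(0.325, 60.0, 0.005)
print(f_short / f_long)  # → 2.0
```

In the interactive system this relationship simply emerges from the simulation; the formula is only a useful check on what one hears.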
Figure 2: The system in operation on a multitouch table.
Figure 3: Slowing the speed of the simulation shows vibrations in more detail. (video)
Our system works well with traditional mouse-and-keyboard interaction, but we have also designed it with multitouch displays in mind. Figure 2 shows the system in operation on a diffuse-illumination multitouch table. Having multiple touches available expands the interaction possibilities. For example, a user might use one hand to fret a string while playing it with the other, or might rotate a string's endpoint to change its tension, as if turning a virtual guitar tuning key. The table setting also creates possibilities for multiple players to work with the same elements at once.
As mentioned above, the animated feedback from the system varies realistically according to the vibrations
from the underlying simulation. This adds visual interest, but also builds intuition about how sound and vibration
are related. Users can investigate this further by interactively slowing time, making more of the detailed
vibration visible (Figure 3).
We use finite-difference time domain (FDTD) models in our simulations. These are based on direct simulation
of differential equations such as the simplified string model in Figure 4. For strings and plates, these models
provide good generality in terms of material representation as well as force-based interaction between multiple
objects. We use parallel computation via the CPU vector processing unit to simulate our models in real time.
Figure 4: A simplified string model. Informally, this states that a string accelerates in proportion to its curvature, with some damping.
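The kind of update an FDTD solver performs can be sketched in a few lines. The following is a minimal, illustrative leapfrog step for a damped one-dimensional wave equation with fixed endpoints; the parameter values are chosen for the example, and the actual system runs vectorized simulations of richer models in real time:

```python
import numpy as np

def fdtd_string_step(y, y_prev, c, d, dt, dx):
    """One leapfrog step of the damped wave equation
    d2y/dt2 = c^2 * d2y/dx2 - d * dy/dt, with fixed endpoints."""
    lam2 = (c * dt / dx) ** 2   # squared Courant number; stability needs c*dt/dx <= 1
    y_next = np.zeros_like(y)
    y_next[1:-1] = (2.0 * y[1:-1] - y_prev[1:-1]
                    + lam2 * (y[2:] - 2.0 * y[1:-1] + y[:-2])
                    - d * dt * (y[1:-1] - y_prev[1:-1]))
    return y_next

# "Pluck" the string by displacing one point, then let it vibrate.
n = 64
y = np.zeros(n)
y[n // 4] = 0.01          # initial pluck displacement
y_prev = y.copy()         # string starts at rest
for _ in range(200):
    y, y_prev = fdtd_string_step(y, y_prev, c=1.0, d=0.01, dt=0.5, dx=1.0), y
```

Reading off the displacement at a fixed point over time gives an audio signal; the same state drives the animation.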
Although we use FDTD models, other physically based models like modal models or finite-element models might work
as well. These models are in some ways “better” and in some ways “worse” for our uses than FDTD models;
each involves tradeoffs related to things like the ability to represent different kinds of geometry, range of
timbral reproduction, and the computational complexity of performing simulations. The most important thing seems to
be the ability for objects to interact with one another in terms of positions and forces over time, and (for the purposes
of animation) the ability to easily generate a visual representation of vibration.
Tangible controllers and sensors such as MIDI instruments, game controllers, cameras, and microphones give users new
ways to map intent to sonic changes. For example, a MIDI slider bank might be used to vary the tension of a set of strings,
or acceleration data from a Wii remote might change the damping of a plate. A camera system might track a person
moving around a room, striking a plate whenever the person crossed its space. In this way, a person could have the feel of
walking around their instrument.
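A controller mapping like the slider example reduces to scaling a 7-bit MIDI value onto a physical parameter range. The sketch below is illustrative (the tension range is hypothetical, and no particular MIDI library is assumed):

```python
def cc_to_tension(cc_value, t_min=20.0, t_max=120.0):
    """Map a 7-bit MIDI controller value (0-127) linearly onto a
    string-tension range in newtons. The range here is hypothetical."""
    cc_value = max(0, min(127, cc_value))   # clamp out-of-range values
    return t_min + (t_max - t_min) * (cc_value / 127.0)

# A slider at its midpoint yields a tension near the middle of the range.
print(cc_to_tension(64))
```

The interesting part is not the arithmetic but the fact that the mapped parameter feeds a live simulation, so moving the slider bends the sound continuously.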
Figure 5: A blown-string instrument. (video)
Plucked-string and bowed-string instruments are both common in the real world, but by using a microphone to provide
continuous force to a set of strings, we can produce a novel “blown string” instrument (Figure 5). Blowing
into the microphone excites the strings and produces a blown-string sound. Since the strings resonate with
the microphone input, this setup can also be used with speech as a sort of vocoder.
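One simple way to turn a microphone signal into a continuous driving force is an envelope follower with a fast attack and a slow decay. This sketch is illustrative rather than the system's actual implementation:

```python
def envelope_follower(samples, decay=0.995):
    """Track the amplitude envelope of an input signal. The smoothed
    envelope can drive a continuous 'blowing' force on the strings."""
    env = 0.0
    out = []
    for x in samples:
        env = max(abs(x), env * decay)   # fast attack, slow exponential decay
        out.append(env)
    return out

# A single loud sample produces an envelope that decays gradually.
env = envelope_follower([0.0, 1.0, 0.0, 0.0])
```

Driving the strings with the raw microphone signal instead of (or mixed with) the envelope is what lets them resonate with speech, giving the vocoder-like effect.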
Procedurally Animated Elements
As we have seen, our models are situated in space and so respond well to direct gestural manipulation. Procedural
animation elements provide another way to make use of the spatial arrangement.
Figure 6: A rigid-body physics machine. (video)
Figure 7: Robots changing the lengths of strings to produce a glissando effect. (video)
One simple kind of procedural element is a rigid-body physics machine such as the one shown in Figure 6. Here, balls are shot from a cannon at regular intervals. They bounce off a rotating slab, alternately hitting a metal plate, a wooden plate, or a string. This is a simple machine, but by combining it with a slider which varies the rate at which the cannon shoots, it can be used to explore ideas about relationships between rhythm, space, and time. Cannons such as the one seen in this example can also be used by themselves as metronomes which pluck a string or tap a plate at a constant rate, freeing the user to work with other elements or other parts of an element: for example, one might vary a string's length while a cannon plucks it.
Animated elements may also change a model's properties directly. Figure 7 shows several “robots”, each of
which makes smooth changes in the length of a string at certain intervals. The strings may then be played using any
of the interaction mechanisms that we have discussed, with the robots' movement producing an unpredictable glissando effect.
Finally, a fourth way to interact with the elements is via textual code. Code is less direct than the interaction methods
we have discussed so far but provides greater power of abstraction, letting users (for example) operate on many strings at once,
repeat precise operations, use parametric algorithms, or simply discover and change numeric properties. Our language is
object-oriented and is similar in spirit to Smalltalk.
Figure 8: Conversational interaction through code. (video)
Figure 9: Rhythmically plucking strings at different intervals. (video)
Figure 10: Creating an ad-hoc “whammy bar” effect. (video)
Figure 8 shows a simple conversational interaction. In this figure, the user has created a string and changed its frequency, first to a particular value, then using a random number generator. The user has then printed the value of the frequency which was assigned and plucked the string programmatically. During a sequence such as this, the string remains “live” and so could be worked with in other ways, such as via direct manipulation.
It is often useful to run some code ad infinitum while doing some other task. In Figure 9, “run repeatedly” boxes are used to pluck strings at varying intervals. (Time is advanced in our scripting language only when the user explicitly requests it, as with the “strong timing” semantics of the ChucK music programming language.)
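The idea of explicit time advancement can be sketched outside our language as a small event queue in which simulated time moves only when the caller asks for it, in the spirit of ChucK's strong timing. The names and structure here are illustrative, not the system's actual scheduler:

```python
import heapq

class Clock:
    """Toy scheduler: simulated time advances only on explicit request."""
    def __init__(self):
        self.now = 0.0
        self._queue = []   # entries are (fire_time, sequence, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run_repeatedly(self, interval, callback):
        def wrapper():
            callback(self.now)
            self.schedule(interval, wrapper)   # reschedule itself
        self.schedule(interval, wrapper)

    def advance(self, dt):
        """Explicitly advance time, firing due events in order."""
        end = self.now + dt
        while self._queue and self._queue[0][0] <= end:
            t, _, cb = heapq.heappop(self._queue)
            self.now = t
            cb()
        self.now = end

plucks = []
clock = Clock()
clock.run_repeatedly(0.5, lambda t: plucks.append(t))   # "pluck" every half beat
clock.advance(2.0)
print(plucks)  # → [0.5, 1.0, 1.5, 2.0]
```

Because events fire at exact simulated times rather than whenever the host gets around to them, rhythms stay sample-accurate no matter how much other work the system is doing.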
The scripting language can also be used to add new methods to objects. In Figure 10, the user has written code
to make several strings match length with another, producing a sort of “whammy bar” effect. This
is not necessarily the best way to implement a whammy bar; for example, one might use direct manipulation to attach
all of the strings to a single moving block. However, having textual code available at the interactive level lets users
experiment with new behaviors such as this without having to build them into the system.
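In a language that allows methods to be attached to live objects, the pattern looks roughly like the following Python analogue. The names are hypothetical (our system uses its own Smalltalk-like language); the point is that the new behavior is defined interactively rather than built into the system:

```python
class String:
    """Stand-in for a live string object in the workspace."""
    def __init__(self, length):
        self.length = length

# Define a new behavior and attach it to the class at runtime:
# make a group of strings track the length of a lead string,
# giving a rough "whammy bar" effect.
def match_length(self, others):
    for s in others:
        s.length = self.length

String.match_length = match_length

lead = String(0.65)
backing = [String(0.5), String(0.7)]
lead.match_length(backing)   # all strings now share the lead's length
```

As the text notes, attaching the strings to a shared moving block by direct manipulation might be a better implementation; the code route simply makes the experiment cheap.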
We have discussed several ways to interact with physically based sound models. Our system is in its early stages, but the concrete approach and combination of different kinds of interaction seem promising.
Our goal has been to use physical models to create new kinds of electronic musical instruments, and we continue our
investigations towards this end. We are working to create a more general textual language, able to easily express physically
relevant ideas like space, timing, and modular connections. We are also investigating even more integrated approaches,
combining graphics and code more closely; and we continue to work with new sounding models and tangible controllers.
The author wishes to thank his collaborators, Prof. Richard Parent of the OSU Department of Computer Science and Engineering
and Prof. Marc Ainger of the OSU School of Music, as well as Prof. Alan Price, of the OSU Department of Design,
who designed and built the multitouch table used in this work. This project also makes use of several open-source
libraries, including the Box2D physics engine and RtMidi.
About the author
Benjamin Schroeder is
a researcher and artist living in Columbus, Ohio. He is a Ph.D. candidate in computer science at Ohio State University. Benjamin's interests span several different time-based media including animation, sound, and physical interaction. His work investigates the power, promise, and beauty of computational arts media, asking questions about how computation and interaction extend our creative reach. Benjamin's dissertation research investigates new programming models for physically based sound. He has presented this work at such venues as SIGGRAPH, SMC, and the ICMC Unconference.