TERI RESEARCH INC  Providing the Information Advantage...



3-D Acoustic Virtual Reality

Audio cues regenerated for the listener can mix digital audio .WAV files captured from environmental cues, digitized communications, computers, telephones, web sites, games, application programs, speech synthesizers, and other sources. The 3-D audio effects engine then uses the spatial interpreter and situation database information to modify the captured sounds with the Head Related Transfer Function (HRTF) algorithms specific to the identified position of the source. Communications can then be directed to a previously determined position in 3-D virtual space, or to a dynamic representative location, so that acoustic information is presented to the listener with spatial direction.
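As a rough illustration of how position-specific HRTF processing modifies a captured sound, the sketch below convolves a mono signal with a separate impulse response for each ear. The function names and the tiny impulse responses are hypothetical placeholders, not measured HRTF data and not the engine described here; a real system would select measured responses for the identified source position.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for one mono source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy placeholder responses for a source on the listener's right:
# the right ear hears the sound first and louder, the left ear
# hears it two samples later and quieter.
HRIR_RIGHT = [1.0]
HRIR_LEFT = [0.0, 0.0, 0.6]

left, right = render_binaural([1.0, 0.5], HRIR_LEFT, HRIR_RIGHT)
```

The delay and level difference between the two outputs are the kind of cues the listener's auditory system interprets as direction.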

When the goal is a more realistic listening experience through 3-D audio techniques, two fundamentally different technical approaches are candidates: localization and enhancement. Localization may be defined as the conscious placement of individual sound sources in specific apparent locations with respect to the listener. The conventional stereo technique of panning qualifies as an example. Effective localization depends on having access to the individual sounds.
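Stereo panning, the simplest localization example mentioned above, can be sketched as follows. This assumes the common constant-power pan law; the function name and the pan range are illustrative choices, not taken from the system described here.

```python
import math


def constant_power_pan(sample, pan):
    """Split a mono sample into (left, right) channel values.

    pan runs from -1.0 (hard left) through 0.0 (center) to +1.0 (hard right).
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    # Map the pan position onto a quarter circle so that
    # left**2 + right**2 stays constant across the whole sweep.
    angle = (pan + 1.0) * math.pi / 4.0
    return sample * math.cos(angle), sample * math.sin(angle)


left, right = constant_power_pan(1.0, 0.0)  # centered: equal level in both channels
```

Panning places a source only along the line between the two speakers, which is why HRTF-based processing is needed for front-to-back and elevation cues.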

Multimedia interfaces, video gaming and interactive music applications mix multiple individual sources in real time as a matter of course. The encoding techniques and localization algorithms are applied interactively to formulate the end user's 3-D audio listening environment. Localization is part of the purposeful design of a virtual acoustic environment, such as tying multiple auditory events together in a logical fashion.

The goal of enhancement is to provide a more realistic, more immersive and thus more natural listening experience from existing pre-mixed source material. Enhancement is a useful technique for augmenting the listener's ability to distinguish what type of sound is heard, all the more so because it can be applied to any source material, irrespective of how or where it was produced. New and innovative methods are modeled to improve front-to-back and elevation cues in spatial "virtual" hearing. Delay-line methods provide processing-efficient digital acoustic effects that more faithfully restore realistic 3-D sound. The developed algorithms specifically address reverberation, propagation delay and Doppler shift phenomena to better model the HRTF convolutional adjustment of the wave file. Regenerating multiple sounds also uses advanced digital sampling synchronization to preclude the effects produced by lost or missing data samples.

Using both localization and enhancement audio techniques, in conjunction with the HRTFs, enables us to produce an accurate 3-D audio depiction of both the situational cues and pre-mixed source material. Furthermore, single-ended 3-D audio signal encoding, which is in effect decoded by the listener's auditory system rather than by hardware-dependent encoding/decoding schemes, further enhances the listener's ability to track and monitor audio cues. Single-ended encoding also reduces processor time and hardware dependencies in the listener's audio ensemble, which lends itself to a more lightweight and low-power solution.

To create an interactive 3-D audio demonstration, TRI has been able to preprocess and enhance captured audio information while simultaneously "placing" or localizing sounds in the 3-D virtual space. Since the basic 3-D audio presentation effect requires only a standard delivery channel, the preprocessing can be applied to any source format, such as .WAV files or digitized analog radio communications. The resulting files are then hardware independent and require no special handling. The processing becomes part of the signal itself, as would reverberation, equalization, localization or other audio effects.

3-D audio modeling attempts to create a vivid acoustic environment. This requires a variety of digital-audio effects, including time-varying filtering, reverberation, discrete echoes and Doppler shift for moving sound sources. These algorithms are in turn built of lower-level component operations such as time-delay and multiply-accumulate operations.

The list of acoustic effects within a virtual environment is lengthy. The position and motion of objects must be modeled to achieve realistic 3-D imaging. The motion of an object can result in a Doppler shift, causing the characteristic pitch increase and sudden decrease heard, for example, when a car drives by. Objects between the source and the listener can obstruct sound waves, thereby attenuating high-frequency components. Reverberation results from reflections off other objects such as walls. The distance of an object also affects the relative amplitudes of direct and reflected sound. Even the presence of moisture in the air changes the transmission characteristics. Clearly, accurately modeling the full physics of an acoustic environment is prohibitively expensive computationally. In practice, only the perceptually important features, identified through simulation, need to be modeled.
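The drive-by pitch change can be made concrete with the standard Doppler formula for a stationary listener and a source moving directly along the line to the listener. The sketch below assumes a speed of sound of 343 m/s; the function name and the example figures (a 440 Hz tone on a vehicle moving at 30 m/s) are illustrative only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at about 20 degrees C


def doppler_frequency(source_hz, approach_speed_m_s, c=SPEED_OF_SOUND_M_S):
    """Frequency heard by a stationary listener.

    approach_speed_m_s is positive while the source moves toward the
    listener and negative while it moves away.
    """
    if abs(approach_speed_m_s) >= c:
        raise ValueError("source speed must be below the speed of sound")
    return source_hz * c / (c - approach_speed_m_s)


approaching = doppler_frequency(440.0, 30.0)   # roughly 482 Hz
receding = doppler_frequency(440.0, -30.0)     # roughly 405 Hz
```

The jump from the higher to the lower value as the source passes is the sudden pitch decrease described above.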

Natural reverberation occurs when sound waves undergo an immense number of reflections. The sound is attenuated as the distance the wave has traveled increases, and some energy is absorbed at each reflective surface. The multiple reflections produce a series of echoes with successively decaying amplitudes. The delay between echoes is equal to the distance between reflective surfaces divided by the speed of sound through the air. There are infinitely many simultaneous reflective paths from the sound source to the listener. Statistically modeling the density of the reflections produces an effective approximation requiring only a modest number of digital-audio delay lines. When combined with all-pass filtering techniques, this provides an effective algorithm for creating compelling reverberant audio spaces.
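The echo-delay relation above (distance between reflective surfaces divided by the speed of sound) translates directly into a delay-line length in samples. The following is a small worked sketch; the 343 m/s speed of sound, the function name and the example spacing are assumptions for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def echo_delay_samples(surface_spacing_m, sample_rate_hz, c=SPEED_OF_SOUND_M_S):
    """Delay between successive echoes, expressed in whole samples."""
    delay_seconds = surface_spacing_m / c
    return round(delay_seconds * sample_rate_hz)


# Surfaces 3.43 m apart give a 10 ms echo spacing,
# which is 441 samples at a 44.1 kHz sample rate.
delay = echo_delay_samples(3.43, 44100)
```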

A digital-audio delay line is composed of a circular buffer of memory into which digital audio is written at one address and read at a different address. This algorithm supports generation of vivid 3-D effects and deconfliction of simultaneous sound sources.
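A minimal software sketch of such a delay line is shown below, with separate write and read addresses that advance together and wrap around the buffer. The class and method names are hypothetical, and the dedicated addressing hardware discussed later is not modeled.

```python
class DelayLine:
    """Circular buffer written at one address and read at another."""

    def __init__(self, size, delay):
        if not 0 < delay < size:
            raise ValueError("delay must be between 1 and size - 1")
        self.buffer = [0.0] * size
        self.write_addr = 0
        # The read address trails the write address by `delay` samples.
        self.read_addr = (self.write_addr - delay) % size

    def tick(self, sample):
        """Store one input sample and return the sample from `delay` ticks ago."""
        self.buffer[self.write_addr] = sample
        delayed = self.buffer[self.read_addr]
        size = len(self.buffer)
        self.write_addr = (self.write_addr + 1) % size  # circular wrap
        self.read_addr = (self.read_addr + 1) % size
        return delayed


line = DelayLine(size=8, delay=3)
output = [line.tick(x) for x in [1.0, 2.0, 3.0, 4.0, 5.0]]
# output is the input shifted by three samples: [0.0, 0.0, 0.0, 1.0, 2.0]
```

Changing the distance between the two addresses changes the delay, which is what makes the same structure usable for propagation delay, echoes and Doppler effects.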

Repeating echoes are modeled by mixing a portion of the delayed signal with the input signal before writing to the delay line. This feedback produces a series of decaying echoes simulating multiple reflections of the same sound source. With the addition of a feed-forward path of equal magnitude and opposite polarity, the delay line is transformed into a high-order all-pass filter, a useful building block for artificial reverberation.
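Both structures can be sketched with one function. This follows the well-known Schroeder comb and all-pass forms, which match the description above but are not confirmed to be the exact filters used here; the function name and parameters are illustrative.

```python
def feedback_delay(signal, delay, gain, feedforward=False):
    """Delay line with feedback, optionally with a feed-forward path.

    feedforward=False: repeating, decaying echoes (feedback comb filter).
    feedforward=True:  adds a path of equal magnitude and opposite
                       polarity, giving an all-pass filter.
    """
    buffer = [0.0] * delay
    index = 0
    out = []
    for x in signal:
        delayed = buffer[index]       # value written `delay` samples ago
        v = x + gain * delayed        # mix feedback with the input
        buffer[index] = v             # write the mix back to the delay line
        index = (index + 1) % delay   # circular addressing
        out.append(delayed - gain * v if feedforward else delayed)
    return out


impulse = [1.0] + [0.0] * 9
echoes = feedback_delay(impulse, delay=3, gain=0.5)
# echoes: 1.0 at sample 3, 0.5 at sample 6, 0.25 at sample 9, zero elsewhere
allpass = feedback_delay(impulse, delay=3, gain=0.5, feedforward=True)
```

The all-pass output has the same echo spacing but passes all frequencies at equal magnitude, so several can be chained to thicken the echo density without coloring the sound.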

Delay-line maintenance can be a bottleneck in digital audio processing. Maintaining circular addresses is a simple task, yet it consumes precious cycles during which the effects processor could be performing complex math. In addition, the long Random Access Memory (RAM) access times associated with a relatively large delay memory can stall the effects processor. Dedicated hardware that performs delay-line addressing and data movement frees the effects processor to perform the high-speed multiply-accumulate operations required by signal-processing algorithms. To support the requirements of 3-D audio and environmental audio processing, the optimal processor supports 32-bit arithmetic. A designated set of general-purpose registers facilitates parallel processing of many audio streams.

In an interactive environment, the various sounds each source emits are located within the main memory of the computer. While the effects processor is ultimately responsible for mixing the large number of sound sources into the output channels, each sound stream must initially be kept separate so the spatial and environmental processing of each can be independently controlled. This means that the 3-D audio subsystem requires a method of efficiently transferring a designated number of digital-audio streams from the main memory into the effects processor to perform the 3-D rendering. Paged virtual memory introduces the additional need for virtual-to-physical address translation.

To mix multiple streams of digital audio, all must be resolved to exactly the same sample rate via the process of sample-rate conversion. Clearly, conversion of the sample rate is required when input digital audio streams have different sampling rates. Surprisingly, sample-rate conversion is required even when mixing two sounds at the same apparent rate, because each of the inputs has its own crystal oscillator. Manufacturing tolerances and drift guarantee that the two clock frequencies will be slightly different. This slight difference means the relative phase of the two sample streams will drift over time and, eventually, a sample will either be dropped or repeated, possibly resulting in an audible defect. So every digital audio stream would first have to be converted to a common sampling rate before it could be mixed with the other streams.
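A quick calculation shows how soon this drift matters. The sketch below assumes two clocks at the same nominal rate whose frequencies differ by a given number of parts per million; the function name and the 50 ppm figure are hypothetical illustration values, not measurements.

```python
def seconds_until_sample_slip(nominal_rate_hz, mismatch_ppm):
    """Time for two nominally equal sample clocks to drift apart by one sample."""
    rate_difference_hz = nominal_rate_hz * mismatch_ppm * 1e-6
    return 1.0 / rate_difference_hz


# Two 44.1 kHz streams whose oscillators differ by 50 ppm drift by
# about 2.2 samples every second, so a sample must be dropped or
# repeated a little under every half second.
interval = seconds_until_sample_slip(44100, 50)
```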

A solution is to perform sample-rate conversion within the digital mixer, allowing it to accept asynchronous digital audio streams. The conversion must be controlled by a sample-rate tracker that continuously estimates the incoming sample rate. This avoids the complexity of providing a Phase Lock Loop (PLL) and supports multiple asynchronous digital audio streams with little incremental cost.
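The core of such a converter can be sketched as follows, with the tracker's current estimate passed in as `input_rate_hz`. Linear interpolation is used here only as a simple stand-in to show how the rate ratio drives the read position; it is an assumption for illustration, and a production converter would use band-limited interpolation with proper anti-aliasing filtering. All names are hypothetical.

```python
def resample_linear(samples, input_rate_hz, output_rate_hz):
    """Resample a block to the mixer's rate using linear interpolation."""
    if not samples:
        return []
    # Number of input samples consumed per output sample.
    step = input_rate_hz / output_rate_hz
    last = len(samples) - 1
    out = []
    n = 0
    while n * step <= last:
        position = n * step            # fractional read position in the input
        i = int(position)
        frac = position - i
        following = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + following * frac)
        n += 1
    return out


# Doubling the rate inserts one interpolated value between each input pair.
doubled = resample_linear([0.0, 1.0, 2.0, 3.0, 4.0], 2.0, 4.0)
```

Because the ratio is recomputed from the tracker's estimate for each block, a stream whose clock drifts slowly is followed without dropping or repeating samples.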

The typical way to accommodate the large number of recording sample rates required by various audio applications is to change the clock frequency of the A/D. However, this precludes using an inexpensive codec because the system may simultaneously require a different playback sample rate. It can also increase system cost because the analog filters must be capable of rejecting aliases at all supported sample rates.

A less expensive and more flexible way to solve the problem is to operate the digital audio processor at a fixed rate and generate other sample rates by sample-rate conversion. Careful design of the digital anti-aliasing filters ensures that the conversion introduces no audible defects.

A demonstration unit with current 3-D modeled, HRTF-convolved audio is running on a desktop PC under Windows 9x and NT. The 3-D software modules allow the user to become immersed in a virtual 3-D environment. Some objects within the environment can be placed in fixed positions; others are free to move around the user's virtual space. The listener has the option of using headphones or desktop speakers. The HRTF algorithms differ between headphones and speakers: desktop speakers have to account for cross-talk, while headphones do not because each ear receives a separate channel of input, either left or right. The listener is free to walk around the virtual environment, noting how the audio presentation of the objects in 3-D is affected by this movement. The listener can walk in any direction, spin around a full 360°, stop and look up 90° or look down 40° in elevation. Audio files for the TRI demonstration depict helicopters flying, radio communication, audio alerts and warnings as part of the virtual environment for situational awareness presentation to the listener.