However, the response at the sweet spot becomes invalid if the listener changes position. The challenge then becomes characterising a listening zone around the room and updating the responses with tracked listener movement. A large number of acoustic measurements on the listening zone boundary is impractical.
SADIE looks at methods for sparse measurement around the listening zone to characterise the soundfield effectively.
Motion-tracked binaural sound over headphones
Design of immersive surround systems requires an understanding of the perceptual cues for sound source localisation.
It is the purpose of the present invention to provide a binaural reproduction system which (a) is more tolerant of listener position, and (b) provides a better headphone image, than the prior art.
This has been achieved following our observation that it is not essential to cancel all of the transaural crosstalk in order to achieve the requisite three-dimensional effects via loudspeaker auditioning.
If the crosstalk is only cancelled partially, then there are two benefits: firstly, the slight position-dependent artifacts of the prior art schemes are eliminated, and secondly, because there is less cancellation taking place, then the headphone image is more representative of a full binaural recording, with consequent enhanced image depth.
The partial cancellation scheme is in the preferred embodiment applied over the whole bandwidth, and the degree of partial cancellation has an optimal range. This will be described below by reference to Figure 3, wherein parts similar to Figure 1 are denoted by the same reference numerals.
In Figure 3, crossfeed filters 30L, R have functions xC, i. Delay elements (not shown) may be inserted in the crossfeed channel paths between junctions 6 and summing junctions 8 so that the phase relationships between the signals in the main channels and the crossfeed channels are preserved; when the sound is reproduced, the cancellation signal then arrives simultaneously with the primary signal.
In digital implementations, however, it is possible to incorporate the time delays into the filter blocks themselves, in which case extrinsic delay elements become superfluous. In addition, a single filter 34L, R is introduced into the main channel path, which encompasses the functions of filters 10, 12 of Figure 1 but has a new filter function, G.
It is possible to define G explicitly in terms of x, A and S, in order to achieve precise, partial crosstalk cancellation whilst dealing with the multiple cancellation problem correctly, as before, such that there is unity gain between the right source and the right ear i. The overall transmission function from the right input (R) to the right ear (r), R_r(f), is:
A consequence of this is that G becomes a function of A, S and x. This creates an intrinsically stable system: another important advantage of the invention. Prior art systems, in which full crosstalk cancellation is implemented at low frequencies, are impractical because the A and S functions converge at low frequencies.
The overall transmission function from the right input (R) to the left ear (l), R_l(f), is: Consequently, even a modest crossfeed cancellation factor x of 0. It will be noted from equation 6 that even where S and A are approximately equal, there is nevertheless a stable crosstalk factor, since S^2 - xA^2 will be finite. Figure 4 shows schematic views of the filters 30 and 34 of Figure 3.
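The transfer-function equations referred to above are not reproduced in this text. As a rough numerical check of the reasoning, the following Python sketch assumes a symmetric arrangement in which S is the same-side and A the opposite-side speaker-to-ear transmission function, the crossfeed function is C = -A/S, and the main filter is G = S / (S^2 - xA^2). That form of G is an assumption, chosen because it gives unity gain to the same-side ear and matches the S^2 - xA^2 term mentioned above. The function names are illustrative only, and single real numbers stand in for the frequency-dependent functions.

```python
def main_filter(S, A, x):
    """Assumed main-path filter G giving unity gain to the same-side ear."""
    return S / (S * S - x * A * A)


def ear_responses(S, A, x):
    """Return (same-side, opposite-side) ear responses for a unit right input.

    Assumptions: symmetric listener, crossfeed function C = -A / S scaled by
    the partial-cancellation factor x, and G applied to both signal paths.
    """
    G = main_filter(S, A, x)
    C = -A / S
    right_speaker = G           # main channel drive
    left_speaker = G * x * C    # crossfeed (cancellation) drive
    right_ear = S * right_speaker + A * left_speaker
    left_ear = A * right_speaker + S * left_speaker
    return right_ear, left_ear


if __name__ == "__main__":
    # Illustrative low-frequency case in which A is close to S.
    for x in (0.0, 0.5, 0.9):
        same, cross = ear_responses(S=1.0, A=0.95, x=x)
        print(f"x={x}: same-side={same:.3f}, crosstalk={cross:.3f}")
```

Under these assumptions the same-side response stays at unity for every x, while the crosstalk scales with (1 - x). With x = 1 and A = S the denominator of G vanishes, which corresponds to the low-frequency difficulty of full cancellation noted above; any x below 1 keeps G finite.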
Crossfeed filter 30, shown in Figure 4a, comprises an input signal path 40 with a series of one-sample time delays Z^-1 42, with tapping paths 44 coupled between nodes between the delay elements and a summing junction 46. Each tapping path has a multiplier 48 where an appropriate scaling factor C_n is applied to the signal in the path.
The output of the summing junction 46 has an attenuation element 50 therein of value x. It will be seen that such a filter is a finite impulse response filter. The attenuation factor introduced by the element 50 may instead be introduced into the input path 40, or alternatively by modification of the scaling factors C_n.
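The tapped-delay-line structure described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name is hypothetical, the tap values are placeholders, and it assumes the first tap sees the undelayed input sample.

```python
import numpy as np


def crossfeed_fir(signal, taps, x):
    """Tapped-delay-line FIR: y[n] = x * sum_k taps[k] * signal[n - k]."""
    taps = np.asarray(taps, dtype=float)
    delay_line = np.zeros(len(taps))
    out = np.zeros(len(signal))
    for n, sample in enumerate(signal):
        # Shift the delay line by one sample and feed in the new input.
        delay_line = np.roll(delay_line, 1)
        delay_line[0] = sample
        # Summing junction followed by the attenuation element of value x.
        out[n] = x * np.dot(taps, delay_line)
    return out


if __name__ == "__main__":
    sig = np.array([1.0, 0.0, 0.0, 2.0, 0.0])
    taps = np.array([0.5, 0.25, 0.125])  # placeholder scaling factors C_n
    separate = crossfeed_fir(sig, taps, x=0.8)
    folded = crossfeed_fir(sig, 0.8 * taps, x=1.0)
    print(np.allclose(separate, folded))
```

Because the filter is linear, applying x at the output, at the input, or folded into the scaling factors gives the same result, which is what the comparison in the usage example checks.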
Referring to Figure 4b, which shows a schematic view of filter 34, similar parts to those of Figure 4a are represented by the same reference numerals. In this filter, scaling factors D_n are applied to multipliers 48, and these scaling factors are derived from equation 4.
Thus the effect of the various scaling factors D_n is to produce the filter function shown in equation 4. Referring now to Figures 6 and 7, these show two graphs which illustrate the effect of partial crosstalk cancellation on a listener. The effect is necessarily to an extent subjective, but the graphs have been derived experimentally from the listening experiences of experts in the art.
The experts in the art listen to speakers arranged as indicated in. The experimental results actually recorded are indicated as filled rectangles in the graphs. Clearly, such figures are to an extent subjective, but the graph shown represents a mean for a set of experts in the art. Figure 7 is a similar graph wherein the abscissa represents the "sweet-spot" size, namely the region in which the listener may position his head and experience the optimum binaural effect.
This sweet-spot size actually exists in three dimensions, so that a listener may move his head backwards, forwards or vertically, or rotate the head, as long as he remains within the sweet-spot region either side of the mean position. It has been found by averaging the results for a set of experts in the art, that with cancellation of 0. Thus combining the results of the observations represented by Figure 6 and 7 it may be seen that good results are obtained with 0.
It is important to note that in certain circumstances, it is desirable to separate out the two principal elements of the transaural crosstalk cancellation schemes which have been described. These two elements are (a) the crosstalk cancellation itself, and (b) spectral equalization, to compensate for the "twice-through-the-ears" effect.
The schemes deal with both of these factors simultaneously by virtue of the incorporation of the second air-to-ear transmission factor, S, as part of the whole system, such that the equations for providing ideal transfer characteristics from the source to the listener, when solved, generate filter networks which implement crosstalk cancellation and spectral equalization simultaneously.
It is sometimes convenient, however, to provide a system which only provides crosstalk cancellation, without spectral equalization. This alternative embodiment of the present invention i. The overall transmission function from the right input (R) to the right ear (r), R_r(f): It will be noted that this is the product of previous equation 4 and S, and that the implementation involves simply substituting the solution of equation 7 for G in Figure 3.
Apparatus according to claim 5 wherein the cross channel filter 30L, 30R has a signal path therein with an attenuation scaling factor of x. Apparatus according to any preceding claim wherein the summing junction 8L, 8R is operative to add signals present at the inputs thereof or is operative to subtract signals present at the inputs thereof.
Vertical lines indicate the frequency of the tones used for the evaluation. The vertical right axis indicates the interaural azimuth of each filter.
There are many tools available for binaural spatialisation, both commercial and open-source. Some are provided by major actors in the virtual reality industry, such as Google or Oculus (owned by Facebook). There are as many implementation approaches, feature sets and license schemes as there are tools.
In this section, we compare the characteristics of the 3DTI Toolkit with those of the most representative open-source tools. In addition, we also compare the 3DTI Toolkit with the most popular commercial, closed-source tools. Nevertheless, in some cases it has not been possible to gather detailed information about the various algorithms, as they are not reported in the available documentation.
Tables 2 and 3 give a summarised overview of the various features and characteristics of binaural spatialisation tools, including the 3DTI Toolkit. In some cases, we were able to infer this information from comments in the API code. When this was not possible, we used the abbreviation NR (not reported) for the specific feature. Thus Table 3 is not as reliable as Table 2. There are several approaches for rendering the direct and reflected paths when doing binaural spatialisation.
The architecture of the 3DTI Toolkit manages the direct and reverberation components as independent modules, which can be connected or not at some point in the processing chain. An example is the near-field ILDs modification, which is applied only to the direct path signal, and not to the reverberation. This specific feature could only be found in Slab3d.
Reverberation simulation in the 3DTI Toolkit consists of the convolution of an Ambisonic sound field with binaural room impulse responses (BRIRs), which makes the computational cost independent of the number of sources. This solution is combined with an implementation of UPOLS (uniformly partitioned overlap-save) convolution, which makes it possible to spatialise several sources with large BRIRs at the cost of reduced spatial resolution; this is assumed to be less relevant for reverberation processing than for direct path spatialisation.
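To make the idea of uniformly partitioned overlap-save convolution more concrete, here is a minimal single-channel sketch in Python with NumPy. It is a generic illustration of the technique, not the Toolkit's code: the function name, the block size and the offline (whole-signal) interface are assumptions made for the example, whereas a real-time renderer would process one audio frame per call.

```python
import numpy as np


def upols_convolve(x, h, block):
    """Convolve x with a long impulse response h using uniform partitions."""
    n_parts = -(-len(h) // block)                    # ceil(len(h) / block)
    h_pad = np.zeros(n_parts * block)
    h_pad[:len(h)] = h
    # Spectrum of each partition, zero-padded to twice the block size.
    H = np.fft.rfft(h_pad.reshape(n_parts, block), n=2 * block, axis=1)

    out_len = len(x) + len(h) - 1
    n_blocks = -(-out_len // block)
    x_pad = np.zeros(n_blocks * block)
    x_pad[:len(x)] = x

    fdl = np.zeros_like(H)        # frequency-domain delay line of input spectra
    window = np.zeros(2 * block)  # previous block followed by current block
    y = np.zeros(n_blocks * block)
    for b in range(n_blocks):
        window[:block] = window[block:]
        window[block:] = x_pad[b * block:(b + 1) * block]
        fdl = np.roll(fdl, 1, axis=0)
        fdl[0] = np.fft.rfft(window)
        # One multiply-accumulate per partition, then a single inverse FFT.
        y_block = np.fft.irfft((fdl * H).sum(axis=0), n=2 * block)
        y[b * block:(b + 1) * block] = y_block[block:]  # overlap-save: keep last half
    return y[:out_len]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    h = rng.standard_normal(300)
    print(np.allclose(upols_convolve(x, h, 64), np.convolve(x, h)))
```

The point of the partitioning is that each input block needs only one forward FFT and one inverse FFT regardless of the impulse response length, so latency is set by the block size rather than by the length of the BRIR.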
Most of the other tools implement synthetic reverberation using parametric shoe-box models, allowing the user to configure the dimensions and materials of a rectangular room, usually processing early reflections and the late reverberation tail separately (Oculus spatialiser, 3D Sound Labs, Slab3d, Dear VR). The solution adopted by Real Space 3D Audio is also based on the shoe-box model, but allows more complex room geometries to be built by dividing the geometry into multiple shoe-boxes.
Some tools go even further, allowing configuration of arbitrary room geometry through physical models of sound propagation based on scene geometry (Steam Audio, Resonance Audio). Some tools provide means for simulating occlusions and reflections on obstacles (Resonance Audio, Real Space 3D Audio, Steam Audio), while others delegate this to the application level. The 3DTI Toolkit does not provide specific tools for occlusion simulation, but its modular architecture, with full separation between direct path and reverberation, allows for a straightforward implementation at application level, as in [ ].
Given the high computational cost of HRTF convolution, many tools provide some options for increasing performance at the cost of lowering spatialisation quality. The 3DTI Toolkit provides two spatialisation modes.
The high performance mode uses a Spherical Head Model [ 81 ] to adjust a 4th-order IIR filter, which is designed for different interaural azimuths and distances and stored in a look-up table which can be loaded as a file. This makes it possible to account for near-field effects, as the spherical head model can also be computed for small distances.
In addition, the model can be changed by the user by loading another table, a feature not found in other tools. Other tools provide some options for increasing performance at the cost of much lower spatialisation quality. Typical solutions include: implementing distance culling, so that far sources are not rendered (OpenAL Soft, Real Space 3D Audio, Dear VR); projecting all sources into an Ambisonic sound field with configurable order (Resonance Audio, 3D Sound Labs, Oculus spatialiser); and using simple stereo panning as a low-quality alternative for secondary sound sources (Rapture 3D).
It is also common to save resources in the reverberation process, by reducing the number of early reflections (Real Space 3D Audio, 3D Sound Labs), or precomputing room impulse responses for different points in the scene (Steam Audio). A unique feature of the 3DTI Toolkit is the customization of the listener head directionality pattern (from omnidirectional to different cardioid shapes), a feature designed mainly for its integration with the simulation of hearing aids.
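As an illustration of a directionality pattern that can be varied between omnidirectional and cardioid, the sketch below uses the standard first-order microphone pattern family. This is a textbook formula chosen for the example; the text does not give the Toolkit's own expression, and the function and parameter names are hypothetical.

```python
import math


def directional_gain(angle_rad, omni_weight):
    """First-order pattern: omni_weight=1.0 is omnidirectional, 0.5 is cardioid.

    angle_rad is the angle between the pattern axis and the source direction.
    """
    if not 0.0 <= omni_weight <= 1.0:
        raise ValueError("omni_weight must lie between 0 and 1")
    return omni_weight + (1.0 - omni_weight) * math.cos(angle_rad)


if __name__ == "__main__":
    for weight in (1.0, 0.75, 0.5):
        front = directional_gain(0.0, weight)
        side = directional_gain(math.pi / 2, weight)
        rear = directional_gain(math.pi, weight)
        print(f"weight={weight}: front={front:.2f}, side={side:.2f}, rear={rear:.2f}")
```

In a renderer, a gain of this kind would be applied per ear to the direct-path signal only, depending on the direction of each source.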
Regarding the modelling of audio sources, the 3DTI Toolkit models sources as points from which sound emanates omnidirectionally. Most of the other existing available tools simulate distance through level attenuation due to sound propagation through air, which follows the inverse square law (an attenuation of 6 dB with every doubling of the distance). Although this is often the default setting due to its physical correctness, some tools provide customization of the distance attenuation curve (Real Space 3D Audio, OpenAL Soft).
The 3DTI Toolkit implements the inverse square law, but allows for customization of the attenuation slope (in dB per doubling of distance), which is a solution also found in a few other tools (Rapture 3D, SoundScape Renderer). Furthermore, the 3DTI Toolkit manages the attenuation for the direct and the reverberation paths completely independently, allowing distance-dependent changes in the direct-to-reflected signal ratio.
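A distance attenuation with a configurable slope per doubling of distance, applied separately to the direct and reverberation paths, could be sketched as follows. The function names, the 1 m reference distance and the -3 dB reverberation slope are illustrative assumptions, not values taken from the Toolkit.

```python
import math


def distance_gain_db(distance_m, slope_db_per_doubling=-6.0, reference_m=1.0):
    """Level change in dB relative to the reference distance."""
    if distance_m <= 0.0 or reference_m <= 0.0:
        raise ValueError("distances must be positive")
    return slope_db_per_doubling * math.log2(distance_m / reference_m)


def db_to_linear(level_db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


def direct_to_reverb_db(distance_m, direct_slope=-6.0, reverb_slope=-3.0):
    """Direct-to-reverberant ratio change when the two paths use different slopes."""
    direct = distance_gain_db(distance_m, direct_slope)
    reverb = distance_gain_db(distance_m, reverb_slope)
    return direct - reverb


if __name__ == "__main__":
    for d in (1.0, 2.0, 4.0, 8.0):
        print(d, distance_gain_db(d), direct_to_reverb_db(d))
```

With a shallower slope on the reverberation path than on the direct path, the direct-to-reverberant ratio falls as the source moves away, which is the distance-dependent behaviour described above.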
The effect of air absorption at high frequencies for large distances e. Regarding simulation of near-field sources, Oculus spatialiser and Resonance Audio model the effect of acoustic diffraction around the head, and SoundScape Renderer provides a resource-expensive experimental solution using High Order Ambisonic (HOA) [ ].
Other tools provide an SDK with data structures that can be filled by coding a file reader from scratch (Steam Audio). Microsoft HRTF uses a fixed standard HRTF averaged from anthropometric measures for elevation cues, but allows parameterisation of the listener for azimuth spatialisation. SoundScape Renderer imports multichannel WAV files, with two channels for each direction, but building this set of WAV files is a long and complex process.
Using custom file formats means that users can hardly, if at all, convert HRTFs themselves, and need to rely on the designers of each tool for this task.
These filters are based on a rigid spherical head model, and are applied after convolution with the interpolated HRIR.
Moreover, a cross-ear parallax correction is applied, which is especially relevant for near-field sources. Some other tools, like SPAT or Anaglyph, use this technique but, to the best of our knowledge, no other open-source tool allows this realistic near-field HRTF correction. All file read and write operations are provided in an optional separate package (see Additional tools).
A release of the 3DTI Toolkit as a Unity3D package has also been created, but it is currently being evaluated and is not yet available to the general public. If interested in this release, please contact the authors. Finally, a set of demonstrator test applications, presented in section Additional tools, has been released, allowing users to access the 3DTI Toolkit features through a simple but comprehensive user interface.
This has resulted in the release of several 3D audio rendering tools, with various characteristics and integrating different features. However, not all of them are available as open-source. As Ince and colleagues argue in a recent editorial in Nature [ ], the rise of computational science has added a new layer of inaccessibility. This should be overcome by the release of computer programs as open-source, allowing clarity and reproducibility.
This is exactly one of our main aims and one of the reasons we decided to create and release the 3DTI Toolkit. Among the tools currently available as open-source, the 3DTI Toolkit is the one allowing the most configurability, making it a very appropriate instrument for 3D audio research. As an example, the demonstrator test application is currently available for Windows, Mac and Linux, and the Toolkit has also been tested for Android and iOS.
As outlined in the paper, a special effort has been put into removing artefacts related to dynamic scenes, where sources and listener are free to move. This is a particularly important condition for interactive VR applications where the sound designer cannot easily predict scene changes in advance.
Furthermore, considering that real-time convolution with HRTFs and BRIRs can become an issue in terms of computational costs, the use of Uniformly Partitioned Convolution is a distinctive feature that, to our knowledge, has not yet been implemented in other available open-source tools.
Another feature of the Toolkit, again very relevant for VR applications, is the simulation of near-field effects. The user is completely free to move and approach sound sources in the virtual environment, which is rendered simulating both directional and distance cues to a level of accuracy which cannot be found in other available tools.
In this context, it is important to underline that the 3DTI Toolkit also includes simulators of hearing loss and hearing aids [ 64 ]. These features are out of the scope of this paper, and will be detailed in further publications. The 3DTI Toolkit is a live project, which is being continuously improved and assessed.
This is obviously facilitated by its open-source nature, which allows for external contributions and bug reporting. Our plans for future developments include, for example, improving the customization by computing ILD compensation for near-field effects on-line, instead of relying on a pre-computed filter.
We are also planning to add multi-listener support, which would allow the use of binaural sound in collaborative virtual environments.
Introduction
Binaural literally means relating or involving both ears.
Technical background
The human auditory sense is able to localise sound sources in the surrounding environment thanks to several localisation cues embedded in the sound arriving at the two ears.
Binaural spatialisation
The theories at the basis of the binaural spatialisation technique are not particularly recent [ 5 ]. Nevertheless, the main motivation for the development of a custom binaural spatialisation library was the need for several features which, all together, were not found to be available in other existing tools: support for multiple platforms, including web; full 3D placement and movement of sources and listener, including near- and far-field simulation; smooth behaviour in dynamic situations; and customization of HRTFs.
Distance simulation
Humans perform a set of rather complex processes for estimating the distance of a sound source [ 66 ].
Table 1. Summary of the distance effects in the 3D-Tune-In Toolkit, indicating in which section the distance effect is described.
Fig 3. Air absorption as a function of frequency for 50 metres distance, obtained from ISO and from our proposed two cascaded second-order Butterworth filters.
Fig 5. Wefers [ 70 ] for moving sources.
Anechoic path
HRIR selection and interpolation.
Fig 6. Example of barycentric interpolation in the HRTF sphere surface for the left ear.
ITD simulation.
Fig 9. Behavior of the samples in a frame during the stretching algorithm for adding a delay.
Near-field HRIR correction.
Low-quality high-performance mode
The 3DTI Toolkit also implements a low-quality, high-performance mode, which becomes of use when, for example, computational resources are limited e. The process is shown in more detail in Fig
Directionality
The 3DTI Toolkit includes the simulation of two microphones with variable directional patterns, each located on one of the listener's ears. By default, a cardioid pattern is implemented (Eq 10). This attenuation is directly applied to the anechoic path, depending on the direction of the source.
Evaluation
In order to evaluate how well the 3DTI Toolkit performs, we have conducted a series of tests which are presented in this Section.
Fig. Performance of the anechoic process depending on the frame size.
Fig. Performance of the reverberation process depending on BRIR length.
We have recorded the system output signal during the 20 minutes, and then computed the Spectral Difference (SD) between the interpolated versions, Y_HRIRinterpolated(f), and the original one, Y_HRIRoriginal(f) (Eq 12). Results are shown in Fig 14 for left and right channels and for the three locations.
Evaluation of the reduction of non-linear artefacts
As widely described above, the 3DTI Toolkit supports real-time 3D audio spatialisation for moving sources and listener.
We also define the total energy through Eq 14. Then, for the case of our three tones, the EoB, in percentage, is computed as in Eq 15. EoB was calculated for each combination of distance and speed, giving the results in Fig 15, which shows the EoB for different source distances and angular speeds: Fig 15a and 15b display the EoB, for the left and right ear respectively, when the no-ITD HRTF is used, but without any further ITD processing.
Energy out of band produced by the spatialization process for different configurations.
Difference filters implemented for near-field HRIR correction.
Discussion and comparison with existing tools
Table 2. Comparison of the 3DTI Toolkit with the most popular open-source audio spatialisation tools.
Table 3. Comparison of the 3DTI Toolkit with the most popular closed-source audio spatialisation tools.
Spatialisation
Anechoic path. There are three main approaches for rendering the direct path:
HRTF. Different approaches to perform the interpolation are employed. In this case the spatial resolution of the HRTF measurement is very relevant.
In the Ambisonic approach, sound sources are encoded into a set of Ambisonic channels, which are subsequently decoded into a set of virtual loudspeakers. Finally, those virtual loudspeakers are spatialised as static virtual sources by convolving their respective signals with the corresponding HRIR, which can be obtained from an HRTF. Notice that different Ambisonic orders can be used, allowing for a variable level of spatial resolution. Typically, renderers can be configured to use up to 3rd-order Ambisonic (16 channels).
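The channel count quoted above follows from the fact that a full 3D Ambisonic representation of order N uses (N + 1)^2 channels. The short Python sketch below shows this, together with a first-order encoder for a single source direction. The encoder assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalisation, azimuth measured counter-clockwise from the front); the convention used by any particular renderer may differ, and the function names are hypothetical.

```python
import math


def ambisonic_channel_count(order):
    """Number of channels of a full 3D Ambisonic sound field of a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def encode_first_order(azimuth_rad, elevation_rad):
    """First-order encoding gains (W, Y, Z, X), assuming ACN order and SN3D."""
    cos_el = math.cos(elevation_rad)
    return (
        1.0,                              # W: omnidirectional component
        math.sin(azimuth_rad) * cos_el,   # Y: left-right
        math.sin(elevation_rad),          # Z: up-down
        math.cos(azimuth_rad) * cos_el,   # X: front-back
    )


if __name__ == "__main__":
    for order in range(4):
        print(order, ambisonic_channel_count(order))
    print(encode_first_order(0.0, 0.0))
```

Each source signal is multiplied by these gains and summed into the shared channels, so the number of HRIR convolutions depends on the number of virtual loudspeakers rather than on the number of sources.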
Using higher-order Ambisonic results in higher spatial resolution, at a higher computational cost.
These filters are designed according to mathematical models of sound propagation around a rigid spherical head e. Typically, this approach uses low-order IIR filters which are able to capture the ILDs of a spherical head and process the signals very quickly and at low computational cost, at the expense of lower spatialisation quality.
Most available tools can simulate reverberation, employing two main approaches:
Convolution-based reverberation (CBR). Impulse responses of the environment to be simulated are convolved with the audio signal. Usually, these impulse responses are recorded binaurally using a dummy head microphone, with the sources placed at different locations. This allows for a certain level of spatialisation of the reverberation sound. The main problem of this approach is the computational cost, due to the fact that these impulse responses can be very long. Therefore, different approaches are used to make the process more efficient.
Synthetic reverberation. The response of the room can be simulated synthetically using several different approaches, which can be classified into two categories: Ray tracing, which is normally used only for early reflections and can handle rooms with arbitrary geometry. These approaches work for late reverberation as well, and are able to simulate simplified geometries, such as a shoe-box room.
References
1. Blauert J. Spatial hearing: the psychophysics of human sound localization. MIT press; Hearing Journal.
3. Proceedings of the Audio Engineering Society Convention. On the variation of interaural time differences with frequency. Journal of the Acoustical Society of America.
5. Rayleigh L. On our perception of sound direction.
6. Collins P. Theatrophone: the 19th-century iPod. New Scientist.
7. Bauck J, Cooper DH. Generalized transaural stereo and applications. Journal of the Audio Engineering Society.
8. Algazi V, Duda R. Headphone-based spatial sound.
9. The Journal of the Acoustical Society of America. Warusfel O. Listen HRTF database. Begault DR. The CIPIC HRTF database. IEEE; Dataset of head-related transfer functions measured with a circular loudspeaker array.
Acoustical Science and Technology. A binaural room impulse response database for the evaluation of dereverberation algorithms. Audio H. The History of Binaural Audio; Paul S. Binaural recording technology: A historical review and possible future developments. Acta acustica united with Acustica. Fundamentals of binaural technology.
Applied acoustics. Binaural technique—Basic methods for recording, synthesis, and reproduction. In: Communication acoustics. Springer; Techniques and applications for binaural sound manipulation. The International Journal of Aviation Psychology. Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual speech source. Bronkhorst AW, Houtgast T. Auditory distance perception in rooms. Hartmann WM, Wittenberg A.
On the externalization of sound images. Kim SM, Choi W. On the externalization of virtual sound images in headphone reproduction: A Wiener filter approach. Wallach H. On sound localization. In: Audio Engineering Society Convention Audio Engineering Society; Auditory localization of nearby sources. Head-related transfer functions. Localization of a broadband source. Brungart DS.