Ajin Tom

Audio Technology Researcher | Sound Engineer | Composer | Pianist

This page includes audio demos associated with my Master's dissertation on Automatic Mixing Systems for Multitrack Spatialization based on Unmasking Properties and Directivity Patterns.

Headphones recommended…


Demos - Chapter 4 : Approach 1
Spatialization based on sinusoidal panning filters

Link to listening test:

http://webaudio.gutech.edu.om/test.html?url=tests%2Fpantest1.xml

A download link for all the audio samples is available at the end of this webpage.


Here is my paper associated with the above listening test, published at the 146th Audio Engineering Society Convention 2019 (DSP Track) - Dublin, Ireland:

An automatic mixing system for multitrack spatialization for stereo based on unmasking & panning practices

Designed and developed a novel automatic mixing algorithm to spatialize sound sources of a multitrack recording for masking minimization.

Utilized time-frequency analysis techniques: spectral decomposition, sub-grouping, frequency-based spreading, and particle swarm optimization based on the MPEG Psychoacoustic Masking Model.

Link to publication: https://secure.aes.org/forum/pubs/conventions/?elib=20311

AES: http://www.aes.org/events/146/presenters/?ID=8018


Demos - Chapter 5 : Approach 2
Spatialization based on source directivity

A simple example illustrating how the frequency response of a source evolves over time as it rotates about its centre

Violin excerpt played twice:

1) original track,
2) output track for a listener at (0,20) and a source rotating about the origin (0,0). The angle of the source with respect to the listener evolves from 0 degrees to 180 degrees.

At 0 degrees (the direction of maximum power) we observe a flat response, and at 90 degrees we perceive the directivity effect (the low-pass structure of the template directivity equation 5.10 in Section 5.2.2).

This example can be thought of as a scenario in which a violin player rotates about their axis while performing (0 degrees - facing the listener, 90 degrees - facing away from the listener).
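To make the idea of an angle-dependent low-pass effect concrete, here is a minimal Python sketch. It is not the dissertation's method: the template directivity equation 5.10 is not reproduced on this page, so the mapping from angle to cutoff frequency (cutoff_hz), the 20 kHz and 1 kHz limits, the one-pole filter, and all function names are assumptions chosen purely for illustration. The sketch sweeps the source angle linearly from 0 to 180 degrees over the excerpt and lowers the cutoff as the angle grows.

```python
import math

def source_angle_deg(t, duration, start_deg=0.0, end_deg=180.0):
    """Linearly sweep the source angle (relative to the listener) over the excerpt."""
    frac = min(max(t / duration, 0.0), 1.0)
    return start_deg + frac * (end_deg - start_deg)

def cutoff_hz(angle_deg, f_max=20000.0, f_min=1000.0):
    """Hypothetical angle-to-cutoff mapping (NOT equation 5.10 of the thesis).

    Returns f_max at 0 degrees and f_min at 180 degrees, interpolating on a
    log-frequency scale with a cardioid-like weight.
    """
    weight = 0.5 * (1.0 + math.cos(math.radians(angle_deg)))  # 1 at 0 deg, 0 at 180 deg
    return f_min * (f_max / f_min) ** weight

def rotating_lowpass(samples, sample_rate):
    """Apply a one-pole low-pass whose cutoff follows the sweeping source angle."""
    duration = len(samples) / sample_rate
    state = 0.0
    out = []
    for i, x in enumerate(samples):
        angle = source_angle_deg(i / sample_rate, duration)
        fc = cutoff_hz(angle)
        # Smoothing coefficient of a one-pole low-pass for cutoff fc.
        a = 1.0 - math.exp(-2.0 * math.pi * fc / sample_rate)
        state += a * (x - state)
        out.append(state)
    return out

if __name__ == "__main__":
    fs = 44100
    # Alternating +1/-1 test signal: all of its energy sits at the Nyquist frequency.
    test_signal = [(-1.0) ** i for i in range(fs // 10)]
    filtered = rotating_lowpass(test_signal, fs)
    print("early amplitude:", abs(filtered[10]))
    print("late amplitude: ", abs(filtered[-1]))
```

Running the script on the high-frequency test signal shows a larger output amplitude at the start of the sweep (nearly flat response) than at the end (strong low-pass), which is the qualitative behaviour described for the violin excerpt. A real implementation would substitute the measured or modelled directivity pattern for the placeholder mapping.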


Quick Demo:

Mono sum of input tracks (left) and Binaural mix optimized using source directivity (right):


Full demo:

The above audio demo illustrates the effect of using source directivity as a feature for automatic spatialization. For each genre, the mono sum of input tracks is played first, followed by the directivity-optimized binaural auto-mix. Various spatialization tools are used to visualize the stereo activity. We now continue the discussion from Section 5.2.2, covering each of the 6 auto-mixed audio excerpts in terms of spatialization extent:

1. Acoustic (00:26)
In this excerpt the automix system placed the banjo and violin tracks (both recorded as monophonic tracks) at either end of the sound field because of inter-track masking: the violin and banjo play the lead tune simultaneously throughout the piece. We can hear the low-pass effects due to directivity alternating between these two tracks (at around 00:30 the violin track is low-pass filtered, so we hear the banjo track more distinctly, and vice versa at around 00:40). The rhythm and bass guitar tracks were recorded using a stereo pair of microphones, so they were summed into the mix without any processing.

2. Country (01:26)
The vocals and drum tracks were not intended to be panned, hence they were not processed by the automix system. The bass guitar track occupied the centre of the stereo field, as in most cases, since it does not end up masking the rest of the mix beyond 200 Hz. The rhythm guitar was assigned a position on the right and the guitar playing the tune a position on the left. We can barely perceive any time-varying effects due to directivity in this excerpt. This is possibly due to the absence of masking between the two guitar tracks; the lead tune track would occupy a narrower band of the spectrum than the rhythm track.

3. Funk (02:36)
This excerpt ended up being a poor example since the perceived spatialization effects were subtle, possibly due to minimal multitrack masking.

4. Jazz (03:36)
This dense multitrack proved to be the best example of the directivity-based spatialization approach. The double bass and electric piano tracks were assigned slightly panned positions in the field by the system. The brass and woodwind instruments, more widely panned, seemed to be competing in the mix across time. At around 03:42 the brass track on the left seemed to undergo low-pass effects due to directivity while the woodwind section on the right cut through the mix. At around 03:52 the opposite effect was observed: the brass track had a flat response while the woodwind section had low-pass effects.

5. Pop (04:30)
This is another dense multitrack that proved to be a good example, although it is difficult to hear the time-varying effects. Compared to the mono sum of tracks, however, the automix achieves a strong sense of spatialization, with sources widely spread across the sound field.

6. Rockballad (05:42)
This multitrack is relatively dense, with piano and guitar tracks carrying slight (crunch) distortion effects, plus vocal backing, organ, bass and drum tracks. As usual, the drum tracks were left unprocessed and the bass track occupied the centre of the stereo field. The two distorted tracks occupied opposite sides because of their overlapping spectral content, with the piano on the right and the distorted guitar on the left. The organ ended up spectrally masking the vocal backing tracks, hence we can hear time-varying low-pass effects as they compete in the mix at several instants (at around 05:52 and 05:59 we clearly hear the vocals while the organ is low-passed; vice versa after 06:00).

In general, from informal listening we observe that the proposed automix system carries out a significant amount of spatialization. In this approach of using directivity and spreading the sources across the sound stage, we hear binaural effects and achieve a good amount of externalization, which makes the mix sound natural. We can localize the sources to a good extent and hear them with high clarity. However, in some cases the particle swarm optimization (PSO) assigns extreme locations to spectrally dense sources, making the mix sound extremely wide. In a dense multitrack with more than five tracks, it is difficult to perceive time-varying directivity effects; we perceive them more like level variations. In other cases, it is easier to observe the changes in source directivities, which we perceive as low-pass filters whose cut-offs vary over time. Overall, the proposed automix system performs well in terms of unmasking and spatialization. We can conclude that using source directivity helps create a plausible recreation of the sound stage; it is also an innovative tool for producing spatial effects.