Kpopalypse’s music theory class for dumbass k-pop fans: part 11 – mixing sound in four dimensions

It’s the return of the Kpopalypse music theory class!

Most of what’s been presented in the Kpopalypse music theory series so far has been fairly traditional, in terms of teaching the fundamentals of notes, sound, harmony, texture, rhythm, etc. as they apply to pop music.  This has been necessary just to define the basics.  However as we go deeper into this series, we will gradually touch on stuff that isn’t part of any standard curriculum, and in some cases stuff that is either against common music theory practice or simply just isn’t taught anywhere.  After all this is Kpopalypse’s music theory class, and there would be no point in doing these posts at all if it was just stuff you could read anywhere from a textbook!

So study of music theory as a whole tends to concern itself with lines on a page a lot, but the reality is that a lot of people don’t actually write music like that anymore.  That’s not to say people don’t do it ever, in fact it’s still quite common, however it’s just not the only option, or even the most common, and certainly not the most common in the realm of the k-pops, which is what we give a shit about for this series.  Most composers these days working in the k-pop music realm are composer-producers, and usually they just write straight to DAWs (Digital Audio Workstations, essentially an electronic mixing desk) using MIDI interface keyboards and/or software, while also adding live instruments where needed.  This series doesn’t really need to teach you how to use a DAW as they all work slightly differently and they all come with their own instructions, although future episodes might touch on certain very fundamental aspects of the tech.  However what probably nobody is going to teach you anywhere is how to arrange your sounds in a DAW for a piece of pop music.  With this I’m not talking about where the verse and the chorus goes, but rather where to put sounds in the sonic field, and the advice in this post actually applies equally whether you’re recording on a DAW like ProTools, or if you’re using analog desks and tape machines.

It’s worth talking about the history of multitrack recording for a moment.  The equipment for the modern multitrack recording studio was mostly invented in the 1960s and 1970s.  Recordings before this time were mostly recorded in mono – one recording source, which was a microphone in the center of the room.  You needed a very good sounding room to do this, as effects hadn’t been invented yet, and the top studios were the ones that were prized for their good-sounding rooms.  Once you had your room and your microphone all set up, you’d drag the band in, they’d all stand around this one microphone, and then you’d get them to play their hit song.  How important each band member was and how loud they sounded in the final mix was generally determined by how close or far away they were from the microphone, and these distances needed to be adjusted depending on how loud those instruments could be.  Of course vocalists had to be right up close, whereas a loud brass player or drummer had to sit further back to play so he didn’t drown out the singers’ voices.  If you’ve ever wondered why drums on every 1950s recording ever sound extremely muffled and very far away, and not at all like the crisp drum sounds of today, now you know.

When multitrack recording came along, that changed everything.  New tape machines were invented that could store multiple recordings on one strip of tape and play them back all at the same time.  Now you could have a microphone close up on the singer, another one close up on the guitar player, another one close up on the drummer’s snare drum, and so on, and record everything at once, and then mix and adjust it all later when you “bounced down” your multitrack tape recording onto the final two track “master” tape.  Another thing you could do was record everybody separately instead of together, so the drummer could lay down his tracks on his own, then that track could be played back to the bass player later who could then add his bit (or more likely her bit, because it was probably Carol Kaye), and so on.  In any event, it allowed music to have the individual instruments treated as separate elements that could be modified to suit.

Even though this new technology existed with great flexibility, and people could now record onto 16, 32 or even 64 tracks of audio, most recordings from the 1970s sound terrible by today’s standards, and that’s because mixing with regard to the sonic field wasn’t widely understood at that time.  A few people did understand this well (notably, Pink Floyd, and Led Zeppelin’s Jimmy Page) and as a result their recordings haven’t dated nearly as much as almost everything else the 1970s produced.  So let’s get into it.

It’s helpful to think of music as a four-dimensional object, in line with the classical physics model of 3D space/time.  This means we can think about sounds in terms of:

Height – also known as pitch.  How high or low are the sounds being recorded?
Width – of the stereo field.  Are the sounds to the left, the right, the middle, or somewhere in between?
Depth – distance.  How close or far away are the sounds, and hence, how loud or soft are they?
Time – music happens during a “time frame” so this fourth dimension must be added to quantify how the music changes over time.
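If it helps to see those four dimensions written down as data, here's a rough sketch.  The field names, the pan scale (-1.0 to +1.0) and every number in the example are my own made-up assumptions for illustration, not measurements of any real song.

```python
from dataclasses import dataclass

@dataclass
class Sound:
    name: str
    low_hz: float     # height: bottom of the pitch range (assumed value)
    high_hz: float    # height: top of the pitch range (assumed value)
    pan: float        # width: -1.0 hard left, 0.0 center, +1.0 hard right
    level_db: float   # depth: closer to 0 dB reads as louder and nearer
    start_s: float    # time: when the sound enters
    end_s: float      # time: when the sound drops out

def active_at(sounds, t):
    """Return the names of the sounds that are playing at time t (seconds)."""
    return [s.name for s in sounds if s.start_s <= t < s.end_s]

# Hypothetical parts, invented purely for this example
parts = [
    Sound("vocal", 200.0, 1000.0, 0.0, -6.0, 16.0, 30.0),
    Sound("bass", 40.0, 100.0, 0.0, -6.0, 16.0, 30.0),
    Sound("percussion", 2000.0, 6000.0, 0.5, -18.0, 30.0, 44.0),
]

print(active_at(parts, 20.0))
print(active_at(parts, 35.0))
```

Asking "what's playing at this moment" is the time dimension at work: the other three values describe where each of those sounds sits while it's playing.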

Modern DAWs and computer software can produce a spectrogram analysis of sound, where time is the x axis, pitch is the y axis, and volume increases are shown by more intense colours. 
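For the curious, here's roughly what a spectrogram does under the hood, as a bare-bones sketch.  It chops the audio into short frames and measures how much of each frequency is in each frame.  The sample rate, frame size and test tones are assumptions I picked so the numbers come out clean, and real software uses a much faster algorithm (the FFT) plus windowing, which this skips.

```python
import cmath
import math

def spectrogram(samples, frame_size):
    """Split samples into frames and return one list of bin magnitudes per frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2):  # frequency bins below the Nyquist limit
            total = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            mags.append(abs(total))
        frames.append(mags)
    return frames

def peak_bins(frames):
    """Index of the loudest frequency bin in each frame."""
    return [max(range(len(m)), key=m.__getitem__) for m in frames]

RATE = 8000   # samples per second (assumed)
FRAME = 64    # samples per frame, so each bin is RATE / FRAME = 125 Hz wide

# Test signal: a 500 Hz tone for two frames, then a 1000 Hz tone for two frames
signal = [math.sin(2 * math.pi * 500 * n / RATE) for n in range(2 * FRAME)]
signal += [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(2 * FRAME)]

for i, b in enumerate(peak_bins(spectrogram(signal, FRAME))):
    print(f"frame {i}: loudest around {b * RATE / FRAME:.0f} Hz")
```

Each frame is one column along the x axis, each bin is one row up the y axis, and the magnitude is what gets turned into colour intensity.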

However this type of visualisation isn’t helpful when thinking about composing, as it’s impossible to correlate the visuals with the exact sounds being produced.  The spectrogram also doesn’t show stereo; for that you need two spectrograms side by side, and even then it’s almost impossible to determine which exact sounds are coming from where in the stereo field.

While scientifically accurate, the spectrogram is only useful for analysis; it isn’t compositionally functional.

A better way to think about composition is just by drawing a box.  I’ve discussed this before in the MR Removed post, so I’ll recycle the images from there.

Here’s a box, which represents our sonic field.  In this box, pitch is once again the Y axis.

Then you add stereo which is the X axis:

And then in each of the boxes you can put stuff.

The above instrumental designations are fairly typical ones.  However this is a drastic oversimplification that I used in the MR post just to illustrate the point.  Obviously I haven’t indicated volume here, and I also haven’t indicated any sort of time scale (which would mean producing a new graph each time something changes).  Furthermore, sounds aren’t usually ever simply “high, middle or deep” nor are they always panned exactly in the middle, or absolute hard left/right. However the main principle is this – sounds need room.  If you have too much stuff competing in one space, some of it won’t be heard easily.  That might be fine, if one instrument’s role is to sit way back in the mix and support the other one, but if you’ve got two equally important things in the same general area, then you might have problems with the sound of the final result clashing.
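The "sounds need room" principle can be sketched as a crude check: two parts are fighting if they overlap in pitch, sit in roughly the same stereo position, and are about equally loud.  The thresholds (`pan_gap`, `level_gap_db`) and all the part values below are hypothetical numbers chosen for the example, not rules anyone actually mixes by.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Part:
    name: str
    low_hz: float     # bottom of the part's pitch range
    high_hz: float    # top of the part's pitch range
    pan: float        # -1.0 hard left, 0.0 center, +1.0 hard right
    level_db: float   # rough loudness; closer to 0 is louder

def clashes(a, b, pan_gap=0.3, level_gap_db=6.0):
    """True if a and b compete for the same space at a similar volume."""
    same_pitch = a.low_hz < b.high_hz and b.low_hz < a.high_hz
    same_side = abs(a.pan - b.pan) < pan_gap
    same_level = abs(a.level_db - b.level_db) < level_gap_db
    return same_pitch and same_side and same_level

def find_clashes(parts):
    """List every pair of part names that clash."""
    return [(a.name, b.name) for a, b in combinations(parts, 2) if clashes(a, b)]

# Hypothetical mix: everything dead center
mix = [
    Part("vocal", 200, 1000, 0.0, -6),
    Part("guitar", 150, 1200, 0.0, -6),
    Part("bass", 40, 120, 0.0, -6),     # loud, but no pitch overlap with the vocal
    Part("pad", 300, 900, 0.0, -20),    # same area, but sits way back in the mix
]
print(find_clashes(mix))

mix[1].pan = 0.7   # move the guitar out toward the edge
print(find_clashes(mix))
```

In this toy mix only the vocal and guitar fight each other: the bass is safe because it lives in a different pitch range, the pad is safe because it's quiet, and the guitar stops clashing once it moves sideways.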

Let’s apply this thinking to a k-pop example.

Whether you love or hate their actual music, Blackpink have the most well-realised sonic production in k-pop, it’s so good that it actually carries their songs for them.  “Kill This Love” is a great example to use for introducing the concept of audio separation over time given how sparse yet effective the production is, so let’s look at just the first verse and chorus of this song to understand some of the different sonic arrangements.  From 0:16 when the first verse starts, what we’re listening to is this:

Here, font sizes have been used to indicate volume and width/position of the text indicates stereo placement.  Jennie’s vocals are more or less central, and the bass is as loud as she is, but because her voice and the bass have no real frequency overlap, they can both be heard easily even though both are loud.  The snare drum and the rest of the kit are at a lower volume level, and at an even lower level again is a single repeating keyboard note which you may not have even noticed before reading about it here, but as it exists in the background it doesn’t really push anything else out.  Even the distinction between the bass drum and the loud synthy bass subs is very clear because the subs have been pushed to the sides whereas the bass drum hits in the center.  At this point in the song, it’s a very sparse and roomy mix.

When Lisa comes in at 0:30, the sound changes just slightly:

An extra layer has been added, those weird popcorn noises or whatever the fuck they are.  They sit fairly high in the mix across most of the stereo field and don’t really intrude on anything else too much, once again it’s a detail that probably registers subliminally for most listeners.

When it’s Jisoo’s turn things change a lot more.

Jisoo takes the pre-chorus breakdown and is literally surrounded by keys, with the previous bass and drum patterns gone.  The only other additions are a handclap to mark the beat and some very subtle percussion which isn’t super-quiet but also doesn’t happen that often.

From 1:00 we have the pre-chorus build that goes into the chorus, which Rose sings.  This section has a lot more going on.

Rose’s voice I’ve positioned a bit higher here as she has the soaring high vocal.  The snare drum oscillates: sometimes it hits in the middle of the stereo field, at other times around the edges.  In headphones you’ll be able to hear it moving around, and it adds to the chaotic feel of this section of the song, while also conveniently giving a bit more room for Rose’s vocal so there’s less clash.  The weird flute-type noise is hard to discern easily but again it’s mainly just there to add a feeling of tension before the chorus kicks in and resolves it all.

The backing vocals are LOUD but they’re also pushed off to the sides, and the big brass riff is sort of spread across as well, giving ample room for the “Rum pum pum” part to occupy the middle undisturbed even though the frequencies overlap somewhat and everything is loud.  This is a key point to remember when placing instruments – if you’ve got two that are fighting for room, giving them their own position in the stereo field can allow both to breathe.
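To put numbers on what "position in the stereo field" means, here's a sketch of a constant-power pan law, which is one common textbook way of turning a pan position into left and right speaker gains.  I'm not claiming this is what any particular DAW or producer uses, and the pan values for the parts are guesses for illustration only.

```python
import math

def pan_gains(pan):
    """Constant-power pan law: -1.0 = hard left, 0.0 = center, +1.0 = hard right.

    Returns (left_gain, right_gain), where left^2 + right^2 always equals 1,
    so a part keeps the same overall power wherever it is placed.
    """
    angle = (pan + 1.0) * math.pi / 4.0   # maps -1..+1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)

# Hypothetical placements: lead part in the middle, backing parts out wide
placements = [("lead", 0.0), ("backing left", -0.8), ("backing right", 0.8)]

for name, pan in placements:
    left, right = pan_gains(pan)
    print(f"{name:14s} left={left:.2f} right={right:.2f}")
```

The centered part comes out equal in both speakers, while each wide part is mostly in one speaker, which is why loud things can overlap in pitch and still leave the middle clear.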

Throughout the entire song, everything can be heard very clearly despite the vast array of different sounds simply because everything has its space, in space/time, pitch frequency and stereo location.  What you may have noticed by going through the above is that as each part of the video has a “scene”, the music changes along with this with a new “sonic scene” to match the video’s “visual scene”.  Visuals have been selected to match the sounds, with Jisoo’s calm and Rose’s chaos contrasting with Jennie and Lisa’s more matter-of-fact brashness.  The video contents are designed to be absolutely complementary to the music.

The above is a simple example.  It gets a lot more complicated when looking at a song with denser layers.

I’m not going to go through all the steps this song takes because it would take far too long and this post is long enough.  However for most of the song, here’s our breakdown of instrumental and vocal layering:

That’s pretty busy and it’s about as busy as k-pop gets, a very “full” sounding mix.  Some points to note:

  • Blackpink’s approach to bass is unusual – usually bass drum and bass guitar are both central, as they are in Apink’s song.  This is because bass tends to have an omnidirectional effect anyway: it tends to go through everything (unlike high frequencies that are easily reflected), so there’s little point panning it off most of the time.  Blackpink probably did it simply because it fills out the edges more, as the song is so sparse.  Apink’s song is busier in general, with a lot more action at the edges, so it doesn’t also need a wide bass sound.
  • The vocals and the main keys riff in Apink’s song occupy exactly the same space but don’t get in the way of each other because they happen at different times.  It’s okay to have something besides vocals occupying that big front-and-center zone as well, if they take turns.  If you have two instruments that need to be equally loud and are in the same range (deep vocals and crunchy guitar in a guitar based song for instance) then pan the instrument out to the edges to make room for the vocals.
  • While panning to the edges is common, panning to absolute hard left and right is rarer, because sounds that are completely only in one ear without any information in the other ear at all can be grating to listen to on headphones.  In cases of an exact double-track, an absolute pan can work as the two parts will equalise each other somewhat, but if the parts are unique it’s best to stop just before you hit the edge, as a bit of information on the other side softens the harsh separation effect.  It also means that if someone’s listening to a song in a car stereo with one of the speakers blown (as I often did in my youth) they won’t miss out on one musical part completely.
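The last point about stopping just short of the edge can be shown with a bit of arithmetic.  This sketch assumes a constant-power pan law (a common textbook choice, not necessarily what your DAW does) and works out how loud a part still is in the speaker it's been panned away from, which is all you'd hear of it if the other speaker was blown.

```python
import math

def level_in_other_speaker_db(pan):
    """dB level a part keeps in the speaker it is panned AWAY from.

    pan runs from 0.0 (center) to 1.0 (hard to one side); the sign is ignored.
    Assumes a constant-power pan law.
    """
    angle = (abs(pan) + 1.0) * math.pi / 4.0
    gain = math.cos(angle)
    if gain < 1e-12:
        return float("-inf")   # hard pan: nothing at all on the other side
    return 20.0 * math.log10(gain)

for pan in (1.0, 0.9, 0.7):
    print(f"pan {pan}: {level_in_other_speaker_db(pan):.1f} dB in the far speaker")
```

Under this assumption a hard pan leaves literally nothing on the far side, while backing off to 0.9 keeps the part there at roughly 22 dB down: quiet, but present.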

Of course, you and I both hate it when people who are into k-pop write annoying theory analysis posts just as another way to show everyone “look how great my faves are”.  There’s no point learning what works without also learning what doesn’t, so to finish off, let’s look at a song with poor mixing so we can see easily what went wrong.

The issue with Rui’s song from a mixing perspective is that almost everything in the midrange has been pushed out to the side, including the main vocal, which is double tracked with the two tracks panned separately to the edges.  This creates a lot of clashing with the overly busy keyboard and synth lead parts; these parts are constantly doing something and leave no space for each other, or for the vocals.  There’s too much stuff bunched up together, and a lot of room elsewhere in the mix that isn’t being used at all.  This is partly why the song sounds so claustrophobic and sea-sickening to listen to.


That’s all for this post – the Kpopalypse music theory series will return!

7 thoughts on “Kpopalypse’s music theory class for dumbass k-pop fans: part 11 – mixing sound in four dimensions”

  1. Thanks so much for writing these; the topic is really interesting, and your humor shines through. By far the best and most unique aspect of Kpopalypse blog.

  2. Interesting read, thanks for posting. The mixing on IZ*ONE’s Japanese releases Vampire and Buenos Aires has been described as bad, is it because of crowding the sonic field with too much competing for the same space, or is something else going on there?

    • I listened to Vampire when it’s fresh from release and my immediate thought at the 1st verse was “Can these girls get any more LOUD?” So I guess it’s with the vocal EQ or something.
