Kpopalypse’s music theory class for dumbass k-pop fans: part 15 – compression

Kpopalypse is back with another music theory class to help you deeper understand the k-pops, as well as music in general!  This time we’re talking all about the most important processing effect in all popular music – compression!

Did I mention that compression is the most important processing effect in all popular music?  Well, it is!  Almost everything that you hear today in music is compressed, but what is compression exactly?  Why would you use it, and why do people use it so often?  Where is it best used in pop music?  Also, where wouldn’t you use it?  Did Elon Musk tweet about potentially dying under mysterious circumstances just now because he saw Chuu sneaking around with an axe out of the corner of his eye when out getting bread and milk?  Most of these questions will be answered in this post!

So what’s compression?  The basics are actually very simple – compression is equalisation of volume, making everything “more equally loud”, but what does this mean in real terms?  The best way to explain what compression does is to talk about movies on television, and advertising on television.  Have you ever watched a movie on free-to-air TV, and some of the dialogue seemed a little quiet and whispery, so you turned up the volume nice and loud to hear the voices clearly, and then an advert came on and you get blasted with hard volume up your ass?  Did you ever wonder why the advert is so much louder than the movie, and why the television station is allowed to get away with blasting the ads at you so much louder, like are the advertising company paying a premium for the privilege of deafening you, or what?  We can get our answer to this question if we compare the waveform of an entire movie, to the waveform of an entire advert.

The above is an audio waveform of an entire feature film, with time displayed horizontally and volume displayed vertically (and mixed down to mono plus nudged up to digital zero just to make the diagram look neater).  It could be the audio of any film, but in this case I’ve chosen The Matrix (1999), because it’s a quite good quality action film that you’ve probably all seen, so there’s a reasonable chance you’ll be familiar with the general flow of events in the film.  Also, The Matrix is an action film and action films all tend to follow a fairly specific kind of “dynamic pattern”, so even if you haven’t seen The Matrix (and you should, I mean seriously, even if you don’t like action films) you’ll still recognise the general pattern of events I’m about to point out here.  The film is just over two hours long as indicated by the time strip up the top.  The first few minutes start off with a flurry of action to immediately hook in the viewer, which you can see as volume spikes due to the very loud action sound effects.  Then the film calms down a bit as it starts telling the actual story, all those fairly low level bits of blue are the talking.  This talking is of course interspersed with more action scenes represented by the peaks in volume, which are roughly evenly spaced to maintain interest.  From just after 1 hour and 40 minutes, we approach the big climax of the film and it’s all pretty much action from there until the very end.  So as we can see, there’s a very big volume difference between the talking, and the sound effects during the action scenes.  This is completely deliberate, it’s to give those action scenes more dynamic impact when they appear.  If people started shooting guns and blowing things up and it wasn’t all that much louder than the bits in the film where they’re just standing around talking about Lacanian psychoanalytic theory, it would seem weird, kind of unexciting.  
They want you to turn up the volume when it’s quiet, so when things get loud, they get really loud.

So let’s now compare that to the sonic waveform of a TV commercial.

This could be any TV commercial, but I’ve taken this audio from the original Shamwow commercial, once again just because it’s probably something that you’re familiar with if you’ve spent any time watching late night infomercials, or if you’re a reasonably online sort of person in general.  We’re mostly pretty familiar with Vince Offer and his style of infomercial delivery.

The majority of the commercial is just the narrator talking, and what you’ll notice is that apart from one little hump at the 15 second mark (which I suspect might be a YouTube transfer glitch), the volume of his voice is perfectly equalised, he is “equally loud” all the way through.  The peak volume of Vince from the Shamwow commercial and The Matrix are actually about the same, but the feature film hits that peak a lot less often, whereas Vince’s voice has been compressed much harder, so he’s hitting that high peak volume consistently all the way through the advert.  Unlike a feature film, an advertisement usually isn’t interested in light and shade – they want to demand your attention constantly, at all times during the advert.  So the advert isn’t actually louder, it’s just at its peak volume level for a much higher percentage of its running length.
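To put some rough numbers on "same peak, louder for longer", here's a minimal Python sketch. The two sample lists are made up purely for illustration (they are not measured from The Matrix or the Shamwow ad), and `peak` and `rms` are helper names chosen for this example. RMS is used here only as a rough stand-in for "how loud it feels on average".

```python
import math

def peak(samples):
    """Largest absolute sample value in the signal."""
    return max(abs(s) for s in samples)

def rms(samples):
    """Root-mean-square level, a rough stand-in for average loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Hypothetical data, where 1.0 represents digital zero:
# a "movie" that is mostly quiet with one loud moment,
# and an "advert" that sits near its peak the whole time.
movie = [0.1, 0.1, 0.2, 0.1, 1.0, 0.1, 0.2, 0.1]
advert = [0.9, 1.0, 0.9, 1.0, 0.9, 1.0, 0.9, 1.0]

for name, samples in (("movie", movie), ("advert", advert)):
    print(f"{name}: peak={peak(samples):.2f} rms={rms(samples):.2f}")
```

Both lists top out at the same peak, but the advert's average level is far higher, which is the whole trick.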

Let’s compare some more examples, this time musical ones.

The above waveform is the YouTube audio from Hauser’s rendition of Bach’s “Air On A G String”.

This isn’t actually a very dynamic rendition of Bach, as far as renditions of Bach go, anyway – a lot of classical music has more volume variance than this does.  However the difference between classical and pop music mixing norms and loudness levels will become immediately apparent to you if we contrast the waveform above with that of a modern k-pop song below.

Holy shit, now this is what we call in the audio engineering trade a “brickwalled” mix.  Keep in mind that neither this track nor the Hauser track have had any volume modifications by me, I didn’t need to nudge either track to zero to make the diagram clearer, these waveforms have been mixed to mono and have had no other changes.  Of course, what you are looking at here is the audio for Red Velvet’s “Feel My Rhythm”.

The only parts with dynamic range here are the intro and outro where it’s just sound effects, and the song’s intro before the drums kick in is a little quieter than the rest – as soon as the “full arrangement” starts at about the 37 second mark, it’s brickwalling the mix all the way along digital zero with almost no let-up apart from a very, very slight turning down of the volume for the breakdowns that come in at the last third of the song.  Never mind that samples from “Air On A G String” are actually in “Feel My Rhythm”, it doesn’t matter, there’s so much other sound also going on that it’s enough to shoot the volume levels right up to the maximum and keep them there, almost all the way through.

Think it’s just Red Velvet, or just SM’s producers?  Here’s the waveform for Blackpink’s “How You Like That”.

The approach to compression is much the same.  As Blackpink’s production is sparser (fewer layers of sound at once) it’s a tiny bit more jagged looking – but not much, with the volume differential between the chorus and verses being very slight.  You can look at this waveform and actually clearly see the arrangement of the song, but it’s still hitting digital zero all the way through after the 42 second mark.

A couple more just for fun:

Given the very percussive nature of the track, you might have expected Billlie’s “GingaMingaYo” to not be quite as peaky, but if you expected that, you expected wrong.  The quieter parts at the start and end are once again intros and outros that aren’t part of the song itself.  You might be wondering if everything is compressed this hard.  What about a song with smoother sounds that’s not so shouty and crash-bangy?

Le Sserafim’s “Fearless” sure sounds pretty different to Billlie’s song, but in terms of how the audio has been treated, it’s much the same.

Hong Jin Young’s “Cheer Up” is a mellower song again but is actually even more dynamically squeezed.  So much for the idea of high level compression only being used on “modern sounding” pop songs.  Note the extended drama section at the end of the video is much quieter. 

How much softer can we get?  IU’s “Through The Night” has got to be one of the softest songs that has come out of Korea’s idol pop world over the last decade, but you can see that once it gets going, there’s still a pretty flat dynamic range happening.  Nothing that we’ve seen in k-pop, across all these different styles, has as much dynamics going on as the classical piece.  So, why is that?

The main reason why pop music mixes have high volume levels, is in the word “pop” – these songs are supposed to be “popular”, therefore they are competing.  Imagine that you’re someone who has just written, composed, produced, or marketed a song (maybe even all four, like a JYP type).   If someone puts on a playlist of ten or fifteen songs, and the song that you want to sell is in that playlist but quieter than the songs before and after it, that’s not a good situation for you as someone who might like to sell your song.  The same applies if your song comes on the radio, you don’t want it to be quieter than the competition, you want your song to be noticeable and stand out.  Sure, you could take a gamble that having something quieter might draw attention against all the digitally maxed stuff around it… but that’s one hell of a gamble, depending on where and how your song is being listened to, there’s just as likely a chance that your song being quieter might cause it to go by unnoticed, or ignored completely – or even worse, give more attention to the song that plays immediately after it when the listener hears the volume ramp back up again, which is definitely not what you want to happen. 

Producers competing in this way has given rise to the term “loudness war” as they jostle with each other to have the loudest, most impactful song.  However this war has a finite limit – on digital systems, you can’t exceed digital zero (or “clip” the signal) on your final master without producing really horrible distortion that frankly sucks to listen to in all but the most bizarre and experimental of contexts.  There are ways around this limitation which I’ll get to, but for now let’s oversimplify things and just say that digital zero is the ceiling that you can’t go above (which is still true, usually, kind of).

So how do you, as a producer or dabbler in sound of some form, harness compression for your use?  When we’re thinking about how compression acts upon a signal, the best way to conceptualise it is with a graph.  Here’s a graph, which shows the operation of our compression device.

Our plan is to feed our input signal through this compression device, the device will work its magic, and on the other end we’ll get a compressed output signal.  Volume level of the input signal is the X axis, resulting volume of the output is the Y axis.

It will help us a lot if we put a grid and some numbers on this graph, so let’s do that.  The numbers are abstract here and don’t represent actual dB levels, but they will help.

Now, when the compressor is off, not changing the signal in any way but just passing the input straight to the output unaffected, then we have a 1:1 ratio.  An input of one, will equal an output of one.  We can represent this on the graph with a diagonal line.

On this graph, we can visualise the signal travelling from the bottom of the graph (input), until it hits the diagonal line, then taking a 90 degree left turn until it hits the left side of the graph (output).  But rather than visualise in our minds, we have the wonderful technology of MS Paint to help us, so let’s actually draw this on the graph.

Here we can see our signal travel until it hits the line, and then proceeds to the output.  As the compressor is not acting upon the signal, an input of 2 therefore results in an output of 2.  If the input happened to be something else, like 3, it would result in an output of 3, like this.

This is all very straightforward so far.  What a compressor does, as far as this graph is concerned, is it allows you to move the red line.  Let’s turn the compressor on, and compress everything with a 2:1 ratio – this means that for every unit of signal that we put in, we’re going to get half that amount back.

Now each input is halved when it hits the compressor.  An input of 2 becomes an output of 1, an input of 3 becomes an output of 1.5.
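If you'd rather see the straight-line cases as arithmetic than as a diagram, here's a minimal Python sketch. It uses the same abstract units as the graph (not real dB levels, which is how an actual compressor measures things), and `apply_ratio` is a name made up for this example.

```python
def apply_ratio(input_level, ratio):
    """Divide the input level by the ratio; a ratio of 1 means the compressor is off."""
    return input_level / ratio

# Compressor off (1:1), then compressor on at 2:1, for inputs of 2 and 3.
for ratio in (1, 2):
    for level in (2, 3):
        print(f"{ratio}:1 ratio, input {level} -> output {apply_ratio(level, ratio)}")
```

At 1:1 the inputs of 2 and 3 come out unchanged, and at 2:1 they come out as 1 and 1.5.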

This is a very basic use of compression just for example, but isn’t a very practical real-world application of compression in a recording environment, simply because we don’t usually want to quieten all the sound coming in, we just want to reduce the volume of the very loudest sounds (and then turn the entire thing up later, so it’s all more “equally loud” as discussed above).

The first thing we want to do is set a threshold, which is the volume level above which compression starts acting on the signal, because we usually don’t want to compress very quiet sounds that are already hard to hear.  If we set a threshold of 1, that means that any signal below 1 won’t be compressed at all, but any signal that’s above 1 will be compressed.  If we set our compressor this way while keeping the same 2:1 ratio for sounds above the threshold, now our line will look like this:

Another consideration is how much output volume is simply too much, under any circumstances.  What if we figured out that any output over 2 would push us past digital zero, resulting in clipping which we’d rather avoid?  Then with our compressor we can also set a limit which the volume cannot go beyond.  “Hard limiting” is compression at an ∞:1 ratio.  If we set a threshold at 1, a 2:1 ratio for any sounds above 1, and a hard limit (∞:1 ratio) at 2, our line now looks like this:

Signals up to and including 1 aren’t altered, but anything between 1 and 3 will be squashed by a 2:1 ratio, and anything at 3 or beyond isn’t allowed to have an output level above 2, regardless of input level.  So in the below diagram the quiet sound at 1 isn’t changed, the sound at level 2 becomes 1.5 at output, and the loud sound at 3 and the very loud sound at 4 both result in an output of 2.
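The same three-part line can be written as a small function. This is only a sketch in the abstract units used on the graph (a real compressor works on dB levels and reacts over time, with attack and release settings that aren't covered here), and `compress_and_limit` is a made-up name.

```python
def compress_and_limit(input_level, threshold=1.0, ratio=2.0, limit=2.0):
    """Pass quiet signals through, squash louder ones by the ratio, and cap the output."""
    if input_level <= threshold:
        # Below the threshold: the signal is left alone.
        output = input_level
    else:
        # Above the threshold: only the part over the threshold is divided by the ratio.
        output = threshold + (input_level - threshold) / ratio
    # Hard limit: the output is never allowed above this level.
    return min(output, limit)

for level in (1, 2, 3, 4):
    print(f"input {level} -> output {compress_and_limit(level)}")
```

Inputs of 1, 2, 3 and 4 come out as 1, 1.5, 2 and 2, matching the diagram.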

The more extreme the compression ratio, the more effect it has on the signal.  2:1 is fairly subtle and unlikely to be noticed by anyone without trained ears, but with most pop songs you’re almost always talking about a much more extreme ratio than this, and the ultra-brickwalled mixes in the k-pop examples in the first part of this post were certainly generated with much more extreme compression ratios than 2:1.  Another factor to consider is that while I’ve used very precise ratios for these examples, a real world example would also be much more likely to use “soft-knee” compression, where the actual numbers are a bit fuzzier and the transitions between ratio levels are smoothed over, which makes the compression kicking in a little less obvious to the casual listener.  The above example of a threshold at 1 and a hard limit at 2 in “soft knee” compression style could look something like this:

Most compressors allow you to adjust the “knee softness” anywhere from Hwayoung to Gfriend.
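For the curious, here's one way a soft knee can be sketched in Python. The quadratic blend below is a common textbook knee shape that is normally applied to dB levels; applying it to the abstract units from the graphs is an assumption made for illustration, and `soft_knee` and `knee_width` are made-up names. It only softens the corner at the threshold, not the hard limit.

```python
def soft_knee(input_level, threshold=1.0, ratio=2.0, knee_width=1.0):
    """Compress above the threshold, easing into the ratio across the knee region."""
    lower = threshold - knee_width / 2
    upper = threshold + knee_width / 2
    if input_level <= lower:
        # Well below the threshold: untouched.
        return input_level
    if input_level >= upper:
        # Well above the threshold: the full ratio applies.
        return threshold + (input_level - threshold) / ratio
    # Inside the knee: blend smoothly from "no compression" to "full ratio".
    distance_into_knee = input_level - lower
    return input_level + (1 / ratio - 1) * distance_into_knee ** 2 / (2 * knee_width)

for level in (0.5, 0.75, 1.0, 1.25, 1.5, 2.0):
    print(f"input {level} -> output {soft_knee(level):.4f}")
```

A wider `knee_width` gives a more gradual bend, and a width of zero gives back the hard corner from the earlier diagrams.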

It’s obviously possible to compress an entire recording, but when recording a pop song it’s actually just as likely that individual parts have also been compressed independently.  The distortion channel on an electric guitar amp (or any distortion pedal) acts as a compressor/limiter, once you put a certain amount of signal into it, instead of the result getting louder, the signal breaks up instead.  Drum tracks are often compressed to even out volume levels, because some drummers aren’t always consistent with hitting the drums at the same volume level each time.  Bass guitar compression is so common that many bass amplifier systems have adjustable compressors hard-wired in to the circuit.  However the big place where compression is pretty much always used, and never not-used, is vocals.  That “stability” that vocal analysts love to harp on about is actually often at least partially a function of compression units doing their work to smooth out the volume discrepancies in the human voice.  

K-pop singers like Ailee who can actually sing (rare) and understand microphones (rarer), may or may not understand any of the above details about how compression units work, but they certainly do understand that having a consistent volume is desirable.  Watch Ailee in the above video at 1:48, where she’s belting out strong notes and tilting her head way back from the microphone, and then watch her again at 3:22 where she’s singing softly and moving her lips right up to the mic.  When Ailee makes these gain-riding adjustments she is trying to do exactly the same thing that a compression unit does – equalise her volume.  Her vocals are no doubt also electronically compressed anyway, but the fact that Ailee has some clue about what she’s doing means that the machine doesn’t have to work so hard, and the two in combination together can produce a very smooth result.

Now watch the SHINee video above at 2:12 – the guy is practically eating the microphone while belting out a big vocal line.  He sounds okay enough anyway, because the compression and limiting on his microphone will still stabilise him at more or less the right volume, but whoever was setting the controls for SHINee would have to do it in such a way to account for the fact that the guys in the group either are unaware of how to gain-ride their vocals, or just would rather not, for whatever reason.

Of course a lot of k-pop performers wear headset mics and then they have no choice, they can’t manually alter microphone distance on the fly and just have to rely on compression to save their ass.  Mind you, the backing track has their own voice on it anyway, so any volume discrepancies will be evened out a fair bit just by the presence of a consistent second vocal line existing behind the actual live vocal.

Compression is unlike a lot of other vocal effects, in that it’s an effect that is designed specifically to not be that noticeable – if it’s working well, things just “sound right”.  On studio recordings, vocal compression is always there, everything you hear on a k-pop studio recording is carefully smoothed over.  The rare exceptions are only ever with groups on the very, very nugu end of the spectrum with zero budgets.  So let’s look at one of these rare exceptions, so we can hear what compression sounds like when it isn’t working as intended.

Listen to the vocal from 0:48 through to 0:55, and how it gets oddly quieter as the singer progresses towards the end of her line, her volume is everywhere and she sort of vanishes behind the backing track.  The girl who sings immediately after her has a similar issue, but not as pronounced.  They might not be the greatest singers in the studio, but the real blame here doesn’t lie with them, but with the engineer who clearly wasn’t able to fix it in the mix.  I very much doubt that weird fading vocal was their intention, this is an example of either a compressor not being used at all, or not having a strong enough ratio dialed in to compensate for the singer’s vocal discrepancies, maybe they had 2:1 on but they needed something like 10:1 to catch all those notes, or perhaps the threshold wasn’t set properly below where the vocals actually were. 
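To see why the ratio and threshold settings matter so much here, this rough Python sketch runs a fading vocal line through three different settings. The levels are invented for illustration, in the same abstract units as the graphs earlier (I have no idea what settings were actually used on this recording), and `compress` and `spread` are made-up names.

```python
def compress(input_level, threshold, ratio):
    """Leave levels at or below the threshold alone; squash the rest by the ratio."""
    if input_level <= threshold:
        return input_level
    return threshold + (input_level - threshold) / ratio

def spread(levels):
    """Gap between the loudest and quietest note in the line."""
    return max(levels) - min(levels)

# Hypothetical levels for a vocal line that fades as the singer goes on.
vocal_line = [4.0, 3.5, 3.0, 2.5, 2.0, 1.5]

settings = {
    "2:1, threshold 1": (1.0, 2.0),
    "10:1, threshold 1": (1.0, 10.0),
    "10:1, threshold 5 (set too high)": (5.0, 10.0),
}
for label, (threshold, ratio) in settings.items():
    squashed = [compress(level, threshold, ratio) for level in vocal_line]
    print(f"{label}: loudest-to-quietest spread {spread(squashed):.2f}")
```

The 10:1 setting flattens the line far more than 2:1 does, and with the threshold sitting above the vocal the compressor does nothing at all. In practice you'd then turn the squashed vocal back up so it sits on top of the backing track.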

Sometimes compression isn’t used, deliberately.  In this case, they’ve left it off Luna’s voice, so she can go very quiet in the opening and closing parts of the song without the machine automatically boosting her volume up, this allows her to match the dynamics of the players in Jambinai more accurately as they deliberately also play softly in some sections.  As the musicians around her increase their volume, Luna matches them with her voice.  Note that in the peaks of the song she’s still pulling her head away from the microphone as Ailee did, so she doesn’t overload the gain.  Luna, like Ailee, is a rare k-pop singer who actually knows what the fuck she’s doing.  There really aren’t many of these in the idol world.

A final word on clipping – while a final master should never be clipped, it’s possible with 32-bit floating-point recording software to clip your original instrument recordings into the mixdown phase and still get a final result that sounds good.  This is because 32-bit float audio actually stores information above digital zero (essentially making it function more like an analog recording device in this aspect), whereas 24-bit and lower fixed-point systems just cut this information off completely.  Clipping the desk can therefore sometimes work better than compression for things like recording drum tracks or other transient peak sounds without adding so much normalisation that it turns everything into a complete wash.  If you’re interested in this topic, here are some videos:

Note that it’s “horses for courses” of course.  You wouldn’t do this if you were recording a chamber orchestra or a sweet ballad, but for something like dubstep or some downtuned chug-chug music style with “core” at the end of its genre name, the results can be surprisingly good.  Use at your discretion.
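If the difference between storing and not storing information above digital zero sounds abstract, here's a toy Python sketch of it. The sample values and the `store_fixed` and `store_float` helpers are invented for illustration, with 1.0 standing in for digital zero. Real recording software is a lot more involved than this.

```python
def store_fixed(sample):
    """Fixed-point style storage: anything beyond full scale is cut off."""
    return max(-1.0, min(1.0, sample))

def store_float(sample):
    """Floating-point style storage: values beyond full scale are kept."""
    return sample

hot_take = [0.5, 1.2, 1.6, 0.8]  # hypothetical samples, two of them over digital zero
gain = 0.5                       # fader pulled down later, at mixdown

fixed_mix = [store_fixed(s) * gain for s in hot_take]
float_mix = [store_float(s) * gain for s in hot_take]

print("fixed:", fixed_mix)
print("float:", float_mix)
```

In the fixed version the two hot samples both end up at the same flattened level, so the shape of the peak is gone for good. In the float version the overs survive, and pulling the fader down brings the original shape back under the ceiling.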

That’s all for this post!  The music theory series will return!