There is something missing at the heart of the listening component in most ELT course materials. They fail to dig deep into the actual raw material of the skill – what Richard Cauldwell calls the ‘sound substance’. The job of the listener in decoding from this sound substance is radically different from the job of the reader in decoding from print. In print, the words are clearly separated from one another. Word families are obvious: for example, the common element in sign and signature is clearly visible. Sentences are usually well formed, because drafting phenomena such as false starts have been ironed out before publication. And the reader always has the option of revisiting passages which have caused difficulty. When decoding from natural spoken English, the listener has none of these advantages. The sound substance is a continuous stream, usually without word breaks and full of drafting phenomena and non-standard grammar – and all of this is delivered in an accent of lesser or greater familiarity.

The reason that most course materials fail to dig deep into this sound substance is probably that listening lessons are usually written before the audio actually exists. The authors usually work from a script of the audio, often one they have written themselves, to be recorded by actors at a later stage. The consequence is that the authors cannot pay attention to the sound substance. They may be able to ‘hear it in their head’, but there is no guarantee that the actors will say it that way, so they can’t focus on it in the materials. Working from the script means that authors are essentially writing a reading exercise. The decoding difficulties which are specific to listening are not broached.

In order to deal with decoding speech, materials writers need to reverse the order of work. We must find or create the audio first, and then write the lesson around it afterwards, based on listening very closely to the precise sound substance of our piece of audio. By working with pre-existing audio, authors also have the opportunity to micro-edit the sound files to highlight specific features of the sound substance. It is this exciting possibility which I want to focus on in this post, and in particular, the two types of audio-edit mentioned in the title.

An acoustic drill is a multiple repetition of a small fragment of audio which is likely to cause difficulty. For instance, you cut the following segment out of your audio:

... it was kind of like, you know, sort of like...

You then copy and paste it several times over into a new audio file. You may also make a slowed-down version of it. The effect of hearing a small snippet like this in close repetition is odd. It is somehow mesmeric, automatically focusing the listener’s attention on the quality of the sound substance. This is something we don’t normally do in native listening – the sound substance goes in the ear, but the mind focuses on the meaning and not the sound. But those of us who are learning to listen must sometimes focus on the sound. Acoustic drills are particularly useful for snippets of language which are typical of natural speech, such as the vague language in the example above.
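For anyone who would rather script the cut-and-paste step than do it by hand in an audio editor, here is a minimal sketch in Python using only the standard `wave` module. It is not how the drills described in this post were made; it is just one way the same steps could be automated. The function names, the default of five repetitions and the 0.3-second gap are my own assumptions, and the sketch assumes an uncompressed 16-bit PCM WAV file, where silence can be written as zero bytes.

```python
import wave


def build_drill(frames: bytes, frame_rate: int, frame_size: int,
                start_s: float, end_s: float,
                repeats: int = 5, gap_s: float = 0.3) -> bytes:
    """Cut the span [start_s, end_s) out of raw PCM frames and repeat it.

    frame_size is bytes per frame (sample width x number of channels).
    A short silent gap follows each repetition. Zero bytes are silence
    only for signed PCM such as 16-bit WAV (assumption).
    """
    start = int(start_s * frame_rate) * frame_size
    end = int(end_s * frame_rate) * frame_size
    snippet = frames[start:end]
    gap = b"\x00" * (int(gap_s * frame_rate) * frame_size)
    return (snippet + gap) * repeats


def write_drill(src_path: str, dst_path: str,
                start_s: float, end_s: float,
                repeats: int = 5, gap_s: float = 0.3) -> None:
    """Read a WAV file, build the drill, and save it as a new WAV file."""
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    drill = build_drill(frames, rate, channels * width,
                        start_s, end_s, repeats, gap_s)

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(drill)
```

A call such as `write_drill("interview.wav", "drill.wav", 12.4, 14.1)` would cut out the chosen fragment and repeat it five times; the file names and timings here are hypothetical. For the slowed-down version, note that simply writing the same frames at a lower frame rate slows the speech but also lowers its pitch, so a pitch-preserving slow-down is better done with a dedicated audio editor.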

An audio concordance is an audio file consisting of multiple instances of the same feature, taken from many different source recordings. For example:

... for sort of the presentation...

... lots of nice sort of pockets of ...

... people are sort of crowding round it...

... and we tend to sort of queue up...


Here, we see the very common spoken discourse marker sort of, said by many different speakers, with a little bit of the co-text before and after. Discourse markers like this are very often pronounced very weakly, in a throw-away manner. This makes them difficult to hear. The learner realises there’s something there which they’ve missed, leading to anxiety which is unnecessary – the discourse marker is not essential to the meaning of what the speaker is saying. The audio concordance can help to make this piece of unfamiliar noise in the sound substance more familiar so that the learner listener does not get hung up about it. Check out a useful post on micro-listening and audio concordances from Mura Nava here.
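An audio concordance of this kind can also be assembled with a short script once you have noted where each instance occurs. The following is a minimal sketch in Python using the standard `wave` module, offered as one possible approach and not the method used for the examples above. The function names and the half-second gap are assumptions of mine, and the sketch assumes every source file is an uncompressed 16-bit PCM WAV with the same channel count, sample width and frame rate.

```python
import wave


def cut_clip(path: str, start_s: float, end_s: float):
    """Return ((channels, width, rate), frames) for one span of a WAV file."""
    with wave.open(path, "rb") as src:
        fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
        rate = fmt[2]
        src.setpos(int(start_s * rate))
        frames = src.readframes(int((end_s - start_s) * rate))
    return fmt, frames


def join_clips(clips, gap_s: float = 0.5):
    """Join (fmt, frames) clips into one stream, with silence between them.

    All clips must share one format; mixed formats would need resampling,
    which this sketch does not attempt.
    """
    fmt = clips[0][0]
    if any(clip_fmt != fmt for clip_fmt, _ in clips):
        raise ValueError("all clips must share channels, width and rate")
    channels, width, rate = fmt
    # Zero bytes are silence for signed PCM such as 16-bit WAV (assumption).
    gap = b"\x00" * (int(gap_s * rate) * channels * width)
    return fmt, gap.join(frames for _, frames in clips)


def write_concordance(dst_path: str, sources, gap_s: float = 0.5) -> None:
    """sources is a list of (path, start_s, end_s) tuples, one per instance."""
    clips = [cut_clip(path, start, end) for path, start, end in sources]
    (channels, width, rate), frames = join_clips(clips, gap_s)
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(frames)
```

You would call `write_concordance("sort_of.wav", [("talk1.wav", 3.2, 5.0), ("talk2.wav", 40.1, 42.3)])`, where the file names and timings are hypothetical, with one entry for each instance of the feature. Finding the instances still means listening closely to each recording; the script only saves the cutting and pasting.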

Hi Mark, thanks for the post. I have found the audipo listening app great for looping audio, which is analogous to your audio drills; and videogrep is good for audio concordances (which I think fall under John Field's category of micro-listenings). There are a number of options now available for speech corpora. I was wondering if you knew of any researchers in this area, such as Phoebe M.S. Lin? Ta, Mura
Many thanks for these useful links, Mura. I'm including your micro-listening post in the post above. Annie and I did it the hard way, manually, and that has benefits too, if what you're doing is pedagogical invention rather than linguistic research. I'll keep on the lookout for more research articles on the prosody of formulaic language.

Hello Mark. Thank you for sharing your article. I couldn't agree more with the point you make. Sometimes, it is funny to listen to the recordings that accompany some texts. The authors include the key to the tones they mean to illustrate, but the audio does not coincide with the scripted dialogues. In that case, the listener/reader has to take the extra trouble of decoding the audio before delivering it to a less learned audience. It is hard. Cheers, Stella
Hi Stella, do you mean the tone doesn't seem to match the audio in intonation manuals? A lot of teachers complain that they can't actually hear the intonation. One problem may be that a fall-rise often has a lot of fall and only the slightest hint of a rise, nothing more than a slight flick in the tail; something you intuit more than hear. I think acoustic drills help with this: the repetition becomes more than the sum of its parts, somehow.

