The Ear, the Eye, and the Arm: Audio Writing

Over the past couple of years, podcasts and other forms of audio entertainment have exploded in the media world. According to The Infinite Dial 2022 report from Edison Research, the media industry-wide resource, an estimated 62% of all Americans over 12 years old have listened to a podcast. Also referred to as “spoken word audio,” this catch-all category includes music, talk radio, news, longform narrative, journalism, true crime, audiobooks, and of course audio drama. How did this happen? Well, when you have a global pandemic, nothing signifies human connection faster and more regularly than your favorite hosts on a podcast.

Audio drama in particular has been my nut to crack while working in podcasts as a story producer. I’ve had writers of all backgrounds ask: how can I break into audio? Before working as an editor, I grew up as a drama club nerd and got a degree in performance studies. Yet writing for audio drama was still an evolving medium. Figuring out how to translate from the page to the .wav (or .mp3) has made me rethink a lot of the norms I had worked within in the book space.

Audio drama is a form of theater, but not exactly the same as interpreting work onstage. Reading a play on-air or recording a stage performance is a type of audio fiction, but the outcome isn’t the same as drama written specifically for the medium. Most stage performances are written from a visual perspective. Likewise, audio drama isn’t 100% the same as an audiobook or reading a story aloud. Prose stories are written with a perspective in mind: getting into the “shoes of a character.” Perhaps the best way to figure out what makes a good audio drama is drawing the connections between the ear (sound, narrative’s external focus), the eye (visuals, clarity of setting and characters), and the arm (actions, clarity of scene movement).


First, let’s talk about what makes audio fiction different from audio drama. They both use sound to tell a story after all. Are we just nitpicking? Not really, because the roots of audiobooks are different from audio drama, which has impacted the way each format’s norms have evolved.

Audio fiction is a text that has always been intended to be read: short fiction, novels on tape, newspaper articles, magazine features, to name a few types. The first audiobooks were created for greater accessibility for blind people and were usually funded by grants or created specifically by publishers for the library and educational market. Then, starting in the 2000s, audiobook listening opened up in the market. Several reasons are behind this: commuter culture; the cheaper availability of audiobooks; the creation of iTunes and ripping audiobooks into .wavs and .mp3s for easier portability; the tradition of radio drama in the UK spilling over into other markets; and the popularity of certain audiobook narrators like George Guidall, Scott Brick, Jim Dale, Cassandra Campbell, Julia Whelan, and others.

Then, multicast audiobooks started happening. The 2006 audiobook of World War Z is a fantastic example, featuring a cast of over 30 different voices narrating the zombie apocalypse. Comics also started being adapted into audio drama, with Locke & Key being one of the first; the growth in comics as audio also led to the recent adaptation of the Sandman series.

Audio fiction as text read aloud emphasizes the interior experience of characters. While good prose “paints a mental picture,” exposition is a key tool. You can hear a character’s thoughts through a first-person POV or an omniscient narrator. Magic systems can be broken down page by page. Prose expects the reader to see behind another character’s eyes and be taken on a journey through that experience.

On the other hand, audio drama has roots in the old-school radio plays of the 1930s and 1940s. (Whenever I explain to anyone what an audio show is, I always say: “It’s like a modern version of old-school radio drama!” and then they respond, “Ah yes, like what people during the Depression/WWII listened to!” LOL).

The DNA of current audio drama has links to radio drama and podcasting, certainly. But I’ll also argue that modern radio drama has a lot to do with interactive theater, mainly because a listener’s expectations of an audio drama are different from a reader’s expectations of a book.

Interactive theater is when audience members are asked to participate with the actors. Improv shows, for instance, have actors who solicit audience answers to guide their shows. Haunted houses are another example, where the audience is led through the space and is expected to be scared by the cast. A show like Sleep No More is a more sophisticated version, where audience members move through an open performance space, can touch all of the sets and props, and can even be pulled aside for a one-on-one performance with an actor. In the same way, an audio drama asks the listener to be that audience; it asks the listener to contribute their imagination in a more personalized, intimate way to explore the confines of the story. While a reader may sit inside another character’s shoes, an audio drama listener takes their own shoes and marks up the scenery with their footprints.

But how does an audio listener know where to tread? Renowned radio legend Norman Corwin once said, “Clarity and being able to convey meaning, emotion, attitude, through understandable language is a sine qua non of radio.… Comprehensibility is a must.” Likewise, audio dramatist Roger Gregg agrees that audio drama expectations create different associations using sound: “Storytelling using sound is different from audiobook-style fiction. It is more akin to visual performance.”

Clarity in an audio drama sets up expectations based on exterior trappings of storytelling. When thinking about the difference between audiobook and audio drama, a common assumption is that the exposition parts are given to the narrator, and the dialogue parts are given to actors. But audio drama also includes sound design to increase the depth of the world building. Some sound design might come across as minimalist or “one note,” but sound design establishes the acting and dramatic subtext as much as actual words do. Sound can help fill in how something happens in a more visceral, immediate way than simple prose. It can flesh out how a person walks, talks, moves, gestures. Most importantly, the sound itself embodies the world, a world open to personal interpretation to fill in the gaps.

An audio show focuses the listener’s attention on certain spots, lines, intonations, and action sequences. Focus should NOT be confused with “passive.” A listener is not letting the blasts of SFX wash over them. An audio drama asks for another level of engagement to build the story alongside the writer’s hand. For example, a setting might be described evocatively as, “in a summertime park, next to a bubbling brook,” in a script, but that SFX alone can create different images of brooks in a listener’s mind. Do these conflicting images negatively impact the story? Not necessarily, as long as the show makes it clear why that stream impacts the setting and the plot. Having a Google search result equivalent of creeks, streams, and brooks would not automatically ruin the intention behind the scene.


Making the visual into audio comes down to what is emphasized in the script—that is one of the major differences between a novel and an audio drama. Audio drama depends on the listener’s imagination to pull in individualized visuals, but that audio drama script is also written for a creative team: the director, the actors, the sound designer and sound engineer, among others.

Enough details about the setting should be included to let everyone be on “the same page” so to speak, while also leaving enough room to scribble additions in the margins. While not every scripted detail will actually appear in the final audio cut, any actor reading that set-up will understand the environment their character is entering. Likewise, the people sound designing the scene and making musical choices should know what they are getting into upfront. When I adapted the novella Marigold Breach into an audio drama, for example, the character backstories between our leads Lucan and Ven are explained in the stage directions on the very first page. The reveal doesn’t unfold in the story until Episode 7, but the actors and the production team are responsible for creating the surprise for the listener; they are not supposed to be surprised halfway through the table read.

While the script can be technical and detailed, the end results still allow for the listener’s “eye” to fill in the blanks that the audio doesn’t necessarily emphasize. Many audio dramas do not physically describe their characters, for instance. A listener may infer physical details from what a character says about themselves (e.g., TED: “I’m wearing my lucky Halloween socks!”), how other characters describe them (e.g., “Nice skeleton socks, Ted.”), or how others treat them (e.g., “DAMMIT, TED, THOSE ARE MY JACK SKELLINGTON SOCKS!”). What details are kept or emphasized creates a very different picture and points of dramatic tension!

There are plenty of ways to help make the audio visual. What sonic tools can be considered when building this “movie in your mind”? Kc Wayland calls this “Sound Point of View” in his book Bombs Always Beep: Creating Modern Audio Theater. Here are some writing tools to help guide the audience when filling in the blanks in their imaginations:

Vocals: Acting choices are key, but so is something as subtle as breathing. Accents are an obvious indicator in audio performance too. Moreover, the voice is the character. Distinct voices are crucial; voices that sound too alike can easily confuse a listener. But every other physical aspect could be up to the listener’s imagination! On a recent panel, I heard a dramatist mention an interesting example: listeners tend to mentally cast a character as someone they know in real life who resembles that character!

Sound effects (SFX): These can come from a professional sound library, from stock, or from pre-recorded material. Foley refers to customized sounds, created by professional designers fiddling with a bunch of cool trinkets and random doohickeys to replicate sounds.

Music: A soundtrack is especially helpful when explaining the emotional backdrop of a scene, or the genre setting. Writers aren’t expected to ideate on the music, but even evocative descriptions of settings (e.g., “A greasy spoon diner in an underwater colony”) or a historical period (e.g., “The wind-blown steppes of what will be Mongolia”) can help foster imaginative compositions for the composer and sound designer.

Silence: Pauses, breaks, and sudden drops in sound design are tools a writer might not think to build into a script, but they can be the most powerful.


In theater, film, and visual performance, the eye follows motion. How a character gestures, runs, eats, stims, enters or exits a room—everything can draw the eye. In theater rehearsal, recording a character’s movements across a stage is referred to as “blocking.” In prose, I’ve also advised fiction writers to be aware of how their characters are “blocked” in a scene, to make sure they are not simply floating voices in a space. In audio, that rule counts for twice as much!

In prose, exposition and scene setting can physically describe as much detail as you would like. In audio drama, the same amount of description can be distilled through sound design. More than that, however, prose uses white space, chapter headings, and those little “scene break asterisks” to show scene changes and help break down the pacing. In audio, the dramatist should be mindful of how scene transitions are made, and how to pace a story so the ear can follow the plot.

Often, writers new to audio tend to “overwrite” in ways that make the narrative flow too dense to follow. Some common mistakes by new audio writers:

Excessive description: Different from “purple prose” in novels, excessive description is a dense amount of exposition or internal dialogue that can halt a scene, or make the listener lose the thread of the narration.

Dialogue for the sake of dialogue: Too much banter or seemingly aimless conversation could drag down a scene, especially if the listener doesn’t understand why the conversation is important enough to pay attention to.

Too many scene changes: This is a particularly common mistake made by screenwriters transitioning into audio. A lot of quick switches between POVs or physical places, even if each scene has a specific SFX or sound design, may be confusing to follow. The only exception to that would be comedy, but its success, of course, is all in the timing. 🙂

Creating effective action sequences and pacing is also unique to audio. The SFX linked to physical combat and action (guns, explosions, or punches and kicks landing) can become monotonous and repetitive. But to narrate all the physical action, ironically, also slows down the pacing!

So what can be a good remedy? How about introducing some dialogue instead?

A fantastic example of dialogue that enhances an action sequence can be found in the show Shipworm. In the first act, Wallace, a former army doctor, carries an injured coworker as they escape a burning building. While the sounds of alarms and the low roar of fire can be exciting, what grabs the listener’s attention is a very personal war story Wallace tells his companion to keep him conscious as they wade through the smoke and flames. The listener can’t tell where Wallace is going or how they are running through the building, but the physical blocking doesn’t matter. Wallace’s words anchor his friend’s awareness—and the listener’s ear—building tension and suspense until they finally exit the building as Wallace’s monologue wraps.

Audio fiction, audio drama and everything in between—while this is a general primer of what can make or break a good show, standards are changing all the time. The audio storytelling space is filled with a variety of voices and techniques. As more creators—writers, sound designers, musicians, actors, and others—get into audio drama, I’m sure the theories from various artistic backgrounds will continue to challenge the boundaries of aural tales. What I suspect will remain constant, however, is the expectation that an audience isn’t passive. A good story—whether read, heard, or performed—is never simply a “one note” experience.


Diana M. Pho

Diana M. Pho is a queer Vietnamese-American independent scholar, playwright, and Hugo Award-winning fiction editor. She has over a decade of experience in book publishing, including the Tor Publishing Group and the Science Fiction Book Club. Diana has also most recently been part of as their Lead Creative Executive for Co-Productions & Partnerships developing thrilling and innovative audio dramas. Currently, Diana serves as Executive Editor for Erewhon Books. Additionally, she has a double Bachelor’s degree in English and Russian Literature from Mount Holyoke College and a Master’s in Performance Studies from New York University.

Photo Credit: Gerry O'Brien
