Episode 266: Jessica Powell: How Sound Separation is Shaping the Future of Music
Jessica Powell is the CEO and co-founder of AudioShake, a company pioneering AI-driven sound separation technology. Formerly Google’s VP of Communications, she has a deep background in both tech and music, helping artists unlock new creative possibilities.
In this episode, Michael Walker and Jessica explore how sound separation, AI, and immersive content are transforming the music industry.
Key Takeaways:
How AudioShake’s sound separation technology gives artists new creative freedom.
Why immersive content is the future and how musicians can prepare.
The ethical and artistic impact of AI in music production.
Free Resources:
Tune into the live podcast & join the ModernMusician community
Apply for a free Artist Breakthrough Session with our team
Learn more about Jessica Powell and AudioShake at:
Transcript:
Michael Walker: YEAAH! All right. Excited to be here today with my new friend, Jessica Powell. Jessica is the CEO and co-founder of AudioShake. It's a company pioneering groundbreaking audio separation technology to revolutionize music, AR, VR, and immersive sound experiences. She's a former Google executive and published author, helping shape tech innovation and storytelling, with her work featured in The New York Times and Time Magazine.
So, I'm really excited to have her on the podcast today to talk about the future of music, where things are headed, especially as it relates to cutting-edge sound tools and technology that allow us to do better sound separation, lyric transcription, and spatial computing—like the VR world. I feel like, at the time of recording this, we're just starting to tap into it. So, I'm looking forward to connecting and hearing her take on where things are headed.
Jessica, thank you so much for being on the podcast today.
Jessica Powell: Thanks for having me.
Michael: Absolutely. So, to kick things off, for anyone connecting with you for the first time, could you share a little more about your background and how you found yourself working with VR and music?
Jessica: Oh, sure. Let's see. I've always played music. I mean, music's always been such a huge part of my life. I didn't go that route after college—I was really freaked out about not having a job and just went for whatever I could get that seemed like a more traditional path.
I worked so many random jobs. I was a journalist at one point. Another time, I was chasing wild boars in Portugal. That was a very, very random path.
Michael: Wow. That is so specific. We could have a whole podcast episode just unraveling that right there.
Jessica: An entire podcast episode on all my bad career choices. But eventually—or rather, relatively early on—I ended up in tech and spent my whole career there, mostly working for Google, though I worked at a few other places as well.
When I was leaving Google, my co-founder and I had already been experimenting a lot and thinking about different ways we wished we could create with music. We had both lived in Japan years earlier and done a ton of karaoke. We always thought, Why can't you karaoke to old punk, old hip-hop, and songs that just don’t end up in the karaoke catalog?
When you do sing, you end up doing a lot of Van Morrison and Oasis.
Michael: Hmm.
Jessica: And they’re covers too, right? I remember one night, I really wanted to do this old Gang of Four track. Of course, it wasn’t in the karaoke catalog. We were like, What if you could just rip the vocals from everything and sing along? Wouldn't that be really fun?
It was just one of a million what if ideas we had. It wasn’t like we were going to go build a karaoke company—of course not. But years later, we were in Silicon Valley, working at our respective companies, still playing music, and we came back to this idea.
We thought, What if you could split sound apart? Sure, karaoke—but what else could you do with sound if you could separate it? Because most sound doesn’t come to us in a perfectly curated, artistic way where everything’s been tracked out and edited.
We started to think about large-scale applications but also how we could make sound easier for people in different daily life experiences or workflows if we could separate it. That’s how we got started.
Michael: That's really cool. So, by separating sounds, you mean that if there’s a song with all the different instruments, you could separate them into different stems?
Jessica: Yeah. If we're thinking about music workflows, today, if someone goes into the studio to record, they’re probably laying down the tracks separately, and then the producer is bouncing the stems at the end—hopefully passing them on to the musician, though that doesn’t always happen.
With older audio, that's not the case. If you have a track from the 1930s, 1950s, or even the 1980s, the likelihood that stems exist is low. In some cases, it was impossible because it was never multi-tracked. Multi-tracking didn’t start happening until the time of The Beatles.
Then, okay, you’ve got everything on different tracks, but we’re also talking about analog tapes. Those tapes end up in warehouses. Some of those warehouses burned. Some of those tapes can’t be found. But let's say you do find the tape—it’s probably $10,000 to $20,000 to bake the tapes, and you’ve got one shot to try and extract everything. Meanwhile, the audio on those tapes has degraded. The vocals might have bled into the drums or something similar.
For older audio, trying to create stems just isn’t an option. That means opportunities like sync licensing or Dolby Atmos immersive mixing—formats or monetization avenues—don’t exist for those artists or their estates.
That actually affects a huge range of audio. You’d be surprised by how much it covers. You’d think that once everything moved to digital workstations, stems would always be created. But even then, you run into new problems.
One issue is that people didn’t see the value in stems, so they didn’t hold onto them. Another issue is that, sure, maybe stems were created, but they were made in a specific version of a DAW with particular plugins, and suddenly, 10 years later, someone tries to open those files, and they can’t even access them.
So, the need for stems goes way back. But even in contemporary recordings, stems still aren’t always created or passed on to the artist, the label, or whoever else might need them.
Michael: Makes sense. Yeah. So it sounds like what you're saying is that, in a lot of cases, these stems either weren’t provided to artists, or they're from back in the day when there wasn't easy access to the separate tracks. In that world, it's a lot more difficult for music supervisors, or anyone trying to place those tracks in soundtracks, to work with them.
So it was a big body of work that was basically unleveraged because there was no way to separate it into tracks.
Jessica: Yeah. And there's also other cases too, right? At the end of the day, a live performance today is not so—I mean, it's different in a lot of ways, but it's not so different from, say, in the 1920s.
The problem with, say, trying to separate something from Tin Pan Alley recordings or the 1930s and beyond is that you basically have a mono track file—you have no parts to it. But you can end up with the same thing today, roughly, with a live recording too.
Which is like, maybe you've got everyone mic’d up and you're recording it, but you have a huge amount of bleed into those different tracks. So you're still going to need to find a way to separate, to remove the bleed from the bass into the vocals, or something like that.
Michael: Super interesting. Yeah. So for live recording as well, when you just have one recording source and it's getting all the instruments, then you don't really have a multi-tracked way to do it.
And multi-tracking takes a lot of time, energy, and effort to fully track up all these different instruments.
Jessica: And then you just have to deal with the general problems of sound, right? You can have someone mic’d up, but that doesn't mean that mic’s not catching everything else in the room.
Michael: Totally. Yeah. So, like, bleed from other instruments. And another use case I can imagine is—when I was first starting to dabble with music production, one of the most helpful things to do was trying to recreate songs that I was listening to.
It would be a lot easier to recreate them if you could actually break apart all the stems and just listen to the different parts and understand what’s doing what, where.
Cool. So I'd love to hear a little bit more about the tool that you've built and the journey of creating something that can actually help solve that problem.
I feel like I've heard of a few different tools, and especially right now, in the age of AI, it seems like there's a whole suite of different things coming out. I'm curious to hear your journey with that, and what are some of the cool things we're able to do now that we weren't able to do 10 or 20 years ago.
Jessica: Yeah. I mean, we were very interested in sound separation versus sound creation.
There's a lot of stuff out there now—generative music and everything—and it's all very cool and fun to play with. It’ll get better and so forth.
But for us, being musicians and sort of sound-obsessed—and by sound-obsessed, I mean, sure, playing instruments, but also just thinking a lot about sound—you start to notice how sound actually gets in your way in a lot of day-to-day things we just take for granted. We don't even think about it.
Think about going into a noisy bar and not hearing the person next to you, or recording a video on the street when a siren comes by. Sound can get in the way of us being able to derive meaning from the world as well as being able to create with it.
That really is the space that we focused on.
We're not doing a thousand different music tools and everything. We're very, very interested—and we think there’s still a long way to go—in making it easier for sound to work better for everyone in a number of different contexts.
When we first started, we wanted to work with artists and rights holders right from the start because we thought the technology could also be a bit disruptive.
I fully respect that if you're Van Gogh and you painted all these sunflowers, you may not want someone to pull out a single sunflower. And I think while I’m a huge fan of remix culture and would never want to live in a world where J Dilla didn’t exist—or DJ Shadow and Public Enemy and the Bomb Squad and all of that—at the same time, that can coexist with also just wanting to be a little bit more cautious about how we put the technology out.
We wanted to make sure it’s actually something that other musicians—not just ourselves—would find useful, and that they could use to make money, explore new creative opportunities, or whatever it might be.
So what we built initially was a tool for labels, publishers, and managers, allowing them to very simply upload a track, pick what kind of splits they want, and then separate.
We also built a technology—a platform—that indie artists could use, which is called AudioShake Indie. It works the same way, allowing artists to upload a track and access our technology.
Currently, the tool for labels, film studios, and some of the enterprises we work with includes a few other capabilities as well.
For example, the ability—this is more in the film space—to split dialogue, effects, and music, or to transcribe lyrics.
We'll make some of these available on the indie side at some point. But right now, indie artists can access stem separation at AudioShake Indie.
Michael: I mean, even that wider vision you shared around how sound makes such a big impact on our day-to-day life, like walking through a busy coffee shop and not being able to hear anything, or whatever it might be... When I was touring full time out of high school for about 10 years, I dreamed of a magic pair of headphones. In the back of the van, with music blasting while I was trying to sleep, I'd think, “Oh my gosh, it would be so nice if I could just control my surroundings with an app and have more of an augmented reality sphere around audio.”
Is that a part of what you’ve built, with that vision in mind, allowing people to be more interactive with how sound is shaped in their environment?
Jessica: Yeah, that's definitely something that we're very interested in. I mean, part of what we have is a service for content owners and artists, but we also have a whole business around our APIs and SDKs, where we partner with hardware manufacturers or different sites and applications so that they can use our technology. So, someone building hardware could use it for a number of different purposes, right? It could be for a sound bar, a voice application, or, like you're saying, ear buds or something like that. They would be able to integrate the technology for those kinds of purposes.
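[Editor's note: To make the API integration Jessica describes concrete, here is a minimal hypothetical sketch of what calling a stem-separation service from an app might look like. The base URL, endpoint, parameters, and response shape below are invented for illustration; this is not AudioShake's actual API, so consult their developer documentation for the real interface.]

import requests  # pip install requests

API_BASE = "https://api.example-separation-service.com/v1"  # placeholder, not a real service
API_KEY = "YOUR_KEY_HERE"  # placeholder credential

def request_stems(audio_path, targets=("vocals", "drums", "bass", "other")):
    """Upload a mixed track and request the chosen stem splits (hypothetical endpoint)."""
    with open(audio_path, "rb") as f:
        resp = requests.post(
            f"{API_BASE}/separate",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"audio": f},                   # the full mix to be separated
            data={"targets": ",".join(targets)},  # which stems we want back
            timeout=300,
        )
    resp.raise_for_status()
    # Hypothetical response shape: one URL per requested stem,
    # e.g. {"vocals": "<url>", "drums": "<url>", ...}
    return resp.json()

if __name__ == "__main__":
    print(request_stems("my_track.wav"))

[As Jessica notes, the same capability could also ship as an on-device SDK in hardware like a sound bar or earbuds, rather than as a network call.]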
Michael: Cool. Okay. With it in mind that a lot of the folks listening or watching this right now are independent artists who are looking to pursue their own music and are on the lookout for new tools that could improve their workflow, I'm curious about your perspective on audio and stem separation and how any artist can leverage them to benefit their career. But I'm also curious about some of the more emerging technologies. I know you're also dabbling in VR, 3D, and spatial computing. So, I'm curious where you think things are headed for artists as it relates to those kinds of emerging technologies.
Jessica: Yeah, I mean, if we just look at something like mixed reality or immersive content—whether you want to say VR or mixed reality—there are different degrees of immersiveness. Sometimes you feel like you're in a completely different world; sometimes the experience is immersive, but you're still aware that you're in the normal world. Regardless of which path you look at, everything has moved a bit in fits and starts. We were like, “Oh, VR is here, it's going to be huge!” and then it didn't really happen; you didn't see the adoption. Then the Vision Pro is here, it's going to be huge, and again, you don't entirely see the adoption. But I do think if you take a step back and look at the advances in compute as well as user experience, every step is forward, and we will get to a point where much more content is immersive, in everything from the most boring business context to the creative context.
I think we'll get to a point where most content can be made immersive automatically and largely on the fly, meaning that the visual input will be spatialized. And that’s an area we don’t work on—like computer vision—but there are some very cool things happening there in terms of taking visual input and arranging it in space. Similarly, on the sound side, you'll have the sound input made immersive. And just like today, there will be a mix of experiences. On YouTube, you have really fancy content made by film studios or creative agencies, but you also have a kid on a skateboard doing a trick. I think we'll see that same coexistence happening in immersive content, where you'll have really fancy immersive videos that are shot in a way to immediately become immersive, with the sound recorded for Atmos or whatever the format will be from the start. But I think you'll also have a bunch of stuff that's shot like the kid on a skateboard, and then there will be a slightly less good but still impressive immersive version of all that content.
So, if I were an artist, I don't think there's anything you have to do right now, but I think that's coming. And I think it's an interesting—and more immersive—way for fans to connect with artists. It's funny, for all of the talk around immersive formats like Sony 360 and Dolby Atmos, a lot of people are just listening on their headphones. They're not really having a full surround sound or spatial audio experience. I think that changes quite a bit once people start wearing devices that were made for that from the start, where you also have visual input, right? It's cool to hear an artist in an immersive mix, but I think what's really compelling is when it's married with the visual of that artist playing, or the video the artist has created. That really adds to the level of immersiveness. So, I think that's definitely something that will come along.
Michael: That's so cool. As you were describing that, I was imagining... and I guess before I get to that, just to reiterate what you're saying: it sounds like what we're working with right now is, in many ways, not fully leveraged yet, because not everyone has these devices or interacts with them regularly. But if you zoom out a bit and just look at the progress, the technology behind these mediums has been moving forward pretty rapidly. If you assume any sort of continued rate of improvement, it's going to be here. We're going to have these types of digital environments that allow us to fully utilize the new technology and experience things more immersively.
As you're sharing that, there's a visual/audio image that popped up of a music video where, you know, there’s music happening around you, but then someone intentionally places a fun little hook or part that catches you off guard right behind you. It’s like buh duh bah ow pfff, just before a drop, and then dancing visuals around you. It made me really excited. It seems like The Beatles were a great pioneer of the technology at the time, which was stereo. It’s like they had to push the edge. I wonder who’s going to be the artist that really pushes the edge with this emerging technology and is able to do fun things like that.
To your point around the visuals as well, it does seem like there's all this emerging technology converging, even with generative AI and some of the things people are able to create with Blender and other simulated worlds. I'm curious, if you had to guess—knowing that historically we're pretty bad at predicting exponentials—how long do you think it will take for this type of digital experience to become as popular as using our cell phones today?
Jessica: I have no idea, because so much of that is hardware-dependent. And then I think it's also just a question of what we mean by “Where is it ubiquitous, and for whom?” You have existing constraints around both how you ingest this content and the amount of manual work that currently happens to make content immersive. Some of that work won’t—and shouldn’t—disappear. I think there should always be a world where an artist and their mixer decide on a very artistic, manual level how they want a song arranged, essentially. But there will also be things done automatically. There will be automatic immersive mixing. That technology already exists, but it will become more commonplace.
Anyway, there are constraints today just in terms of preparing content, which means there isn’t that much content available necessarily. There are constraints on the hardware side in terms of the quality of that experience. All those things still have to advance. Then there's a separate part, which is about the availability of these devices for us to experience it. Price is still a massive blocker. I don’t even remember how much the Vision Pro was, but it was surely over a thousand dollars, I think.
Michael: Yeah, the Vision Pro is sort of an outlier too, right? It's clearly for people who have enough to just dump into this crazy expensive but very cool gadget. I just finished reading a book called The Singularity is Near by Ray Kurzweil. I think he actually has ties to Google, or works at Google. I don’t know if you’re familiar with him, but he's a very interesting technological thinker and futurist. He’s predicted the internet, AI, and different things before they happened. The thing he talks a lot about is this concept called the singularity, where, because of the exponential increase in computing and price-performance of compute, we’re kind of headed toward an asymptote where technology starts changing so quickly that it’s almost impossible to predict what’s going to happen because the rate of change is so fast.
I know he talks a lot about emerging technologies like this. I wonder what you would predict in terms of widespread adoption. And to your point too, price performance is an important thing for hardware, not just for the software part of it.
Yeah, super interesting. So, for someone who's watching this right now, maybe they’re excited about this as a topic of conversation and realize that going into this, they’ll be an early adopter. It’s not like there’s going to be a huge amount of existing foundation. As one of the first adopters, how would you recommend that an artist start exploring some of these different technologies and tools so that they can catch the momentum of the wave as it hits, rather than being struck by the tsunami when it first crests?
Jessica: Well, on some level, this is definitely not my area of expertise. We work with companies building in mixed reality, VR, and so forth. But I guess I would say, I still think there are so many things. First of all, I think it's so hard to be a working musician. There's so much you feel like you have to keep track of, right? The social media aspect alone is exhausting to even say; you say it, and you already just sort of want to throw in the towel. Part of me would just say, you know what, wait a little bit, because there are still so many things that have to come together for this to be accessible for people to play with.
On the other hand, if there's natural interest in exploring this further, I think even just starting to play around with the lenses on Snap, for example, and thinking about what you could do with music in relation to your fans—applying different effects or interacting with your music in different ways.
And, of course, there's always the route of spatial mixing itself. There are mixers who work with the Dolby Atmos format. There are also services that will do it algorithmically. One of the companies we work with, called Master Channel, offers spatial mixing. So there are ways to experience immersive music today, whether it's mixed manually or algorithmically.
But I think just playing around with existing mixed reality apps, or even Pokémon Go (Niantic Labs), is a fun way to start imagining how your own art could exist in those spaces. I have a friend who has a fitness app called YUR. It allows you to work out to your music, and the effects and the workouts change while you do this all in VR. So, you're having an experience. Maybe you're punching the air or something like that, but it's also in time with the music. That's a fun one to look at, showing how music can be part of a very gamified fitness experience.
And then there’s Tribe XR, a VR platform that's popular with aspiring DJs. Again, it’s interesting to see how music is being reimagined for different contexts.
Michael: Cool. You said the fitness app was called YUR?
Jessica: Y-U-R.
Michael: Y-U-R. Awesome. So, it sounds like what you're saying is that it's good to be aware that this is emerging technology, so maybe don't go all in and give up all the other stuff that's foundational to what you're doing. But if you use your imagination, start exploring some of these different platforms that exist, play around, have fun, and use them to reimagine your artwork. Start thinking about what your music, what your artwork, would look like in these different spaces. That's a good way to get started, so you have a head start before it becomes more mainstream.
Jessica: Yeah. I mean, I sort of think that way about a lot of the tools that are out there right now. You can also think about automatic video generation and what you could do with music to actually produce a video using some of the generative tech. That, of course, brings up a whole debate around generative and everything because the debate is the same on the visual art side as it is on the music side. But as a tool to experiment with, it’s pretty neat to see all these generative image and video-making platforms and how fast they're evolving.
Michael: Mm. Yeah, it seems like it would be good for everyone to acknowledge that the technology itself probably isn't going anywhere, and it's only going to get more and more sophisticated. As a tool, it's worth learning. At the same time, we want to make sure we're doing it in as ethical a way as possible, and we're still figuring that out. But it reminds me of the music industry when digital downloads became a thing, and there was pirating and torrenting and just sharing music on the internet with this emerging technology.
Unfortunately, most of the rights holders, instead of going along with the technology and investing in learning how to use it ethically, fought against it, resisted it, and tried to sue it out of existence. Napster and things like that were clearly problematic. But the technology became a lot more prevalent anyway. So, regardless of where you fall in the equation, it does seem worth becoming more familiar with it and looking for transparent conversation about how to use it in an ethical way.
Jessica: Yeah, I mean, I think it also depends on what the tool is. AI is a very vast space. It's in your bank, your credit card company, your Uber ride. In a lot of ways, it's easier to think about AI as sophisticated statistics or even just software, and then ask the follow-up questions around, "Okay, how is this software built? What is the input? What is the output? Are you comfortable with both?" Like, what was the input to train it? And then similarly, what is the output? There's so much of AI that really doesn't present a problem for society, and we've been using it for a very long time.
So, for example, in stem separation, we don't really get pulled into debates around training data and so forth. Not because we're good people, though I'd like to think we are, but because we don’t actually need the kind of data that a generative music system might need.
For us to separate a song, we're just separating what's already there. We're not adding anything in or reimagining; we're literally pulling the song apart. To train a system like that, you don't need to train it on, say, all the music that's on Spotify. It doesn't get you anywhere. What you need to do is go out and get stems of lots of songs. You can't just pull that from the web. So you go to production library music and license those stems.
Then your output, again, isn't threatening musicians. You're helping them. It's a tool that's pulling the song apart. So I think breaking down any AI technology into these questions of, "What was the system built on? What is it trying to do, and what's its output?" is relevant. How does that output make you feel as an artist? I think those are all relevant questions to ask.
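[Editor's note: A minimal sketch of the supervised setup Jessica is describing, where training pairs are built by summing licensed stems into a mixture and the model learns to recover those stems. The toy model and random data below are placeholders for illustration, not AudioShake's actual system.]

import torch
import torch.nn as nn

class TinySeparator(nn.Module):
    """Toy network: maps a windowed mono mixture to an estimate of each stem."""
    def __init__(self, n_stems=4, win=1024, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(win, hidden),
            nn.ReLU(),
            nn.Linear(hidden, win * n_stems),
        )
        self.n_stems, self.win = n_stems, win

    def forward(self, mix):                  # mix: (batch, win)
        out = self.net(mix)                  # (batch, win * n_stems)
        return out.view(-1, self.n_stems, self.win)

model = TinySeparator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

for step in range(100):
    # In practice these would be windows of licensed multitrack stems
    # (vocals, drums, bass, other); random tensors stand in here.
    stems = torch.randn(8, 4, 1024)          # (batch, n_stems, win)
    mixture = stems.sum(dim=1)               # the "song" is just the stems summed
    estimates = model(mixture)               # the model tries to pull the mix apart
    loss = loss_fn(estimates, stems)         # supervise against the true stems
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

[This is also why, as Jessica notes, scraping mixed songs from the web doesn't help: without the ground-truth stems, there is nothing to supervise against.]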
I do feel torn on a lot of questions around generative systems. As someone who is terrible at drawing (I can't draw or visualize well), I find it incredible to be able to play with something like DALL·E or any of the image systems and see something imagined for me. It’s something I had in my head that I could never put down on paper. These are casual uses. I’m not running an ad agency; I’m just playing around. It’s really inspiring to see that.
I even tried it with fiction writing. I was stuck on a scene in my current novel where I needed to describe a Ferris wheel. I looked at images on Google, but I was getting stuck on the feel of a Ferris wheel. It wasn’t even that important to the scene, but I just couldn’t move forward. I decided to try a tool like ChatGPT and pasted in a prompt, asking it to describe a Ferris wheel. The response was terrible—soulless, with no personality or style—but it got me thinking about the gondolas on the Ferris wheel. That opened a door for me to think about the swaying object in the sky. I didn’t use any of the output, but that prompt led me in a new direction. I think that's an example of how generative tools can be inspiring, and it’s a concept I can apply to music too.
So, how do we make sure artists feel they're being treated fairly by these creation tools? And how do we not throw the tools out and decide they shouldn't exist just because they can be imperfect? You can't throw it out; it exists. And there is something really inspiring about having a tool present something to you from a different angle. It opens the door to new perspectives. I think, at least for me, that's how I've used it.
Michael: Yeah, I totally relate with what you're saying. I use AI a lot, and I always have a ChatGPT window open. It’s a great assistant and tool. Even when recording, I’ll often record video or audio messages, and I’ll take a screenshot and use the transcription to turn it into something nicely formatted. Like you said, it’s also great for brainstorming. It might not be perfect, but it can take you from zero to 90 percent in 10 seconds, saving me five to 10 minutes.
But Jessica, it’s been great connecting with you today. This is one of my favorite topics to geek out about. I'm so grateful to get to interview interesting people like you and talk about where things are at and where they could go. For anyone watching or listening, who’s interested in learning more about AudioShake or using these tools, where’s the best place for them to get started?
Jessica: Sure. We're at audioshake.com, but you can also go to AudioShake Indie, which is our tool for indie artists to create stems. And we're on all the socials because, just like for musicians, it never stops when you're a startup. It's constant content creation.
Michael: Yep. Just a side note on social media: we're developing an app right now called StreetTeam. It's in private beta, but we're going to be launching it publicly soon. One of the reasons we built it was to solve that specific challenge around social media overwhelm. There are all these platforms. The idea is to have one source of truth as a home base where you own the data, like the email list and contact info, but you can also distribute your posts to all the different platforms and have everything aggregated in one place.
Jessica: Yeah, that's great. I remember there was something like that back in the day, aggregating Twitter and a bunch of other platforms, and I think it disappeared. It's super useful. That sounds great!
Michael: Awesome. Hopefully, we can help solve that need for artists and creators. We've been using it for ourselves, and it’s been really cool. Jessica, it was great connecting today. As always, I’ll put all the links in the show notes for easy access. I appreciate you taking the time to be here on the podcast.
Jessica: Yeah, thanks for having me.
Michael: YEAAAH!