The Music Understanding framework [video]

Meet the Music Understanding framework - WWDC26 - Videos - Apple Developer

Meet the Music Understanding framework

Discover Music Understanding, a new framework that lets your app analyze audio across six dimensions, on device: key, rhythm, structure, pace, instrument activity, and loudness. And use the Music Understanding Lab sample app to visualize each result.

Chapters

0:00 - Introduction

1:39 - Musical features

3:19 - Framework integration

3:55 - Music Understanding Lab

Resources

Creating visuals with Music Understanding analysis results

MusicUnderstanding

Hi. I'm Conner from the Computational Music Team. And I'm excited to introduce you to a framework called Music Understanding. It gives you access to on-device musical intelligence across all Apple platforms. It handles all the signal processing and model inference for you, so you don't need any expertise in signal processing or machine learning to use it. And because it runs entirely on-device, the audio you analyze stays private and works offline.

At Apple, the Final Cut Pro team used the Music Understanding framework to power two features of their app. In the beat detection feature, Final Cut Pro analyzes a song for its rhythm and structure to reveal its beat grid. This helps editors visualize and align their edits to song parts, bars, and beats. And in Final Cut Pro for iPad, the montage feature analyzes for rhythm, pace, and structure to automatically synchronize clips to the music.

I'll start by going over what the framework can do. Then, I'll follow that up by explaining how you can use the framework. Finally, I'll go through the API and show how it was used to build a sample app for understanding music.

The framework provides analysis around six main areas: key, rhythm, structure, pace, instrument activity, and loudness.

Rhythm is the pulse of a song, driven by individual beats. These beats build into bars. The number of beats in one minute is called beats per minute, or bpm.

Bars form phrases, which you can think of as musical sentences. Phrases combine into segments, creating a more complete musical statement, and those segments ultimately build the sections. You can think of a section as a chorus, verse, intro, or bridge.

During a song, instruments such as a drum, bass, or vocals may be playing at different times and at different intensities. These instruments play around a common set of notes called the key.

While the song may have a consistent pulse or bpm, different parts of the song may feel slower or faster. This is called pace.

Over time, the song may sound louder at some points than others.

These are the building blocks of the Music Understanding framework, and by integrating it in your app, you unlock a whole new level of possibilities.

Next, I'll talk about how to use the framework. At a high level, apps interact with a MusicUnderstandingSession, initializing with either an AVAsset or a custom audio provider. To start analysis, clients call analyze and await results.

By default, the framework analyzes for all analysis types. For the highest performance, you can specify which analysis types you are interested in to avoid unnecessary computations.

To explore the framework more deeply, I'll review a sample app called Music Understanding Lab, available on developer.apple.com. Let me show you how Music Understanding Lab works. First, I'll select a song on the device.

The app uses the Music Understanding framework to analyze the audio, turning it into a visual experience with a dedicated tile for each result. When I hit play, notice that the Rhythm and Structure tiles update as the song plays. The playhead ties the experience together, letting you follow along with the music.

I'll start by talking about how the Select Song... button is implemented. Using the SwiftUI fileImporter, I'll select a file to get its URL. Then I'll use that URL to create an AVURLAsset. Be sure to set PreferPreciseDurationAndTimingKey to true to ensure the most accurate results. Next, I'll create the session from the asset and call analyze and await the return of the session results.

Inside the SessionResult struct, every feature Music Understanding analyzes gets its own results field. These are all optionals. When you use the general analyze() API, all results will be available. However, if you use the targeted analyze(for:) API, the framework will only return the results you asked for, and the rest will be nil.

Throughout the Music Understanding framework, there are two standard types used to associate time with a value. A TimedValue associates a value with a CMTime. Similar to TimedValue, a RangedValue associates a CMTimeRange with a value.

With these time-based types in mind, I'll discuss the features Music Understanding analyzes by showing how they are used in the Music Understanding Lab UI. First, I'm going to start with the Key tile. In this song...
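To make the two time-based types from the transcript more concrete, here is a minimal, dependency-free Swift sketch of how a playhead might look up the current beat and section. Everything in it is an assumption for illustration: `TimedValueSketch`, `RangedValueSketch`, `latest(in:at:)`, and `active(in:at:)` are hypothetical stand-ins, not the framework's declarations, and plain seconds replace the `CMTime` and `CMTimeRange` values the real types are described as carrying. The sample beats and sections are made up.

```swift
// Hypothetical stand-ins for illustration only; NOT the framework's own types.
// The transcript says the real types pair values with CMTime / CMTimeRange;
// plain seconds are used here so the sketch has no dependencies.
struct TimedValueSketch<Value> {
    let time: Double          // seconds from the start of the song
    let value: Value
}

struct RangedValueSketch<Value> {
    let range: Range<Double>  // half-open span in seconds
    let value: Value
}

/// Returns the most recent timed value at or before the playhead, if any.
/// Assumes `values` is sorted by ascending time.
func latest<Value>(in values: [TimedValueSketch<Value>],
                   at playhead: Double) -> TimedValueSketch<Value>? {
    values.last(where: { $0.time <= playhead })
}

/// Returns the ranged value whose span contains the playhead, if any.
func active<Value>(in values: [RangedValueSketch<Value>],
                   at playhead: Double) -> RangedValueSketch<Value>? {
    values.first(where: { $0.range.contains(playhead) })
}

// Made-up sample data: beats at 120 bpm (one every 0.5 s), numbered 1...4
// within each bar, plus two sections.
let beats = (0..<8).map { TimedValueSketch(time: Double($0) * 0.5, value: $0 % 4 + 1) }
let sections = [
    RangedValueSketch(range: 0.0..<2.0, value: "intro"),
    RangedValueSketch(range: 2.0..<4.0, value: "verse"),
]

// Look up what a UI tile would highlight at a given playhead position.
let playhead = 2.3
let beat = latest(in: beats, at: playhead)
let section = active(in: sections, at: playhead)
```

A point-in-time value (such as a beat) is found by taking the latest entry at or before the playhead, while a ranged value (such as a section) is found by containment. That is one plausible way a tile could follow the playhead, not necessarily how Music Understanding Lab does it.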
