Suhaas Chitturi — Blog

Bonk! — Why Every XML Export Is Slowly Killing Your DJ Library

Wed, 20 May 2026 00:00:00 GMT


The export that erases your work, and why the fix requires going under the hood.

Every time you export your Rekordbox library to XML, a little bit of your work disappears. The playlists survive. The track references survive. But the ratings you've spent years curating, the hot-cue arrangements you've perfected, and the comments where you noted "drops at 2:47, doubles well with the Acra remix" don't survive. Not fully. Not reliably. This isn't a Rekordbox bug. It's an architectural constraint.

The Metadata Death Spiral

If you've been DJing for more than a couple of years, you've probably lived through this: you spend an afternoon reorganizing your library, tagging tracks with new ratings, building out a set playlist. You export to XML. You import into a friend's Rekordbox. And then you discover half your comments are gone, your ratings are reset, your hot cues are shifted or truncated. The cause is the format itself: XML is a serialization format.

It's designed to move data between systems, not preserve every nuance of a complex relational database. When Rekordbox exports to XML, it makes choices: which fields to include, how to encode nested structures, how to represent the relationships between tracks and playlists and cue points and analysis data. Those choices are lossy. They're optimized for re-importing back into Rekordbox, not for preserving your library's full richness over time.

The result: every time your library passes through an XML export, it comes out slightly degraded. It's the digital equivalent of making a photocopy of a photocopy. After three or four cycles, the artifacts compound.

Most DJs work around this. They don't export. They don't share libraries. They keep everything local and accept the risk that a corrupted database or a dead hard drive means losing years of curation work. Bonk! takes a different approach.

Direct Database Access, Not XML

The Rekordbox file (the actual database that powers the application) is an encrypted SQLite database. It's not a secret or a hack: it's the format Pioneer DJ chose to store everything. Every rating, every cue point, every playlist, every play count, every bit of analysis data lives in there.

XML exports read from this database and serialize it to a portable format. Bonk! reads directly from the same database, with no serialization and no lossy translation. This means:

Everything transfers: not just the fields that fit neatly into XML.
No round-trip degradation: you're reading the actual data, not a translated copy.
Preserve relationships: playlist order, track ordering within playlists, and nested playlist hierarchies all stay intact.

The tradeoff is complexity. XML is a known format with good tooling.

Reading a proprietary encrypted database requires building and maintaining a bridge to handle the encryption scheme and the schema. That's what the bridge module does in Bonk!: it speaks the native database protocol so you don't have to play telephone with your metadata.

What Bonk! Actually Does

Bonk! is an Electron + React desktop application that gives you direct read/write access to your Rekordbox library.

Import your library: Bonk! reads directly from the Rekordbox database (both Rekordbox 6 and 7 formats are supported). Your library loads in seconds, not minutes. No export required.
Edit everything inline: Double-click any cell. Change the title, artist, key, BPM, rating, genre, label, comment, or any other field. Changes are written back to the database directly, with no intermediate format.
Detect key and BPM automatically: Bonk! runs KeyFinder (the same algorithm used by Mixxx) for musical key detection, and Essentia for BPM analysis. Waveforms are generated locally via a Rust N-API module so album art and waveform previews load instantly even with large libraries.
Write tags to audio files: Metadata edits can be pushed directly to the audio files themselves (FLAC, AIFF, MP3, WAV, M4A, OGG) using FFmpeg, not just to the Rekordbox database. This means your metadata travels with your files, not just your Rekordbox installation.
Export back safely: Bonk! writes changes back to the Rekordbox database in the native format. When you open Rekordbox afterward, your edits are there exactly as you made them.
Playlist management: Create, edit, and organize playlists. Import folder structures as playlists automatically. Export playlists back to Rekordbox.

The Technical Stack

The app is built around a few principles that affect how it handles your data:

Virtual scrolling for large libraries: TanStack Virtual renders only the visible rows, so a 50,000-track library scrolls as smoothly as a 500-track one. No pagination, no loading screens.
SQLite WAL mode for concurrent access: The media cache (waveforms, album art thumbnails) uses WAL mode so reads and writes don't block each other. You can audition tracks while Bonk! is analyzing new files in the background.
Rust audio engine: The native N-API module handles audio playback and waveform generation. Symphonia handles decoding, rodio handles output. This keeps the audio path off the JavaScript thread so the UI stays responsive even under load.
Python bridge for Rekordbox DB: The bridge vendors a compatible version of the library it relies on and handles the encryption handshake. It runs as a spawned subprocess and communicates over stdin/stdout JSON, with no network exposure and no external services.
Zustand for state: The renderer uses Zustand stores with fine-grained subscriptions so component updates are surgical. Only the tracks that change re-render.

Why Build This Instead of Using Rekordbox's Built-in Tools?

Rekordbox is excellent software. The editing workflow in the app is fine for occasional tweaks. But there are scenarios where it falls short:

Bulk edits: Want to fix the key notation across 200 tracks from a recent purchase? In Rekordbox, you're clicking into each track individually. In Bonk!, you can sort by key, multi-select, and apply changes in seconds.
Missing metadata: Bonk! has an AutoTag workflow that searches MusicBrainz and Spotify for missing metadata. Batch-find album art, fill in missing genre tags, correct artist names, all without leaving your library view.
Cross-format consistency: If you have files from multiple sources (Tidal downloads, Qobuz purchases, Bandcamp FLACs, rekordbox-exported MP3s), they have inconsistent metadata. Bonk! can normalize across all of them in one workflow.
Analysis at scale: Running key detection or BPM analysis across your whole library in Rekordbox means waiting for the app to process each file sequentially. Bonk! can run analysis in parallel, using all available CPU cores.
Recovery: If your Rekordbox database gets corrupted, your XML exports are your only backup. But XML exports degrade over time (as described above). With Bonk!'s direct database access, you have a path to repair and recover that doesn't depend on a lossy serialization format.
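The stdin/stdout JSON bridge mentioned in the technical stack can be pictured as a small request/response loop. What follows is only a sketch of that pattern, not Bonk!'s actual bridge: the command names (`ping`, `count_tracks`), the message shape, and the in-memory `FAKE_LIBRARY` are hypothetical stand-ins, and nothing here touches or decrypts a real Rekordbox database.

```python
import io
import json

# Hypothetical stand-in for the library; a real bridge would query the
# Rekordbox database here instead.
FAKE_LIBRARY = [
    {"id": 1, "title": "Track A", "rating": 4},
    {"id": 2, "title": "Track B", "rating": 5},
]


def handle(request):
    """Map one decoded request to one response object."""
    if not isinstance(request, dict):
        return {"ok": False, "error": "request must be a JSON object"}
    command = request.get("cmd")
    if command == "ping":
        return {"ok": True, "result": "pong"}
    if command == "count_tracks":
        return {"ok": True, "result": len(FAKE_LIBRARY)}
    return {"ok": False, "error": f"unknown command: {command!r}"}


def serve(stdin, stdout):
    """Read one JSON document per line, write one JSON response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            response = handle(json.loads(line))
        except json.JSONDecodeError as exc:
            response = {"ok": False, "error": f"bad JSON: {exc.msg}"}
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()  # the parent process is waiting on each line


# Demo: in-memory streams stand in for the subprocess pipes.
fake_in = io.StringIO('{"cmd": "ping"}\n{"cmd": "count_tracks"}\nnot json\n')
fake_out = io.StringIO()
serve(fake_in, fake_out)
responses = [json.loads(row) for row in fake_out.getvalue().splitlines()]
```

In a real subprocess the same `serve` function would be called with `sys.stdin` and `sys.stdout`. One line per message keeps framing trivial, and because nothing listens on a socket, the only process that can talk to the bridge is the one that spawned it.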

The Bigger Vision

Every DJ library is a years-long curation project. The ratings, the cue points, the playlist ordering, the notes: it's not just data, it's expertise. It represents thousands of hours of listening, sorting, and refining. That information shouldn't be held hostage by a single application's export format.

Bonk! is part of a broader shift toward open, portable DJ metadata. The goal isn't to replace Rekordbox. It's to make the data in your DJ library durable and transferable across tools and over time.

When you can export your library without losing your hot cues, when you can share a playlist with a collaborator without losing your ratings, when you can recover from a corrupted database without losing years of work, the library becomes more valuable, not less. That's what Bonk! is building toward.

Constraint-Aware Recommendation as Creative Scaffolding

Tue, 10 Mar 2026 00:00:00 GMT


Music recommendation systems optimize for engagement. DJs optimize for a feeling. What happens when you build a system that takes structural constraints seriously?

Spotify's recommendation engine is good at one thing: keeping you listening. It optimizes for session length, click-through rate, and platform engagement. These are legitimate optimization targets for a streaming service. They are not legitimate targets for a DJ building a set. A DJ's constraints are structural. The next track must be within a few BPM of the current one — not for algorithmic similarity, but because a real-time mix requires tempo alignment. The key must be harmonically compatible — not for acoustic similarity, but because mixing two tracks in conflicting keys creates audible dissonance on a sound system. The energy must follow a deliberate arc — not a random walk, but a shape that moves a room from one emotional state to another. These constraints aren't edge cases. They're the core logic of how DJs actually select tracks.

And mainstream recommendation systems don't model any of them.

Engagement optimization vs. set optimization

Spotify's Discover Weekly, Release Radar, and personalized playlists all optimize for one thing: will you keep listening? The signals are implicit: skips, saves, repeats, session length. The feedback loop is simple: more engagement means more data means better engagement predictions.

DJ set selection operates under a completely different objective function. A DJ isn't trying to maximize listening time. They're trying to construct a sequence that moves a room: one that builds tension, releases it, creates moments of surprise and recognition, and resolves into a satisfying whole. The signal for success isn't "did they keep listening" but "did the room move."

This is not a small difference.

It's the difference between a system that optimizes for convenience and one that optimizes for craft.

The constraint landscape

A constraint-aware recommendation system for DJs needs to model at least four structural axes:

| Constraint | What it controls | Typical range | Why it matters |
|---|---|---|---|
| Tempo (BPM) | Beat-mixing feasibility | ±3-5 BPM for smooth transition | Physical constraint: can't blend two tracks at different tempos without pitch shift |
| Key compatibility | Harmonic mixing | Same key ±1 semitone, or Camelot adjacent | Dissonance in a live mix is immediately audible |
| Energy contour | Set arc shape | Normalized RMS & spectral centroid over time | The arc of a set is an energy story, not a tempo story |
| Spectral density | Perceived intensity | Sparse (minimal) to dense (anthemic) | Two tracks at 128 BPM can feel completely different depending on frequency content |

Notice what's not on the list: genre, mood, popularity, listening history.

These are the axes that Spotify and Apple Music use for recommendation. They matter for discovery, but they don't model the structural constraints that determine whether two tracks can actually be mixed together.

BPM is not energy (the 128 BPM problem)

The most common mistake in DJ-adjacent recommendation is treating BPM as a proxy for energy. It isn't. A minimal techno track at 128 BPM (sparse, bass and hi-hat only, enormous gaps of silence) has roughly a quarter of the perceived energy of a peak-time trance anthem at the same tempo. BPM tells you how fast the beat pulses. It doesn't tell you how full the frequency spectrum is, how dense the arrangement is, or how the track sits in the room.

Within a single genre, the BPM range across an entire DJ set is typically only 10-15 BPM, roughly 8 to 13% variation.

The energy range across the same set, measured by RMS loudness and spectral density, varies by 400% or more. BPM is the guardrail. Energy is the steering.

Even Spotify's own API acknowledges this. Its "energy" audio feature is documented as a perceptual measure of intensity that draws on dynamic range, perceived loudness, timbre, onset rate, and general entropy, not on BPM. They built a feature that explicitly decouples energy from tempo for their own recommendation engine. DJs need the same decoupling, with the added constraint that the results must be mixable in real-time.

What constraint-aware recommendation looks like

A constraint-aware recommender doesn't replace the DJ. It scaffolds the DJ. The constraints define a space of possible next tracks, and the DJ selects within that space based on taste, intuition, and room feedback.

The system handles the structural logic so the DJ can focus on the creative logic. Concretely, this means:

Filter before suggest. The system first eliminates tracks that violate structural constraints: wrong key, incompatible tempo, energy profile that would break the set arc. Only then does it rank the remaining candidates by similarity, novelty, or other preference signals.
Model the arc, not the track. Instead of recommending "tracks like this one," the system recommends tracks that maintain or advance the current energy trajectory. If the set is building, suggest tracks with higher spectral density. If the set is peaking, suggest tracks with similar density but different timbral character. If the set is winding down, suggest sparser, lower-energy alternatives.
Treat key and tempo as constraints, not features. In collaborative filtering, key and BPM are features like any other: they contribute to a similarity score. In DJ recommendation, they're hard constraints. A track in the wrong key isn't "less similar"; it's unusable without pitch-shifting that degrades audio quality.
Use spectral profile as the primary similarity axis. Two tracks with similar spectral centroid, spectral flatness, and dynamic range will sound more alike to a DJ than two tracks with similar BPM and key but different spectral profiles. Spectral similarity predicts mixability.

Conventional vs. constraint-aware

Conventional: "You liked this track. Here are more tracks like it."
Constraint-aware: "You're 40 minutes into a set, currently at 126 BPM in G minor, energy trending upward. Here are tracks that maintain that trajectory, are harmonically compatible, and won't kill the room."

Why existing systems don't do this

Spotify, Apple Music, and YouTube Music don't build for DJs because DJs aren't their primary audience. Their optimization targets (session length, ad revenue, discovery) are served by engagement-maximizing recommendation, not constraint-aware recommendation. Building for DJs would mean building a different product.

Even Beatport, the DJ-focused marketplace, only allows filtering by BPM and key as separate parameters. There's no "suggest tracks that mix well with this one" button. The DJ is expected to have the expertise to construct a set manually.

Rekordbox's cloud analysis feature and Serato's upcoming recommendation features both focus on library management: "here are tracks you haven't played recently" or "here are tracks similar to ones you've played." Neither models the real-time constraint satisfaction problem that a DJ faces during a set.

The scaffolding, not the scaffold

The goal of constraint-aware recommendation isn't to automate the DJ. It's to reduce the cognitive load of selection so the DJ can focus on the things that machines can't model: reading the room, feeling the energy, making the call that no algorithm can make.

A good DJ set is a structured improvisation. The structure comes from the constraints: key, tempo, energy, density. The improvisation comes from the DJ's taste, experience, and real-time perception. A constraint-aware system handles the structure so the DJ can improvise. This is the difference between scaffolding (a temporary support that makes creative work possible) and a scaffold (a rigid framework that replaces it).
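The structural half of that split, the filter-before-suggest step described earlier, can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: the `Track` fields, the 0-to-1 `energy` score, the ±4 BPM window, and the "energy may rise by at most 0.2" rule are all hypothetical choices, and keys are assumed to be stored in Camelot notation (G minor is 6A).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    bpm: float
    camelot: str   # e.g. "6A" (G minor) or "6B"
    energy: float  # hypothetical 0..1 score from local analysis


def camelot_compatible(a: str, b: str) -> bool:
    """Same key, a neighbour on the same ring, or the same number on the other ring."""
    num_a, ring_a = int(a[:-1]), a[-1]
    num_b, ring_b = int(b[:-1]), b[-1]
    if ring_a == ring_b:
        return (num_a - num_b) % 12 in (0, 1, 11)  # the wheel wraps 12 -> 1
    return num_a == num_b


def candidates(current, library, max_bpm_gap=4.0, min_step=0.0, max_step=0.2):
    """Hard-filter on tempo, key and energy arc first; rank only the survivors."""
    viable = [
        t for t in library
        if t is not current
        and abs(t.bpm - current.bpm) <= max_bpm_gap
        and camelot_compatible(current.camelot, t.camelot)
        and min_step <= t.energy - current.energy <= max_step
    ]
    # Placeholder ranking: smallest energy jump first. A real system would
    # rank by spectral similarity, novelty, or other preference signals.
    return sorted(viable, key=lambda t: abs(t.energy - current.energy))


current = Track("Now playing", 126.0, "6A", 0.60)
library = [
    Track("Same key, slight lift", 127.0, "6A", 0.65),
    Track("Adjacent key, bigger lift", 125.0, "7A", 0.75),
    Track("Relative major", 126.0, "6B", 0.62),
    Track("Too fast", 134.0, "6A", 0.65),
    Track("Clashing key", 126.0, "1A", 0.65),
    Track("Energy drop", 126.0, "5A", 0.40),
]
picks = [t.title for t in candidates(current, library)]
```

The important design choice is the order of operations: tempo, key, and arc act as yes/no gates, and only the tracks that pass all three are ever scored. The ranking at the end is deliberately the least interesting part, because that is where the DJ's own judgment takes over.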

The research on AI-assisted creativity is clear: co-creative tools that preserve human agency outperform fully automated systems in terms of creative output quality and user satisfaction. The Doshi & Hauser study on AI art platforms found that AI adoption increased individual productivity by 25% but reduced collective diversity — the homogenization effect. The design implication: co-create, don't autopilot. Constraint-aware recommendation is scaffolding. It narrows the search space to structurally viable options, then gets out of the way. The DJ is still the one deciding. The system just makes sure the options on the table are all options that could actually work. Engagement-optimized recommendation gives you more of what you already like. Constraint-aware recommendation gives you things you might not have considered, but that will actually work in the context you're in.

For DJs — and for anyone making sequential creative decisions under real constraints — that's the difference between a feed and a tool.

On Metadata Quality in DJ-Curated Playlists

Wed, 15 Apr 2026 00:00:00 GMT


When a DJ playlist lives on Spotify or Apple Music, the metadata arrives pre-digested by aggregator pipelines. But the moment that playlist is exported to a USB and loaded into rekordbox, things fall apart.

A DJ set on Spotify looks simple. You click play, the songs come out, the transitions work. But underneath that surface, the metadata for every track in that set has passed through at least four different systems, each with its own schema, its own constraints, and its own blind spots. By the time it reaches your rekordbox library, the metadata has been compressed, translated, and re-analyzed so many times that the original intent is barely recognizable.

This isn't a theoretical problem. It's a practical one that affects every DJ who tries to move a playlist from a streaming service to a USB stick.

The four-system gauntlet

When you save a DJ-curated playlist on Spotify or Apple Music, here's what the metadata has already survived:

1. Artist entry. The artist or label enters metadata into their distributor's dashboard. BPM, key, sub-genre: all there, all correct. The distributor's form accepts them. This is the last time most of these fields will exist in a structured format.
2. Distributor serialization. The distributor formats the release as a DDEX ERN XML document for delivery to DSPs. BPM and key aren't in the ERN schema. Sub-genres collapse into a single genre field. Producer credits get truncated. The distributor isn't being negligent; the standard literally can't carry these fields.
3. DSP re-analysis. Spotify and Apple Music receive the ERN, ingest the audio file, and run their own analysis pipeline. Spotify computes its own BPM, key (called "key" in the API but computed, not sourced from metadata), energy, valence, and danceability. Your carefully tagged BPM of 128.02 becomes Spotify's 128.0 or 127.98: close, but different. Your key of "A minor" may still come back from Spotify as "A minor," but that value is computed independently and sometimes disagrees with your tag.
4. DJ software re-analysis. When a DJ exports that track to USB and loads it into rekordbox, Serato, or Traktor, the software runs its own analysis. New BPM estimate. New key detection. New waveform. New beat grid. The DJ software has no way to access Spotify's or Apple's analysis, and even if it did, those values were computed for recommendation, not for mixing. They're not precise enough for beatmatching.

The accumulation of error

Each stage re-analyzes because it can't trust the previous stage's data. The result is a cascade of estimation layered on estimation. The original metadata (the BPM the producer precisely dialed in, the key they carefully selected, the sub-genre they chose to categorize their work) exists only in the distributor's database, invisible to everyone downstream.
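To get a feel for what a small BPM disagreement between stages means at the decks, here is a rough worked calculation. It assumes two beat grids that start perfectly aligned and hold a constant tempo, which real tracks and real DJs do not guarantee; the function name `drift_ms` is made up for this sketch, and the 128.00 and 128.02 values are the illustrative numbers used in the text.

```python
def drift_ms(reference_bpm: float, other_bpm: float, seconds: float) -> float:
    """Offset in milliseconds between two beat grids after `seconds`.

    Both grids are assumed to start aligned and to keep a constant tempo.
    """
    # How many beats apart the two grids are after the elapsed time.
    beats_apart = abs(reference_bpm - other_bpm) * seconds / 60.0
    # Length of one beat of the reference track, in milliseconds.
    beat_length_ms = 60_000.0 / reference_bpm
    return beats_apart * beat_length_ms


# Offsets for a 0.02 BPM disagreement at a few blend lengths.
offsets = {
    seconds: drift_ms(128.00, 128.02, seconds)
    for seconds in (30, 120, 300)
}
```

Under these assumptions the offset grows linearly: a 0.02 BPM gap works out to just under 5 ms after 30 seconds and roughly 47 ms after five minutes. Whether a given offset is audible depends on the material, but the calculation shows why the length of the blend matters as much as the size of the disagreement.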

What goes wrong in practice

Here are the specific failures that DJs encounter when moving playlists between systems:

Key disagreement. Spotify says "A minor." Rekordbox says "A major." Mixed In Key says "8A" (A minor). Serato says "8A." One track, four tools, two different keys. If you're harmonic mixing, and most working DJs are, one wrong key call can ruin a transition.
BPM drift. The producer tagged 128.00 BPM. Spotify computed 128.02. Rekordbox computed 127.98. Serato computed 128.0. For casual listening, this disagreement is negligible. For beatmatching two tracks live, even 0.02 BPM of disagreement between your software's analysis and the track's original tempo makes the beat grids slide apart, and the offset keeps growing for as long as the blend runs.
Genre collapse. The artist tagged "Peak Time Techno" in their distributor portal. The distributor passed "Techno" (top-level only). Spotify displays "Techno" but classifies the track internally under its own scheme. Beatport requires the artist to select from its own tree, where "Peak Time Techno" is a distinct category. No two systems agree on genre, and the sub-genre distinction that matters most to DJs is gone everywhere except Beatport.
Missing cue data. Cue points, loops, and beat grids, the most labor-intensive metadata a DJ creates, are locked inside each software's proprietary database. Export a Serato crate to rekordbox and you lose every cue point. There is no interchange format. There is no export button. You re-do the work.

The playlist problem, specifically

Playlists make the metadata problem visible because they're cross-referential. A DJ doesn't just listen to one track; they sequence tracks by key, BPM, energy, and mood. When any of those fields is wrong, the sequence breaks.

On Spotify, this mostly works because Spotify's own analysis is internally consistent: their BPM and key values are computed by the same algorithm, so they're at least consistent with each other. Spotify's "compatible tracks" recommendations work within Spotify's own estimation framework.

The problem emerges at the boundary. When a DJ takes a 50-track Spotify playlist and tries to recreate it in rekordbox, every track has to be re-analyzed. The key values shift. The BPM values shift. The genre categories map to different taxonomies. The energy levels (Spotify's 0-to-1 scale vs. Mixed In Key's 1-to-10 scale) are computed by entirely different algorithms with no conversion function.

Why this matters for music discovery

The metadata fragmentation isn't just a DJ problem. It affects how music is discovered across every platform.

A track tagged "Peak Time Techno" on Beatport, where that label carries specific meaning about energy, structure, and arrangement, appears as just "Techno" on Spotify, where it's grouped with every other sub-genre that also collapsed into that bucket. The signal that would help a listener find exactly what they want, or help an algorithm recommend the right track for the right moment, is attenuated at every step.

For DJs, the problem is acute because DJs are professional metadata consumers. They don't just listen. They sort, filter, and sequence by metadata fields that don't survive the pipeline. The gap between what the artist created and what the DJ can access is the gap between discovery and frustration.

The local-analysis imperative

The only reliable way to get DJ-usable metadata is to analyze locally, from the audio file itself.

This is what rekordbox, Serato, Traktor, and Mixed In Key all do — they run their own algorithms on the raw audio because they can't trust any external source. For DJs, this means accepting that the metadata you see in streaming services is advisory, not authoritative. Your DJ software's analysis is the version that matters for mixing. The upside: it's consistent within one software ecosystem. The downside: switching software means starting over. For tool builders, it means accepting that the metadata pipeline can't be fixed from the inside. DDEX won't add BPM and key fields because the DSPs and labels that govern it don't need them. The fix — if there is one — has to come from outside the pipeline, either through a new open standard for DJ metadata exchange or through tools that bypass the pipeline entirely and read the source of truth: the audio file itself.

DJ-curated playlists are canaries in the metadata coal mine. When a playlist moves from Spotify to rekordbox and half the metadata changes, that's not a bug in the transfer — it's a symptom of a pipeline that was never designed to carry the data that DJs need. Until the industry builds a metadata standard that serves creators — not just rights holders — DJs will keep re-analyzing, re-tagging, and rebuilding the work that someone already did.

The Genre Taxonomy Problem — Why Your DJ Library Can't Agree on What a Track Is

Sun, 07 Jun 2026 00:00:00 GMT


Genre labels carry context in their original system that disappears the moment you move a track between platforms.

Ask five DJs what genre a track is and you'll get six answers. One says it's deep house. Another says it's minimal tech house. A third says it's microhouse. A fourth says it's just house. A fifth, who's been playing longer, says it's dub house, and means something specific by that. None of them are wrong. All of them are using different words for the same record.

This is the genre taxonomy problem, and it's not just a labeling dispute. It has measurable consequences for how music is organized, discovered, and mixed.

The Fragmentation Is Structural, Not Accidental

Music genre taxonomies are not designed to be coherent. They're designed to serve the needs of whoever created them.

Spotify's taxonomy prioritizes listener behavior and recommendation. "Chill Lo-Fi Hip Hop Beats" exists as a genre because millions of people search for it, stream music that fits it, and follow playlists in it. Spotify's taxonomy reflects the behavioral clustering of its user base, which is useful for recommendation and useless for DJ workflow.
Beatport's taxonomy prioritizes the electronic music sales and DJ tool ecosystem. It has "Microhouse," "Dub Techno," "Organic House," and "Afro House" as distinct categories, because those distinctions matter to DJs who are buying music and building sets. Beatport's taxonomy reflects the DJ tool ecosystem's way of categorizing electronic music.
Apple Music's taxonomy splits the difference between editorial curation and behavioral data. It has broad categories ("Electronic," "Hip-Hop," "R&B") with nested subcategories that vary in granularity by genre.
rekordbox's genre field is a free-text field. DJs type whatever they want. The result is a chaotic mix of styles, scenes, and personal shorthand that no algorithm can reliably parse.

These four taxonomies were built for different purposes, by different organizations, at different times. They don't map to each other. A track that Apple Music calls "Deep House" Spotify might call "House." A track that Beatport classifies as "Melodic House & Techno" rekordbox might have as "Melodic," or nothing at all.

What Gets Lost in Translation

When a playlist moves from one platform to another (Spotify to rekordbox, Beatport to Apple Music), the genre field rarely survives intact. But the problem runs deeper than just a label mismatch. The issue is that genre labels carry implicit structural information in their original context that disappears when the label is translated.

Take "Afro House." In the Beatport taxonomy, "Afro House" is a specific sound: driving 4/4 kick, percussion patterns rooted in African rhythmic traditions, melodic elements that reference African instruments and scales, typically 120–124 BPM. It's a well-defined category with production conventions and a recognizable sound.

When that label gets mapped to Spotify's "Afro House," Spotify's algorithm may apply it more broadly (any track with African percussion influences or a certain rhythmic feel), and the result is a category that includes tracks from 110 to 128 BPM, tracks with and without the traditional kick pattern, tracks from completely different production lineages. The label survives but the structural information it carried in the original taxonomy is lost.

This is why playlist portability is so destructive to DJ library quality.

When you export a playlist from Spotify and import it into rekordbox, the genre information comes with it, but it's Spotify's genre information, not rekordbox's. And Spotify's genre information was designed for recommendation, not for DJ workflow. The tracks in your "Afro House" playlist now have a genre tag that means something different in the context of your DJ library than it did in the context where it was assigned.

Why Genre Consistency Matters for DJ Tools

A DJ library with inconsistent genre labels is harder to search, harder to organize, and harder to get recommendations from.

If you search for "deep house" in a library where genre labels are free-text and inconsistent, you'll get a subset of actual deep house tracks, a bunch of tracks that someone labeled "deep house" because they had no better label, and nothing from tracks that should be in the category but are labeled differently. The same problem affects recommendation. If a recommendation engine is trained on genre labels from a library with inconsistent taxonomy, it will learn to associate genre labels with certain sonic features — but the association will be noisy, because the labels themselves are noisy. Garbage in, garbage out. This is why key detection and BPM analysis are more reliable than genre tagging in DJ library tools. Key and BPM are physical measurements — they don't depend on who labeled the track or what convention they were using.

Genre is a cultural label that carries context in a way that measurements don't.

The Sub-Genre Proliferation Problem

The fragmentation of genre taxonomies has accelerated as electronic music sub-genres have proliferated. In the 1990s, "techno" and "house" were the primary categories, with some sub-genre distinction (hardcore, ambient, progressive house). Today, Beatport lists over 40 top-level electronic genres, with multiple levels of sub-genre nesting below them.

This proliferation creates a labeling problem: producers label their tracks with the most specific sub-genre they think applies, but DJs who are looking for music to play in a set might be searching at a different level of specificity. A producer might release a track as "Melodic House & Techno" (a specific Beatport sub-genre) because that's the most accurate description of the production style.

But a DJ building a set might be searching for "Melodic House" as a broader category, and the track won't appear in their search because the label is more specific than their query. The same problem in reverse: a DJ might label all their melodic techno tracks as "melodic techno," but some of them are more accurately "melodic house" — and when they search for "melodic house" they miss tracks that should have appeared. The proliferation of sub-genres is a sign that the taxonomy is becoming more accurate — but it creates friction for anyone who is navigating the taxonomy at a different level of specificity than the producer who labeled the track.
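One way to picture the specificity mismatch is to treat each label as a node with a broader parent, so that a query matches a track when it names the track's label or any category above it. The sketch below illustrates only that idea: the `PARENT` links are a hypothetical mini-tree invented for the example and do not reproduce any platform's real genre hierarchy.

```python
# Hypothetical parent links, broad categories at the top (None = no parent).
# This is an invented mini-tree, not Beatport's or anyone else's taxonomy.
PARENT = {
    "Melodic House & Techno": "Melodic House",
    "Melodic House": "House",
    "Dub Techno": "Techno",
    "House": None,
    "Techno": None,
}


def ancestors(label):
    """Yield the label itself, then each broader category above it."""
    while label is not None:
        yield label
        label = PARENT.get(label)  # unknown labels simply stop the walk


def matches(query: str, track_label: str) -> bool:
    """True if the query names the track's label or one of its ancestors."""
    return query in ancestors(track_label)


broad_hit = matches("Melodic House", "Melodic House & Techno")
narrow_miss = matches("Melodic House & Techno", "Melodic House")
```

With exact string matching, the broad query would miss the more specific label entirely. Walking up the tree fixes that direction, while the reverse case (a narrow query against a broadly labeled track) still fails, which matches the second half of the problem described above.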

What a DJ-Friendly Genre Taxonomy Would Look Like

A genre taxonomy designed for DJ workflow, rather than recommendation, sales categorization, or editorial curation, would be organized around the mixing properties of tracks rather than their production lineage.

Instead of "Afro House" vs. "Melodic House & Techno" vs. "Organic House" (categories that describe production style), a DJ-focused taxonomy might use:

Groove type: 4/4 driving, broken (hip-hop/jazz), syncopated (funk/disco), free-form (ambient/experimental)
Spectral character: dark/warm, bright/aggressive, textural/ambient, clean/lo-fi
Energy profile: build-and-release, linear, peak-and-fade, static
Tempo family: deep (<120), standard (120–128), elevated (128–135), high-energy (>135)

These axes describe how a track feels in a mix rather than where it came from historically.

Two tracks with different production lineages but similar groove type, spectral character, and energy profile will mix well together — regardless of what genre label they carry. This is closer to how experienced DJs actually think about track compatibility. When a DJ says two tracks "feel similar" or "sit in the same world," they're usually describing mixing properties, not genre labels. No current DJ tool has a taxonomy built around mixing properties. They all inherit their genre taxonomy from one of the DSPs — Spotify, Apple Music, Beatport — and the friction that creates is invisible until you try to use the genre field as a search or recommendation axis.
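The four axes proposed above map naturally onto a small record type. The following is a sketch, not a description of any existing tool: `MixProfile`, `tempo_family`, and `shared_axes` are hypothetical names, the axis values are stored as plain strings for brevity, and because the text lists 128 BPM in both "standard" and "elevated," the code assumes that exactly 128 counts as standard.

```python
from dataclasses import dataclass
from enum import Enum


class TempoFamily(Enum):
    DEEP = "deep"
    STANDARD = "standard"
    ELEVATED = "elevated"
    HIGH_ENERGY = "high-energy"


def tempo_family(bpm: float) -> TempoFamily:
    """Bucket a tempo; treating exactly 128 as 'standard' is an assumption."""
    if bpm < 120:
        return TempoFamily.DEEP
    if bpm <= 128:
        return TempoFamily.STANDARD
    if bpm <= 135:
        return TempoFamily.ELEVATED
    return TempoFamily.HIGH_ENERGY


@dataclass(frozen=True)
class MixProfile:
    groove: str    # e.g. "4/4 driving", "broken", "syncopated", "free-form"
    spectral: str  # e.g. "dark/warm", "bright/aggressive"
    energy: str    # e.g. "build-and-release", "linear", "static"
    bpm: float


def shared_axes(a: MixProfile, b: MixProfile) -> int:
    """Count how many of the four axes two tracks agree on (0 to 4)."""
    return sum([
        a.groove == b.groove,
        a.spectral == b.spectral,
        a.energy == b.energy,
        tempo_family(a.bpm) is tempo_family(b.bpm),
    ])


afro = MixProfile("4/4 driving", "dark/warm", "build-and-release", 124.0)
melodic = MixProfile("4/4 driving", "dark/warm", "linear", 126.0)
overlap = shared_axes(afro, melodic)
```

A count of agreeing axes is the crudest possible comparison, and a real tool would presumably weight the axes differently. The point of the sketch is that two tracks with different genre labels can still be compared directly once the label is replaced by properties that describe how they behave in a mix.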

The Road Forward

The genre taxonomy problem won't be solved by standardization: there's no governing body that can force Spotify, Apple Music, Beatport, and every other DSP to use a shared taxonomy, and no reason to believe they'd converge even without competing commercial interests. The more tractable solution is for DJ library tools to stop treating genre as a reliable categorical variable and start treating it as one signal among many, alongside BPM, key, energy, danceability, and spectral profile. When recommendation engines weight genre labels heavily, they inherit all the noise from inconsistent taxonomies. When they weight genre labels lightly, using them as one of many features rather than a primary filter, the noise matters less.

This is already happening in some tools: VibeNet-style analysis produces energy and danceability scores that are more reliable predictors of track compatibility than genre tags. The recommendation engine uses them as primary features, with genre as a secondary signal. The genre taxonomy problem isn't going away. But as DJ tools get better at measuring the actual sonic properties of tracks — BPM, key, spectral profile, energy, danceability — the dependence on genre labels as a primary compatibility axis will decrease. And when that dependence decreases, the fragmentation of music taxonomies will matter less. Until then, the best practice for DJ library management is to treat genre labels as loose hints, not reliable facts — and to invest in the manual curation work that fills the gaps left by inconsistent taxonomies.

Key Detection Is an Opinion, Not a Fact

Sun, 07 Jun 2026 00:00:00 GMT

Every DJ tool runs its own key algorithm on the same audio file and gets a different answer. That isn't a bug. It's the nature of estimating tonal center from a waveform.

Load the same WAV into rekordbox, Mixed In Key, and Spotify's internal analysis pipeline. Run keyfinder-cli on it locally. Ask three working DJs what key they'd call it. You will not get one answer. You will get a cluster of answers that are related (often harmonically adjacent, sometimes a relative major/minor pair) but not identical. DJs treat key tags like facts. Software vendors present them like facts. The tags are estimates. And the gap between estimate and ground truth is where harmonic mixing breaks.

What key detection is actually measuring

Automated key detection does not read sheet music. It does not parse MIDI. It does not ask the producer. It builds a Harmonic Pitch Class Profile (HPCP): a 12-bin histogram of how much energy appears at each pitch class (C, C#, D, and so on) across a window of audio.
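To make the 12-bin profile concrete, here is a minimal sketch that folds a handful of spectral peaks into pitch classes and then scores the result against rotated major and minor templates. The template values are the widely published Krumhansl-Kessler probe-tone ratings, and correlating against them is one common textbook approach; it is an assumption that any particular DJ tool works this way. The function names and the `(frequency, energy)` peak format are invented for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler probe-tone ratings, indexed by semitones above the tonic.
MAJOR_TEMPLATE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_TEMPLATE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pitch_class(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered pitch class (C = 0) for a frequency."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    return round(midi) % 12


def pitch_class_profile(peaks, a4_hz=440.0):
    """Fold (frequency_hz, energy) peaks into a 12-bin histogram."""
    profile = [0.0] * 12
    for freq_hz, energy in peaks:
        profile[pitch_class(freq_hz, a4_hz)] += energy
    return profile


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rotate(template, tonic):
    """Shift a template so that its tonic sits on pitch class `tonic`."""
    return [template[(i - tonic) % 12] for i in range(12)]


def estimate_key(profile):
    """Return (tonic name, mode, score) for the best-correlated template."""
    candidates = []
    for tonic in range(12):
        for mode, template in (("major", MAJOR_TEMPLATE), ("minor", MINOR_TEMPLATE)):
            score = pearson(profile, rotate(template, tonic))
            candidates.append((score, NOTE_NAMES[tonic], mode))
    score, name, mode = max(candidates)
    return name, mode, score


# An A minor triad (A3, C4, E4) with equal energy on each note.
triad = pitch_class_profile([(220.0, 1.0), (261.63, 1.0), (329.63, 1.0)])
print(estimate_key(triad)[:2])  # ('A', 'minor')
```

The `a4_hz` parameter is where a tuning assumption enters: the same peaks can land in different bins if the master is not tuned to 440 Hz.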

The algorithm then compares that profile against template profiles for each major and minor key and picks the best match. This is a reasonable proxy for tonal center in simple, diatonic material. It falls apart quickly when the music does any of the following: modulates mid-track, spends long stretches on a single borrowed chord, layers detuned synths, uses heavy distortion that smears harmonics, or treats the kick drum's fundamental as the tonal anchor when the harmony lives an octave higher.

The core problem

Key detection answers: "which diatonic scale best explains the spectral content in this window?" It does not answer: "what key did the producer intend for mixing purposes?"

Why different tools disagree

The disagreement between rekordbox, Mixed In Key, Spotify, and open-source tools like keyfinder-cli is not random.

Each system makes different choices at four decision points:

1. Window selection. Some algorithms analyze the full track. Others weight the chorus. Others skip intros and outros. A track that modulates from verse to drop will return different keys depending on which section dominates the analysis window.
2. Tuning reference. Not every master is tuned to A=440 Hz. Club masters pitched slightly up or down shift the chroma vector. Algorithms that assume equal temperament at 440 Hz will misread detuned or vintage-sampled material.
3. Major/minor ambiguity. C major and A minor share the same pitch classes. The difference is which note functions as tonal center, a statistical distinction rather than a spectral one. Algorithms frequently flip between relative major and minor pairs, especially on sparse electronic arrangements where the bass root is ambiguous.
4. Notation mapping. Mixed In Key outputs Camelot codes (8A, 9B). rekordbox outputs traditional names (Am, F#m). Spotify's Audio Features API returns key as an integer 0–11 plus mode (major/minor). Same underlying estimate, three different label systems, and sometimes three different estimates because the mapping step isn't the only difference.

The Camelot wheel is a UI, not a ground truth

The Camelot wheel (popularized by Mixed In Key) is genuinely useful. It encodes compatible keys as adjacent numbers and letters, which lowers the cognitive load during a live mix. But the wheel assumes the input key is correct. If the algorithm mislabels A minor as C major, the wheel will confidently recommend compatible tracks in the wrong harmonic neighborhood. Harmonic mixing works when the tags are right. When they're wrong, the wheel becomes a confidence amplifier for bad data.
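The notation-mapping step is mechanical, which makes it easy to sketch. The snippet below converts a Spotify-style integer key and mode into a traditional name and a Camelot code. The function names are invented for illustration, and sharps are used for every accidental as a simplification; real tools choose their own enharmonic spellings.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def traditional_name(key: int, mode: int) -> str:
    """Spotify-style key (0 = C ... 11 = B) and mode (1 = major, 0 = minor)."""
    if not 0 <= key <= 11:
        # Spotify reports -1 when no key was detected.
        raise ValueError("no key detected")
    return NOTE_NAMES[key] + ("" if mode == 1 else "m")


def camelot_code(key: int, mode: int) -> str:
    """Camelot code for the same key, e.g. A minor -> 8A, C major -> 8B."""
    if not 0 <= key <= 11:
        raise ValueError("no key detected")
    # A minor key shares its wheel number with its relative major,
    # which sits three semitones above the minor tonic.
    major_root = key if mode == 1 else (key + 3) % 12
    # Each step around the wheel is a perfect fifth (7 semitones); C major is 8B.
    number = ((major_root * 7) % 12 + 7) % 12 + 1
    return f"{number}{'B' if mode == 1 else 'A'}"


print(traditional_name(9, 0), camelot_code(9, 0))  # Am 8A
print(traditional_name(6, 0), camelot_code(6, 0))  # F#m 11A
print(traditional_name(0, 1), camelot_code(0, 1))  # C 8B
```

Note that the mapping is lossless in both directions, so when two tools show different labels after conversion, the disagreement comes from the estimators and not from the notation.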

This is why experienced DJs verify keys by ear on the first mix: not because they distrust technology, but because they distrust unverified technology.

Where streaming metadata makes it worse

Spotify's key field in the Audio Features API is computed, not sourced from label metadata. Apple Music does not expose key at all in consumer APIs. Beatport carries key tags entered by labels and distributors, which is closer to intent, but they are inconsistently applied and still not validated against the audio on ingest. When a DJ builds a playlist on Spotify and exports tracks to rekordbox, the key values don't transfer. rekordbox re-analyzes from the file. The new value may disagree with what Spotify showed, and with what Mixed In Key showed before that. None of the three is "more true" in an absolute sense. They're different estimators applied to different file versions (compressed stream rip vs. lossless purchase vs. promo WAV) at different times.

What a high-confidence key workflow looks like

Professional prep treats key tags as hypotheses to be confirmed, not facts to be sorted by. A practical workflow:

- Run local analysis on the actual file you'll play, not a streaming preview and not a transcode.
- When two tools disagree, audition the transition with both keys and keep the one that sounds consonant on your monitors.
- Lock verified keys in a comment field or custom tag that survives database writes, not in a field that gets overwritten on re-analysis.
- Re-analyze after any pitch shift, warp, or DJ edit. Key is a property of the audio signal, not the original release.

Tools like keyfinder-cli (the same chromagram-based estimator Mixxx uses) and rekordbox's built-in analysis are good starting points.

Bonk! runs keyfinder-cli locally against decrypted entries so the tag lands in the same field rekordbox reads, but the DJ still owns verification. No algorithm ships with a confidence interval UI. The ear remains the final QA step.

Implications for ML and recommendation

Any recommendation system that filters by key ("show me tracks compatible with 8A") inherits the error rate of whatever key estimator produced the tags. Train on 50,000 tracks with auto-detected keys and a measurable fraction of your training labels are wrong. Not randomly wrong: systematically wrong on modulating tracks, live recordings, and sparse electronic arrangements, which is exactly the material DJs care about most. Human-verified key labels, tagged at high confidence after an ear check, are scarce and expensive.

That's why constraint-aware systems that take key seriously also need a way to surface uncertainty: adjacent-key suggestions, not just exact-match filters. Treating key as a hard constraint on noisy labels produces brittle recommendations. Treating it as a soft constraint with human override produces tools DJs actually use. Key detection is one of the most useful features in DJ software — and one of the least honest. The number in the tag column looks precise. It isn't. The path forward isn't better marketing copy about accuracy. It's workflows that assume estimation error, preserve human verification, and never destroy a corrected tag on re-import. Your library's keys should reflect what you hear, not what an algorithm guessed once and wrote in permanent marker.
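A soft key constraint with human override can be sketched in a few lines. Everything here is an assumption made for illustration: the distance measure (steps around the Camelot wheel plus one for a letter change), the `falloff` value, and the `verified_keys` override dictionary are not taken from any particular tool.

```python
def parse_camelot(code: str):
    """Split a code such as '8A' into (8, 'A')."""
    return int(code[:-1]), code[-1].upper()


def key_distance(a: str, b: str) -> int:
    """Steps around the wheel, plus one if the letter (mode) differs."""
    (num_a, letter_a), (num_b, letter_b) = parse_camelot(a), parse_camelot(b)
    steps = min((num_a - num_b) % 12, (num_b - num_a) % 12)
    return steps + (0 if letter_a == letter_b else 1)


def key_score(a: str, b: str, falloff: float = 0.5) -> float:
    """1.0 for an exact match, decaying with distance; never a hard exclusion."""
    return falloff ** key_distance(a, b)


def rank_by_key(current_key, tracks, verified_keys=None):
    """Order (title, detected_key) pairs by soft key compatibility.

    A human-verified key, when present, always replaces the detected one.
    """
    verified_keys = verified_keys or {}
    scored = []
    for title, detected_key in tracks:
        key = verified_keys.get(title, detected_key)
        scored.append((key_score(current_key, key), title))
    return [title for _, title in sorted(scored, key=lambda pair: -pair[0])]


library = [("x", "2B"), ("y", "9A"), ("z", "8A")]
print(rank_by_key("8A", library))               # ['z', 'y', 'x']
print(rank_by_key("8A", library, {"x": "8A"}))  # ['x', 'z', 'y']
```

Because the score only decays and never reaches zero, a mislabeled track drops in the ranking instead of vanishing from it, which leaves room for the ear check.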

Rekordbox XML Is Killing Your Library

Wed, 20 May 2026 00:00:00 GMT

The export that erases your work, and why the fix requires going under the hood.

You've been there. You spent two hours tagging a new batch of tracks: energy levels, key, color codes, comments, all of it. Hit export. Loaded the USB. Half your comments are gone. Two BPMs are wrong. A third of your tracks lost their custom ratings. And the worst part: you have no idea when it happened or how to get it back. This isn't a bug. It's a design constraint that's been baked into the rekordbox XML workflow for over a decade.

What XML actually does to your tags

rekordbox stores an extraordinary amount of metadata: track ratings, color codes, cue point names, playlist hierarchy, comment fields, analysis data like waveform position and beatgrid offsets, and the full history of your preparation activity. When you export to XML, rekordbox makes a choice about what to include. It doesn't export everything.

It exports a view: a flattened, lossy serialization of a subset of your library state. Fields that don't fit the XML schema get silently dropped. Fields that do fit get mangled: Unicode characters in comments become escape sequences, custom rating scales get normalized to a different range, and timestamp fields lose millisecond precision. You don't see it happen. rekordbox loads the XML on the other side and everything looks fine, until you start digging and realize half your prep work vanished somewhere in the translation.

The core problem

XML is a serialization format, not a database. Every export is a snapshot, not a sync. And snapshots lose writes that happen between the export and the re-import.
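One way to make silent loss visible is to diff track attributes before and after a round-trip. The sketch below assumes the export contains `TRACK` elements keyed by a `TrackID` attribute, which matches the rekordbox XML layout as commonly documented but should be checked against your own export. The sample documents and attribute values are invented.

```python
import xml.etree.ElementTree as ET


def track_attributes(xml_text: str) -> dict:
    """Map TrackID -> attribute dict for every TRACK element that has an ID."""
    root = ET.fromstring(xml_text)
    return {
        track.get("TrackID"): dict(track.attrib)
        for track in root.iter("TRACK")
        if track.get("TrackID")
    }


def round_trip_diff(before_xml: str, after_xml: str) -> dict:
    """Report attributes that changed or vanished between two exports."""
    before = track_attributes(before_xml)
    after = track_attributes(after_xml)
    report = {}
    for track_id, old in before.items():
        new = after.get(track_id)
        if new is None:
            report[track_id] = {"missing_track": True}
            continue
        changes = {
            name: (value, new.get(name))  # None means the attribute was dropped
            for name, value in old.items()
            if new.get(name) != value
        }
        if changes:
            report[track_id] = changes
    return report


# Invented sample data: the comment is emptied somewhere in the round-trip.
BEFORE = """<DJ_PLAYLISTS><COLLECTION>
  <TRACK TrackID="1" Name="Example" Comments="drops at 2:47" Rating="5"/>
</COLLECTION></DJ_PLAYLISTS>"""
AFTER = """<DJ_PLAYLISTS><COLLECTION>
  <TRACK TrackID="1" Name="Example" Comments="" Rating="5"/>
</COLLECTION></DJ_PLAYLISTS>"""

print(round_trip_diff(BEFORE, AFTER))
# {'1': {'Comments': ('drops at 2:47', '')}}
```

A check like this only sees what the XML carries in the first place; fields that never make it into the export cannot show up in the diff.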

The round-trip tax

The typical DJ workflow involves at least one XML round-trip per session: export from rekordbox, open in a third-party tool, make changes, export back, re-import into rekordbox. Each round-trip compounds the loss. Comment fields get truncated. Custom cue metadata gets normalized. Energy and danceability ratings, if they're even in the XML at all, get reset. For most DJs, this means either living with the loss or doing meticulous re-tagging after every import. Neither is a reasonable workflow. The software is supposed to serve the DJ, not the other way around.

Why the .db file is the answer

rekordbox stores everything (every field, every cue, every rating) in a SQLite database called master.db. This is the source of truth. The XML export is just a conveniently formatted excerpt.

Reading and writing the database directly means working at the same level of fidelity as rekordbox itself. No lossy translation. No silent drops. Every field is preserved because you're reading the same data structure the DJ software reads. The complication: the database is encrypted. Pioneer DJ uses SQLCipher to protect the library data, which means a direct read requires understanding the encryption schema, and that part takes some engineering to get right.

What Bonk! does differently

Bonk! reads the master.db directly via pyrekordbox (a vendored Python bridge that handles the SQLCipher decryption) and writes changes back with the same fidelity. When you update a track's energy score or add a chapter marker, it lands in the database in the exact format rekordbox expects, with no XML serialization in between.

The write path is atomic: changes are applied in a transaction and committed only on success. If the process is interrupted (power loss, crash, bad file), the database remains consistent. No partial writes. No corruption. For DJs who want to run audio analysis on their library, Bonk! doesn't require uploading anything anywhere. All feature extraction runs locally: BPM via aubio, key detection via keyfinder-cli (same algorithm Mixxx uses), and energy/danceability via VibeNet, a locally-run ML model. Your library never leaves your machine.

The local-first constraint

Cloud-first DJ tools have an inherent tension: your library is your most valuable asset, and yet you have to trust a third party to hold it intact. Cloud services can lose data, change terms, get acquired, or simply go offline.

A local-first workflow means your library is always accessible — on a plane, in a venue with poor wifi, or five years from now on whatever hardware you're using then. Bonk! is built on this constraint by default. There is no cloud component. There is no account. There is no sync service. Your library, your machine, your edits. The XML workflow is a workaround for a problem that didn't need to exist. As the DJ software ecosystem evolves, the tools that respect the integrity of your library — and your time — will win. Direct database access isn't a niche technical detail. It's the foundation of everything else.
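The all-or-nothing write path described earlier in this post is standard SQLite transaction behavior, and it can be shown with Python's built-in `sqlite3` module. This is a sketch under loud assumptions: it uses a plain in-memory database rather than the SQLCipher-encrypted master.db, and the `tracks` table, its columns, and the 0–10 energy check are hypothetical, not rekordbox's real schema or Bonk!'s actual code.

```python
import sqlite3


def apply_updates(conn, updates):
    """Apply (track_id, energy) updates as one transaction.

    Returns True if every update was committed, False if the batch was
    rolled back because one value was invalid.
    """
    try:
        with conn:  # commits on success, rolls back if the block raises
            for track_id, energy in updates:
                if not 0 <= energy <= 10:
                    raise ValueError(f"energy out of range for track {track_id}")
                conn.execute(
                    "UPDATE tracks SET energy = ? WHERE id = ?",
                    (energy, track_id),
                )
        return True
    except ValueError:
        return False


def energies(conn):
    return [row[0] for row in conn.execute("SELECT energy FROM tracks ORDER BY id")]


# Hypothetical table standing in for the real library schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, energy INTEGER)")
conn.executemany("INSERT INTO tracks VALUES (?, ?)", [(1, 3), (2, 4)])
conn.commit()

failed = apply_updates(conn, [(1, 8), (2, 99)])  # second value is invalid
print(failed, energies(conn))                    # False [3, 4]

succeeded = apply_updates(conn, [(1, 8), (2, 7)])
print(succeeded, energies(conn))                 # True [8, 7]
```

The first batch leaves track 1 untouched even though its update ran before the failure. Protection against power loss comes from SQLite's own journaling rather than from anything in this snippet.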

The Frequency Profile Problem — Why BPM and Key Are Not Enough to Match Tracks

Sun, 07 Jun 2026 00:00:00 GMT

Same tempo, same key, completely different mixing feel. Here's what the tools aren't measuring.

A 128 BPM track in A minor can sound like two completely different records. One is warm, round, mid-focused — the bass fills the space, the vocals sit in a pocket around 2kHz, the high end is a soft sizzle rather than a sharp transient. The other is brittle, wide, aggressively bright — cymbals cut through, the kick has a fast attack with a long resonant tail, the synth stabs have an edge that makes your speakers work harder than they should. Same BPM. Same key. Completely different mixing feel. BPM and key are the standard axes for track compatibility. Every DJ tool uses them. But they're measuring the wrong thing — or more precisely, they're measuring a derivative of what actually matters, while ignoring the primary signal.

What you actually respond to when you decide two tracks "go together" is the frequency profile: the distribution of energy across the spectrum and how that changes over time.

What a Frequency Profile Actually Is

Every audio signal can be decomposed into frequencies. The low end (20–250 Hz) carries the kick, bass, sub warmth. The midrange (250 Hz–4 kHz) carries the body of guitars, vocals, snare body. The upper midrange (4–8 kHz) carries presence, attack, the edge of snares and guitars. The high end (8–20 kHz) carries air, shimmer, reverb tails, cymbals. A spectral centroid is the weighted average of those frequencies: a single number that describes where the "center of mass" of the spectrum sits. A high centroid means a bright, aggressive sound. A low centroid means a dark, warm sound. But a single number loses too much.
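Both ideas fit in a few lines: the centroid as a magnitude-weighted mean frequency, and a coarse profile as the share of energy in each of the four bands named above. The input format (parallel lists of bin frequencies and magnitudes) and the two sample spectra are invented for illustration; a real pipeline would take these from an FFT of the audio.

```python
# Band edges in Hz, taken from the ranges described in the text.
BANDS = [
    ("low", 20, 250),
    ("mid", 250, 4000),
    ("upper_mid", 4000, 8000),
    ("high", 8000, 20000),
]


def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of a spectrum."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


def band_shares(freqs_hz, magnitudes):
    """Fraction of total magnitude that falls in each named band."""
    totals = {name: 0.0 for name, _, _ in BANDS}
    for f, m in zip(freqs_hz, magnitudes):
        for name, low, high in BANDS:
            if low <= f < high:
                totals[name] += m
                break
    grand_total = sum(totals.values())
    return {
        name: (value / grand_total if grand_total else 0.0)
        for name, value in totals.items()
    }


# Two made-up four-bin spectra: one bass-heavy, one treble-heavy.
freqs = [60, 1000, 6000, 12000]
warm = [8.0, 4.0, 1.0, 0.5]
bright = [2.0, 3.0, 5.0, 6.0]

print(round(spectral_centroid(freqs, warm)))  # 1221
print(spectral_centroid(freqs, bright))       # 6570.0
print(band_shares(freqs, warm)["low"])
```

The two centroids differ by a factor of more than five, but the band shares say more: they show where the energy sits, not just its average position.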

A better model is a spectral profile: the shape of the distribution across frequency bands over time. Some tracks are flat (consistent energy across bands, like orchestral music). Some are V-shaped (big bass, big treble, quiet mids, like a lot of modern EDM). Some are mid-focused (vocals, guitars, snare-heavy rock). These shapes describe the texture of a record in a way that BPM and key cannot. When DJs talk about a "warm" track versus a "harsh" one, or describe a mix as "dark and driving" versus "bright and euphoric," they're describing spectral profiles, often without knowing it.

The Problem with Pure BPM/Key Matching

Most recommendation engines for DJ software treat BPM and key as primary features and everything else as secondary noise.

You set your key filter to "compatible" (or a tighter window like ±1), you set your BPM range to ±5%, and the software surfaces tracks that fit those constraints. But this produces false positives constantly. Two tracks can be BPM and key compatible and still sound completely wrong together — because the spectral profile of one is diametrically opposed to the other. Consider: you're playing a late-night techno set and you want to transition from a dark, industrial track (low spectral centroid, dense low-mid energy, slow attack on transients) into a euphoric trance track (high spectral centroid, wide high-frequency presence, fast attack). Both might be 138 BPM in F minor. The key compatibility and BPM match are satisfied. The mix will sound like a car crash. The same problem shows up in reverse.

A bright, aggressive electro track and a warm, round house track at the same BPM and compatible key will create a clash because their spectral profiles are fighting for the same frequency space in different ways. This is why experienced DJs develop an intuitive sense for spectral profile that no software currently models well. They feel when two records are "from the same world" sonically, not just the same tempo and key.

How Energy and Danceability Connect to Spectral Profile

Most analysis frameworks treat energy and danceability as separate dimensions, derived from dynamics and tempo pattern analysis. But both are deeply connected to spectral profile. Energy, in tools like Spotify's API or VibeNet-based analysis, correlates strongly with spectral centroid and dynamic range.

High-energy tracks tend to have a higher centroid (brighter), more compressed dynamics (louder overall with less variation), and faster attack times. Low-energy tracks are darker, more relaxed, with wider dynamic variation. Danceability, in Spotify's model at least, is described as a combination of tempo, rhythm stability, beat strength, and overall regularity. A highly danceable track has a consistent, predictable rhythm with strong downbeats and minimal syncopation. But spectral profile also plays a role: certain frequency distributions feel more "groovy" or "bounceable" than others, even at the same tempo. VibeNet-style analysis models energy and danceability as separate axes on a 0–10 scale. A track with energy 8 and danceability 4 is a bright, aggressive, but rhythmically complex record (something like certain techno or industrial).

A track with energy 6 and danceability 9 is warm, groove-forward, and accessible (classic house). These composite scores are better predictors of mix compatibility than BPM or key alone, because they encode spectral profile information in a compressed form. But even this composite score is a simplification. The full spectral profile contains information that neither BPM/key nor composite energy/danceability scores can capture.

What Spectral Contrast Tells You

A more granular approach than spectral centroid is spectral contrast: measuring the difference in energy between spectral peaks and valleys across frequency bands. In a rock track recorded with live drums, the spectral contrast is high: the snare hits hard in the mid-high range, creating a strong peak, while the space between drum hits has low energy.

In a synth pad or string section, the spectral contrast is low: energy is more evenly distributed across bands with no sharp peaks. High spectral contrast tracks feel "rhythmic" and "percussive." Low spectral contrast tracks feel "textural" and "ambient." Mixing two high-contrast tracks together creates a dense, clashing picture. Mixing two low-contrast tracks can sound washed out. The right combinations — one high contrast, one low — create the interplay between clarity and texture that makes a great transition. Most DJ tools don't expose spectral contrast data at all. The ones that do usually present it as an abstract chart that requires audio engineering knowledge to interpret.

The opportunity is to make this information actionable: to surface it as a compatibility axis alongside BPM and key, so DJs can quickly identify when two records have complementary spectral profiles rather than clashing ones.

Building Toward Better Similarity

The practical goal isn't to replace BPM and key; they're useful filters. The goal is to add spectral profile as a third axis, so compatibility is measured across three dimensions rather than two. This requires audio analysis pipelines that can compute spectral features at scale. With KeyFinder for key detection, Essentia for BPM analysis, and VibeNet for energy/danceability already in place, adding spectral centroid or spectral contrast to this pipeline is a matter of computation time. These features can be extracted from the same audio analysis pass as BPM and key.
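A three-axis distance might look like the sketch below. The `Track` fields, the weights, and the scaling choices (a 5% tempo gap counts as one unit, one Camelot step counts as one unit, one octave of centroid difference counts as one unit) are all assumptions picked to make the example readable; a real system would tune them against DJ judgments.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    bpm: float
    camelot: str        # e.g. "4A" for F minor
    centroid_hz: float  # spectral centroid of the track


def key_steps(a: str, b: str) -> int:
    """Steps around the Camelot wheel, plus one if the letter differs."""
    num_a, num_b = int(a[:-1]), int(b[:-1])
    around = min((num_a - num_b) % 12, (num_b - num_a) % 12)
    return around + (0 if a[-1] == b[-1] else 1)


def distance(a: Track, b: Track, w_bpm=1.0, w_key=1.0, w_centroid=1.0) -> float:
    """Weighted sum of tempo, key, and brightness differences."""
    bpm_term = abs(a.bpm - b.bpm) / a.bpm / 0.05                   # 1.0 = 5% tempo gap
    key_term = key_steps(a.camelot, b.camelot)                     # 1.0 = one wheel step
    centroid_term = abs(math.log2(a.centroid_hz / b.centroid_hz))  # 1.0 = one octave
    return w_bpm * bpm_term + w_key * key_term + w_centroid * centroid_term


def rank(current: Track, library, **weights):
    """Library sorted from most to least compatible with the current track."""
    return sorted(library, key=lambda t: distance(current, t, **weights))


# Invented numbers mirroring the techno-into-trance example in the text.
industrial = Track("industrial techno", 138.0, "4A", 1500.0)
trance = Track("euphoric trance", 138.0, "4A", 6000.0)
dark = Track("dark techno", 136.0, "5A", 1700.0)

print([t.title for t in rank(industrial, [trance, dark])])
# ['dark techno', 'euphoric trance']
print([t.title for t in rank(industrial, [trance, dark], w_centroid=0.0)])
# ['euphoric trance', 'dark techno']
```

With the centroid weight set to zero the ranking collapses back to plain BPM/key matching, and the trance track that would clash in the mix moves to the top.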

The output would look something like this in a DJ library UI: instead of just showing "128 BPM / A minor," a track would show "128 BPM / A minor / centroid 4.2kHz / energy 7 / danceability 8", where centroid gives you the bright/dark axis and the energy/danceability scores fill in the texture. With these three axes, a similarity function can rank tracks by composite distance rather than just BPM and key compatibility. The result would be recommendations that actually sound right together, not just technically compatible.

The Bigger Picture

Every DJ develops an internal model of what makes tracks work together. It's not just tempo and key. It's the whole sonic picture: how a record feels in a club, what frequency ranges it occupies, how its energy builds and releases. That's spectral profile. Current tools don't give you a way to externalize or query that model.

They give you BPM, key, and sometimes energy — and then ask you to do the rest by ear. The next generation of DJ library tools will close that gap. Spectral profile features are computationally cheap to extract and easy to surface as a compatibility axis. The challenge is making them interpretable — translating "centroid 4.2kHz" into "bright, aggressive, percussion-forward" so DJs can act on the information without an audio engineering degree. That's where the field is heading. And when it gets there, the false positive problem in track matching — tracks that satisfy BPM and key constraints but sound completely wrong together — will finally be solved.

The Half-Time BPM Trap

Mon, 08 Jun 2026 00:00:00 GMT

Tempo detection doesn't read a metronome. It counts onsets. When the kick pattern implies a different pulse than the hi-hats, the algorithm picks a side — and DJs pay for it.

You're prepping a DnB set. The track is unmistakably 174 BPM; you can feel the amen break pushing at that speed. rekordbox analyzes it and reports 87 BPM. You double-click to fix it, drag the beatgrid, and move on. Twenty tracks later, you've spent forty minutes on grid correction instead of cue points. This is the half-time BPM trap: tempo estimators lock onto a subharmonic pulse (half or double the musical tempo) because that's where the strongest periodic energy lives in the signal.

How tempo detection actually works

Most DJ software uses onset detection followed by autocorrelation or comb-filtering over an onset strength signal. The algorithm finds periodic peaks (moments where energy spikes) and infers a tempo that best explains those peaks. Libraries like aubio (used in Mixxx and many open-source tools) and rekordbox's proprietary analyzer follow this general pattern.
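The autocorrelation step can be demonstrated on a synthetic onset envelope. This is a toy sketch, not how aubio or rekordbox is implemented: the envelope is a hand-built impulse train at 100 frames per second, and the function names and BPM search range are assumptions. It does show why accenting every other beat is enough to tip the estimate to half tempo.

```python
def onset_envelope(n_frames, period, strengths):
    """Synthetic envelope: one impulse every `period` frames, cycling `strengths`."""
    env = [0.0] * n_frames
    for k, frame in enumerate(range(0, n_frames, period)):
        env[frame] = strengths[k % len(strengths)]
    return env


def autocorrelation(env, lag):
    """Unnormalized autocorrelation of the envelope at one lag."""
    return sum(a * b for a, b in zip(env, env[lag:]))


def estimate_bpm(env, frames_per_second, min_bpm=50.0, max_bpm=200.0):
    """Pick the lag with the highest autocorrelation and convert it to BPM."""
    min_lag = int(60.0 * frames_per_second / max_bpm)
    max_lag = int(60.0 * frames_per_second / min_bpm)
    best_lag = max(
        range(min_lag, max_lag + 1),
        key=lambda lag: autocorrelation(env, lag),
    )
    return 60.0 * frames_per_second / best_lag


FPS = 100  # 10 ms per frame, so a 50-frame period is 0.5 s, i.e. 120 BPM

even_kicks = onset_envelope(1000, 50, [1.0])     # every beat equally strong
accented = onset_envelope(1000, 50, [1.0, 0.5])  # every other beat weaker

print(estimate_bpm(even_kicks, FPS))  # 120.0
print(estimate_bpm(accented, FPS))    # 60.0
```

Both envelopes have an onset every half second. Only the accent pattern changes, and with it the winning lag doubles.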

The method works well on four-on-the-floor house and techno where the kick lands on every beat and the spectral flux is unambiguous. It degrades on any arrangement where multiple plausible pulse layers exist at related tempos.

The octave ambiguity

If onsets cluster at intervals of both roughly 429 ms and 857 ms, the autocorrelation function has peaks at both 140 BPM and 70 BPM (tempo in BPM is 60,000 divided by the interval in milliseconds). The algorithm must choose. It often chooses the slower peak because lower-frequency periodicity tends to have higher energy in the onset envelope.

Genres where it fails predictably

- Drum & bass and jungle. The breakbeat's shuffle creates a dense onset pattern. The perceived groove lives at 170+ BPM, but the kick pattern may emphasize every other downbeat, pulling estimators toward half tempo.
- Hip-hop and trap. Hi-hats and percussion at double-time over a half-time kick is the standard production technique. Spotify may report 140 while rekordbox reports 70 on the same file, and both are defensible readings of different rhythmic layers.
- Ambient and downtempo. Sparse percussion with long gaps between kicks produces weak onset signals. Estimators latch onto hi-hat patterns or reverb tails, producing BPM values that are technically periodic but musically meaningless.
- Live drummers. Human timing drift (pushing, pulling, ghost notes) violates the constant-tempo assumption baked into beatgrid generation. The detected BPM is an average, not a truth.
- Tracks with tempo changes. Ramps, breakdowns, and live remixes break single-tempo models entirely. The software reports one number; the music contains several.

Why the beatgrid makes it visible

BPM is an abstract number. The beatgrid is where tempo errors become physical.

When rekordbox sets a grid at 87 BPM on a 174 BPM track, every other beat falls between grid lines. Sync drifts. Quantize lands on the wrong beat. The DJ compensates with manual nudging, or fixes the grid by hand. Beatgrid editing is the most tedious prep work in DJ software. Pioneer, Serato, and Native Instruments have all invested heavily in analysis quality over the years, but none has solved octave ambiguity because the problem is underdetermined from audio alone. Two tempos related by a factor of two can both produce valid onset periodicity scores.

The streaming-vs-local BPM gap

Spotify's Audio Features API returns tempo as a float, computed by Spotify's own pipeline on their mastered copy. When a DJ downloads or purchases the track and analyzes locally, the value often differs by fractions of a BPM or, in half-time cases, by a factor of two.

Neither value is the producer's DAW tempo unless someone typed it in and it survived distribution, which, as covered in The Song Is Not the File, almost never happens. The DJ's rekordbox BPM is the one that matters for beatmatching, but it must be validated track by track.

A prep workflow that survives bad detection

Working DJs develop heuristics. Some are universal:

1. Tap tempo first. Before trusting the analysis, tap the track's perceived pulse on the software's tap-tempo button. If it disagrees with the auto-detected value by 2× or 0.5×, you found the trap.
2. Set the grid on the downbeat you mix on. Not the theoretical downbeat, but the one your transition targets. For half-time hip-hop, that might be the snare grid, not the hi-hat grid.
3. Lock corrected grids. Re-analysis overwrites your fixes. Mark tracks as analyzed, back up your database, and avoid bulk re-analyze on libraries you've manually corrected.
4. Use genre-aware expectations. If a track is labeled DnB and reports 85–90 BPM, assume half-time error before assuming it's a downtempo experiment.

What better tooling would require

Fixing tempo detection at the algorithm level means going beyond onset autocorrelation. Research systems combine multi-band onset detection, genre classification as a prior (DnB is unlikely to be 87 BPM), and beat tracking models trained on annotated datasets. Even then, ground truth is contested: is the "correct" tempo the DAW project tempo, the perceived foot-tap tempo, or the tempo a DJ can most easily mix at? For library-scale prep, the pragmatic answer is local re-analysis with a second estimator as a sanity check.
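A second-estimator sanity check reduces to comparing two numbers and asking whether they agree, differ by an octave, or diverge. The sketch below also includes a genre-aware check in the spirit of heuristic 4. The function names, the 2% tolerance, and the genre BPM ranges are illustrative assumptions, not values from any shipping tool.

```python
def compare_tempos(a_bpm: float, b_bpm: float, tolerance: float = 0.02) -> str:
    """Classify two tempo estimates as 'agree', 'octave', or 'diverge'."""
    ratio = a_bpm / b_bpm
    for label, target in (("agree", 1.0), ("octave", 2.0), ("octave", 0.5)):
        if abs(ratio - target) <= tolerance * target:
            return label
    return "diverge"


# Hypothetical expected ranges per genre label; adjust to taste.
GENRE_BPM_RANGES = {
    "dnb": (160.0, 180.0),
    "house": (118.0, 130.0),
}


def suspect_half_time(genre: str, bpm: float) -> bool:
    """True if bpm is outside the genre's range but double it is inside."""
    bounds = GENRE_BPM_RANGES.get(genre)
    if bounds is None:
        return False
    low, high = bounds
    return not (low <= bpm <= high) and (low <= 2 * bpm <= high)


print(compare_tempos(174.0, 87.0))     # octave
print(compare_tempos(128.0, 127.9))    # agree
print(compare_tempos(120.0, 90.0))     # diverge
print(suspect_half_time("dnb", 87.0))  # True
```

Tracks flagged as `octave` or `diverge` are the ones worth a tap-tempo check; tracks where both estimators agree can still be wrong together, so the flag narrows the review list without replacing it.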

Bonk! runs BPM detection via aubio against the same files rekordbox stores, which is useful for batch flagging tracks where estimators diverge by more than a threshold. But the beatgrid still needs human confirmation on any track you plan to sync.

The metadata lesson

BPM is the simplest numeric tag in a DJ library and the one most likely to be wrong without obvious symptoms until you hit sync on a booth monitor. It sits alongside key and energy as a structural constraint for set building and, like key, it's an estimate that gets treated as fact because the UI renders it as a single precise number. The half-time trap is a reminder that audio analysis outputs are hypotheses. The workflow that respects your time is one that surfaces disagreement, preserves corrections, and never forces you to re-prove the same tempo fix twice because an export stripped your grid.

Every DJ has a story about the track that analyzed wrong. The trap isn't ignorance — it's software that presents a guess with the same confidence as a measurement. Until tempo estimators expose uncertainty, the best tool remains a trained ear, a tap-tempo button, and a library that remembers your corrections.

The Song Is Not the File

Sun, 10 May 2026 00:00:00 GMT

Music metadata is the unglamorous infrastructure underneath every stream, sync license, and royalty payment. Here's what actually happens to a song between upload and play.

When you upload a song to Spotify through DistroKid or TuneCore, you fill in a form. Title, artist, genre, maybe an ISRC code if you have one. You hit submit and the song appears on streaming platforms a few days later. You assume the data you entered traveled with it. Most of it didn't. The distance between what an artist enters and what a listener sees is not a gap. It's a canyon. And the things that fall into it (BPM, musical key, mood, energy, sub-genre, producer credits) are exactly the metadata that DJs, music supervisors, and recommendation engines need most.

The metadata pipeline is not a pipeline

A pipeline implies data flows from one end to the other without loss. What actually happens is closer to a game of telephone.

Each stage (DAW export, file tagging, distributor intake, DSP processing) has its own schema, its own constraints, and its own opinions about which fields matter. The fields that don't fit get dropped. The fields that do fit get overwritten. The result: the only metadata that reliably survives from artist to listener is title, artist name, ISRC, UPC, primary genre, and release date. Everything else (BPM, key, energy, sub-genre, producer credits, mood tags) is either stripped or replaced.

The pipeline graveyard

DDEX ERN, the XML standard that connects every distributor to every DSP, has fields for 47 types of rights holder. It does not have a field for BPM or musical key. The standard literally cannot carry the information that DJs need most.

What goes in vs. what comes out

Here's what happens to specific metadata fields as a track moves from an artist's DAW to a listener's phone:

| Field | DAW Tag | Distributor | DSP | Survives? |
|---|---|---|---|---|
| Title | ✓ | ✓ | ✓ | Yes |
| Artist | ✓ | ✓ | ✓ | Yes |
| ISRC | ✓ | ✓ | ✓ | Yes |
| Genre | ✓ | Simplified | Own taxonomy | Partial |
| BPM | ✓ | ✓ then stripped | Own analysis | No (overwritten) |
| Key | ✓ | Dropped | Own analysis | No (dropped) |
| Mood / Energy | ✓ | Dropped | Own analysis | No (dropped) |
| Sub-genre | ✓ | Collapsed | Own taxonomy | No (replaced) |
| Producer credits | ✓ | Truncated | Partial | No (partial) |

If your BPM can't survive the trip from your DAW to someone else's DJ software, it's not your BPM anymore. It's Spotify's BPM. Or Apple's. Or whoever recomputes it on the other end. The artist's intent is replaced by an algorithm's estimate.

What an ISRC actually does (and doesn't)

ISRC codes, the 12-character identifiers assigned to individual recordings, are administered in the US by the RIAA, which charges a one-time registrant fee; individual codes then cost nothing to generate. They're the closest thing the music industry has to a universal ID for a specific recording. But most independent artists either don't know they exist or skip them entirely, which means streaming platforms auto-assign their own identifiers. Those auto-assigned codes often don't propagate consistently across services, making it impossible to definitively say that this recording on Spotify is the same recording on Apple Music. UPCs, the barcodes for releases, face a similar problem. Get one through your distributor and it's tied to their account. Switch distributors later, and you may lose the identifier that your release has been indexed under for its entire life.
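The 12-character ISRC layout is simple enough to validate locally before a release goes out: a two-letter country prefix, a three-character alphanumeric registrant code, a two-digit year of reference, and a five-digit designation. The sketch below checks format only. It cannot tell whether a code was actually issued, and the function name and the sample code are made up for illustration.

```python
import re

# Country (2 letters), registrant (3 alphanumerics), year (2 digits),
# designation (5 digits): 12 characters once hyphens are removed.
ISRC_PATTERN = re.compile(r"^([A-Z]{2})([A-Z0-9]{3})(\d{2})(\d{5})$")


def parse_isrc(text: str):
    """Split an ISRC into its parts, or return None if the format is wrong."""
    compact = text.replace("-", "").upper()
    match = ISRC_PATTERN.match(compact)
    if match is None:
        return None
    country, registrant, year, designation = match.groups()
    return {
        "country": country,
        "registrant": registrant,
        "year": year,
        "designation": designation,
    }


print(parse_isrc("US-S1Z-99-00001"))
# {'country': 'US', 'registrant': 'S1Z', 'year': '99', 'designation': '00001'}
print(parse_isrc("USS1Z990000"))  # None (only 11 characters)
```

A format check like this catches typos at tagging time, which is the only point in the chain where the artist still controls the value.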

Why distributors strip what they strip The major distributors — DistroKid, TuneCore, CD Baby — aren't being careless. They're following DDEX ERN, the Electronic Release Notification standard that defines the XML format for delivering metadata from distributors to DSPs. ERN v3.8, the version most widely deployed, simply doesn't have fields for BPM, musical key, energy level, or mood. It was designed for the recording industry's commercial needs — rights management, royalty tracking, territorial availability — not for the creative needs of the people making music. DDEX ERN v4.0 adds some fields, but still not BPM or key. This isn't an oversight. The standard's governance body — the DDEX Board, composed of major labels and DSPs — has no DJ software companies on it. The people who need these fields most aren't in the room where the standard is written.

The DSP overwrites Here's the part that surprises most people: Spotify and Apple Music don't display your BPM. They display their BPM. Spotify runs its own audio analysis pipeline on every track it receives. The "audio features" in the Spotify API — energy, valence, danceability, tempo, key — are all computed by Spotify's algorithms, not sourced from distributor metadata. This means that even in the hypothetical scenario where a distributor preserved your BPM, the DSP would overwrite it with its own estimate. The artist's carefully tagged metadata only survives if the DSP chooses to respect it. And for DJ-relevant fields — BPM, key, energy — they don't. Beatport is the sole exception. The DJ-focused marketplace requires BPM and key as first-class metadata because its customers are DJs who sort and filter by those fields. But Beatport serves a niche.

The platforms that reach listeners (Spotify, Apple Music, YouTube Music) treat DJ metadata as irrelevant.

The DJ's end of the pipeline

When a DJ imports a track into Rekordbox, Serato, or Traktor, the software re-analyzes the entire file locally. It computes BPM, detects key, generates waveform data, and builds beat grids from scratch. This isn't redundant; it's the only reliable path. The metadata pipeline has already mangled or discarded the original values by this point. But there's a second, less visible problem: the DJ software silos don't talk to each other either. Switch from Serato to Rekordbox and you lose your cue points, your beat grids, your energy ratings, your loop markers. None of these have a standardized interchange format. Every DJ platform is a walled garden, and the walls are built from the same metadata that the pipeline already fragmented.

How to plan around it

If you're an independent artist, there are practical steps that reduce (though they can't eliminate) the damage:

1. Always assign ISRCs yourself. In the US they're issued through the RIAA. Don't let your distributor auto-generate them: you'll lose control of your recording identifiers if you ever switch distributors.
2. Tag your files before uploading. Use MusicBrainz Picard or a dedicated tagger to embed ISRC, UPC, genre, and credits directly in the audio file. This won't survive distribution, but it ensures your master files are correct.
3. Expect DSP BPM and key to be estimates, not your values. If you need precise BPM and key for DJ use, they have to live in your DJ software's local analysis, not in the streaming platform's metadata.
4. Use Beatport for DJ-facing releases. It's the only major platform that accepts and displays BPM and key as first-class metadata. If your audience is DJs, Beatport matters more than Spotify for discoverability.
5. Keep a local spreadsheet. Until there's a true metadata standard that survives the full pipeline, the most reliable DJ metadata is the spreadsheet you maintain yourself. BPM, key, energy level, genre subcategory: everything the pipeline strips, you track locally.

The metadata pipeline is not a pipeline. It's an analysis-reanalysis chain where every stage may overwrite what came before. The song is not the file. The file is not the metadata. And the metadata that reaches the listener is not the metadata the artist created. Until the standards catch up with the people who actually use them, the gap between creation and discovery will keep swallowing the details that matter most.

Tonality Is Not Key — The Case for Smarter Harmonic Mixing

Mon, 01 Jun 2026 00:00:00 GMT

The Camelot Wheel tells you which keys are numerically adjacent. It doesn't tell you why two tracks in the same key can still clash, or why a tritone jump sometimes works.

In an earlier post — Constraint-Aware Recommendation as Creative Scaffolding — I argued that mainstream recommendation systems optimize for engagement, not for the structural constraints that determine whether two tracks can actually be mixed together. BPM and key are treated as features in a similarity model. I argued they should be treated as constraints — hard filters that eliminate structurally incompatible options before any ranking happens. This post pushes on the second constraint: key compatibility. Not because the first editorial got it wrong, but because the deeper I dug, the more I realized that "key compatibility" as practiced by most DJs is a useful approximation built on a flawed theoretical foundation. The Camelot Wheel is not music theory. It is a numerology derived from music theory, and the gap between the two creates real failure modes in practice.

What "key" actually means (and doesn't) When a key detection algorithm assigns a track the label "8A" or "A minor," it is making a claim: this track's harmonic content is organized around the pitch class A, and its scale degrees follow the natural minor pattern. This is a simplification — a useful one, but a simplification nonetheless. The problem is that most electronic music — house, techno, hip-hop, trance — is not organized around functional harmonic progressions in the classical sense. A four-bar loop in A minor might cycle through Am → F → C → G indefinitely with no harmonic movement at all. The "key" of the track is a property of the dominant pitch class and the implied scale, not of a dynamic harmonic journey.

Two tracks in 8A can have completely different harmonic palettes: one might stay strictly within Am → F → C → G, while another introduces borrowed chords from A Dorian, Phrygian mode, or chromatic alterations that the key detection algorithm has averaged away. This is the first failure mode of key-as-constraint: key detection collapses the harmonic complexity of a track into a single label, and that label may not represent what the track actually sounds like at any given moment. The circle of fifths as a compatibility map Before diving into frameworks, it's worth understanding the underlying geometry. The circle of fifths arranges all 12 major keys and their relative minors by ascending perfect fifths: C → G → D → A → E → B → F♯/G♭ → D♭ → A♭ → E♭ → B♭ → F → C. Adjacent keys on the circle share six of seven notes — maximum harmonic overlap.
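The overlap claim is easy to check by brute force. The sketch below is illustrative only (the function names are made up for this post): it builds major scales as sets of pitch classes and counts how many notes C major shares with the key a given number of fifths away.

```python
# Count the pitch classes C major shares with keys around the circle of fifths.
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale


def major_scale(tonic: int) -> frozenset:
    """Pitch classes of the major scale on `tonic` (0 = C, 7 = G, ...)."""
    return frozenset((tonic + step) % 12 for step in MAJOR_STEPS)


def shared_by_fifths_distance(steps: int) -> int:
    """Notes C major shares with the key `steps` fifths clockwise from it."""
    other_tonic = (7 * steps) % 12  # each clockwise step adds a perfect fifth
    return len(major_scale(0) & major_scale(other_tonic))


overlap = {steps: shared_by_fifths_distance(steps) for steps in range(7)}
for steps, count in overlap.items():
    print(f"{steps} fifths away: {count} shared pitch classes")
```

One step away shares six notes and each further step loses one, bottoming out at two shared notes for keys five or six steps away. Two seven-note scales drawn from twelve pitch classes can never share fewer than two.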

Keys separated by one position (one fifth) are the most compatible transitions. Keys opposite each other on the circle, six positions apart, are a tritone apart and share only two of their seven pitch classes, creating maximum harmonic friction. The Camelot Wheel is, at its core, a flattened, numbered version of the circle of fifths, with the critical difference that it places each major key and its relative minor at the same numeric position (e.g., 8A is A minor and 8B is C major). The letter distinguishes mode; the number identifies the shared key signature. This is useful for DJs who don't read sheet music. It is less useful as a theory of harmonic compatibility.

Why adjacent ≠ always compatible

8A (A minor) and 9A (E minor) are one step apart on the Camelot Wheel and treated as "compatible" by the standard rules. They share six of seven pitch classes, but the note that differs (F in A minor, F♯ in E minor) clashes by a semitone whenever both tracks lean on it at once. What makes them adjacent is that their key signatures are a fifth apart: A minor has no sharps or flats, and E minor has one sharp (F♯). The compatibility claim is about key signature geometry, not about whether the tracks sound good layered.

Neo-Riemannian theory: the geometry the Camelot Wheel misses

Neo-Riemannian theory, developed by David Lewin and others in the 1980s, provides a more precise geometric map of harmonic relationships than the circle of fifths alone. Its core insight is that the relationships between keys can be classified into three elementary transformations, and that these transformations correspond to minimal voice-leading distances, not just shared key signatures.

Operation | Effect | Example | DJ relevance
P (Parallel) | Major ↔ minor of the same tonic | C major ↔ C minor | Mode switches within a track (breakdown to drop)
R (Relative) | Major ↔ relative minor (same key signature) | C major ↔ A minor | Shares all pitch classes: guaranteed blend
L (Leading-tone) | Major ↔ the minor chord a major third above, keeping two shared notes | C major ↔ E minor | Enables surprising but smooth distant-key blends

The full network of these relationships, called the Tonnetz, maps triads along three axes (fifths, major thirds, and minor thirds) rather than around a single circle, capturing relationships that a circular arrangement misses entirely. Chromatic mediant relationships (C major ↔ E♭ major), for instance, are not adjacent on the circle of fifths but are only two operations apart in the Tonnetz (P then R: C major → C minor → E♭ major) and sound smooth in practice.
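The P, R, and L operations are mechanical enough to write down in a few lines. This is a minimal sketch, assuming a triad is represented as a (root pitch class, quality) pair; the representation and function names are illustrative, not taken from any DJ tool.

```python
# Neo-Riemannian P, R and L operations on major/minor triads.
# A triad is (root, quality): root is a pitch class with 0 = C,
# quality is "maj" or "min". This representation is an assumption.
NAMES = ["C", "D♭", "D", "E♭", "E", "F", "F♯", "G", "A♭", "A", "B♭", "B"]


def parallel(triad):
    """P: keep the root, flip major <-> minor."""
    root, quality = triad
    return (root, "min" if quality == "maj" else "maj")


def relative(triad):
    """R: major -> minor a minor third below; minor -> major a minor third above."""
    root, quality = triad
    if quality == "maj":
        return ((root + 9) % 12, "min")
    return ((root + 3) % 12, "maj")


def leading_tone(triad):
    """L: major -> minor a major third above; minor -> major a major third below."""
    root, quality = triad
    if quality == "maj":
        return ((root + 4) % 12, "min")
    return ((root + 8) % 12, "maj")


def pitch_classes(triad):
    """The three pitch classes sounding in the triad."""
    root, quality = triad
    third = 4 if quality == "maj" else 3
    return {root, (root + third) % 12, (root + 7) % 12}


def label(triad):
    root, quality = triad
    return f"{NAMES[root]} {'major' if quality == 'maj' else 'minor'}"


c_major = (0, "maj")
for op in (parallel, relative, leading_tone):
    result = op(c_major)
    common = pitch_classes(c_major) & pitch_classes(result)
    print(f"{op.__name__}: {label(c_major)} -> {label(result)}, {len(common)} common tones")

# A chromatic mediant takes two steps: P, then R.
print(label(relative(parallel(c_major))))
```

Each single operation keeps two of the three chord tones in place, which is why a one-step move sounds so smooth.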

The Camelot Wheel, which only encodes the circle of fifths, can't represent these relationships without additional rules. Parsimonious voice leading: why some transitions just work Parsimonious (or smooth) voice leading describes chord progressions where each voice moves the minimum possible distance — typically one semitone or zero. The classic example: C major (C-E-G) to E minor (E-G-B) moves only the C down to B — a one-semitone change in the alto voice — while E and G stay in place. The ear barely registers the transition because the interval structure is preserved with minimal motion. This principle explains why certain cross-key transitions feel effortless regardless of what the key labels say.

If Track A contains a chord that is parsimoniously related to a chord in Track B, and the beat alignment puts those chords in phase, the mix will sound smooth even if the two tracks are technically in different keys on the Camelot Wheel. Conversely, two tracks in the same key can clash if their chord voicings don't allow for parsimonious voice leading at the point of transition. This is the theoretical foundation for why harmonic mixing is subtler than "check the number and match or don't." It is also why the Mixed In Key energy flow rules ("move clockwise to raise energy, counter-clockwise to lower energy") are heuristics, not laws — they encode a directional bias on the circle of fifths that sometimes aligns with what the music actually does, and sometimes doesn't.
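One way to make "parsimonious" concrete is to measure the smallest total number of semitones the voices must move to get from one chord to another. The sketch below is a simplified illustration: it treats chords as pitch-class triples and ignores octave and register, and the function names are hypothetical.

```python
from itertools import permutations


def pc_distance(a: int, b: int) -> int:
    """Shortest distance in semitones between two pitch classes (0 to 6)."""
    d = abs(a - b) % 12
    return min(d, 12 - d)


def voice_leading_distance(chord_a, chord_b) -> int:
    """Smallest total semitone motion mapping chord_a's voices onto chord_b."""
    return min(
        sum(pc_distance(a, b) for a, b in zip(chord_a, assignment))
        for assignment in permutations(chord_b)
    )


C_MAJOR = (0, 4, 7)         # C, E, G
E_MINOR = (4, 7, 11)        # E, G, B
A_MINOR = (9, 0, 4)         # A, C, E
F_SHARP_MAJOR = (6, 10, 1)  # F♯, A♯, C♯

for name, chord in [("E minor", E_MINOR), ("A minor", A_MINOR), ("F♯ major", F_SHARP_MAJOR)]:
    print(f"C major -> {name}: {voice_leading_distance(C_MAJOR, chord)} semitone(s)")
```

C major to E minor costs a single semitone, C major to A minor costs two, and the tritone-related F♯ major costs six. A measure like this says nothing about beat alignment or voicing in the actual audio, so it is a rough proxy at best.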

Modal interchange: the compatibility destroyer hiding in same-key tracks

Modal interchange, also called mode mixture, describes the practice of borrowing chords from a parallel mode. In C major, this means borrowing from C minor: the ♭VI (A♭ major), ♭VII (B♭ major), iv (F minor), and ♭III (E♭ major) are all borrowed from C natural minor (the Aeolian mode). These chords are diatonic to C minor but chromatic in C major, and their use creates emotional color that pure diatonic harmony can't achieve. In electronic music, modal interchange is everywhere. A deep house track in C major might lean heavily on ♭VII (B♭ major) chords, a flattened seven borrowed from C Mixolydian, giving it a characteristic subdominant color. A trance track in the same key might stay strictly diatonic, or use ♭VI (A♭) for an emotional lift. Both are "in C major." Neither will mix cleanly with the other at a moment where one track is on a ♭VII chord and the other isn't, because their harmonic palettes have diverged even though their key labels are identical. This is the hidden failure mode that no mainstream DJ software models: two tracks in the same key can have incompatible harmonic content because one uses borrowed chords the other doesn't. Key compatibility, as currently practiced, is a necessary condition but not a sufficient one.

Why key detection fails (and why it's worse than you think)

Chroma-based key detection, the family of algorithms behind most mainstream tools, including Mixed In Key, rekordbox, and Serato, works by computing a pitch-class histogram over the track's audio, weighting pitch classes by their salience, and comparing the resulting profile against a reference key profile.

The reference profiles most commonly used derive from the work of Carol Krumhansl and Edward Kessler (1982), who conducted psychological experiments to determine how listeners perceive tonal stability in different key contexts. These profiles, the "Krumhansl-Kessler profiles," assign a stability weight to each of the 12 pitch classes within each of the 24 keys. Key detection algorithms match detected chroma histograms against these profiles using correlation or distance metrics (the approach usually called the Krumhansl-Schmuckler algorithm). The key with the best match is assigned. The accuracy of this approach on clean, tonal, acoustic music (classical, jazz, singer-songwriter) is genuinely good, typically 85–90% agreement with human annotators. On modern electronic music, it degrades significantly for several reasons:

- Bass and kick masking: Low-frequency energy dominates the spectral average in bass-heavy tracks. The kick drum, which contains significant sub-bass content, biases the pitch-class histogram toward the root note of the kick, not the harmonic content of the track. TRAKTOR's key detection is notoriously unreliable on dubstep and drum & bass for exactly this reason.
- Sidechain compression: Modern electronic production uses aggressive sidechain compression that creates artificial amplitude envelopes, particularly in the low-mid frequencies. This distorts the chroma histogram in ways that have nothing to do with harmonic content.
- Minimal harmonic content: A two-chord house loop has half the harmonic information of a Beatles song. The algorithm is fitting a 24-key model to a two-dimensional harmonic space, so the results are statistically fragile.
- Non-standard tunings: A=440Hz standardization is a 1939 convention. Earlier recordings, classical music recorded before the 1960s, and music from traditions that use different reference pitches (Baroque A=415Hz, some world music traditions) will produce systematically wrong key detections because the chroma histogram peaks shift with the tuning reference.

Community-sourced accuracy tests across DJ forums consistently show that built-in key detection in rekordbox, Serato, and TRAKTOR achieves 70–80% accuracy on a typical club library, which means at least one in five tracks has a wrong or misleading key label. Mixed In Key's standalone software reportedly achieves 5–10% higher accuracy through enhanced chroma analysis, but even that leaves a non-trivial error rate for software that's being used as a constraint in a recommendation system.
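The profile-matching step described in this section can be sketched in plain Python. The two weight lists are the published Krumhansl-Kessler probe-tone profiles; everything else (function names, and the synthetic chroma used in place of real audio) is an assumption made for illustration, not the implementation of any particular DJ tool.

```python
from math import sqrt

# Krumhansl-Kessler probe-tone profiles, index 0 = tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "D♭", "D", "E♭", "E", "F", "F♯", "G", "A♭", "A", "B♭", "B"]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def estimate_key(chroma):
    """Return (key name, correlation) for the best match among 24 keys.

    `chroma` is a 12-element list of pitch-class energies with index 0 = C.
    """
    best = None
    for tonic in range(12):
        # Rotate so the candidate tonic sits at index 0, like the profiles.
        rotated = chroma[tonic:] + chroma[:tonic]
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            score = pearson(rotated, profile)
            if best is None or score > best[1]:
                best = (f"{NAMES[tonic]} {mode}", score)
    return best


# Synthetic chroma shaped exactly like the minor profile with A as tonic.
a_minor_chroma = MINOR[3:] + MINOR[:3]
key, score = estimate_key(a_minor_chroma)
print(key, round(score, 3))
```

On this idealized input the match is perfect. The failure modes listed above all amount to the real chroma vector looking less and less like any one rotated profile, at which point the winning correlation is only marginally higher than the runner-up.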

Spectral profile: the axis key detection ignores In the first editorial, I argued that spectral profile — spectral centroid, spectral contrast, MFCCs — is a primary axis of similarity that BPM and key don't capture. Two tracks at 128 BPM in A minor can feel completely different depending on whether the harmonic energy is concentrated in the bass (dark, warm) or the mids/highs (bright, aggressive). For harmonic mixing specifically, spectral profile adds a second, independent dimension that predicts whether a transition will feel coherent. The practical rules DJs have developed — "don't mix two dark tracks at a drop," "build energy by moving from dark to bright" — are spectral rules, not harmonic ones. They describe timbral compatibility, not key compatibility.
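Of the spectral features mentioned here, the centroid is the simplest to show: it is the magnitude-weighted mean frequency of a spectrum. The sketch below uses two made-up five-bin spectra rather than real audio, so the numbers only illustrate the "dark" versus "bright" distinction.

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of a spectrum, in Hz."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silence has no meaningful centroid
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


# Hypothetical magnitude spectra sampled at the same five frequencies.
freqs = [60, 120, 500, 2000, 8000]
dark_track = [1.0, 0.8, 0.3, 0.1, 0.02]   # energy concentrated in the bass
bright_track = [0.2, 0.2, 0.5, 1.0, 0.6]  # energy concentrated in mids/highs

dark = spectral_centroid(freqs, dark_track)
bright = spectral_centroid(freqs, bright_track)
print(f"dark: {dark:.0f} Hz, bright: {bright:.0f} Hz")
```

Two tracks can share a key label and a tempo and still sit an order of magnitude apart on this axis.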

The ideal mixing framework would operate on two axes simultaneously: harmonic compatibility (from chroma analysis) and spectral compatibility (from spectral profile analysis). Neither axis alone is sufficient. Two harmonically compatible tracks can produce a muddy, dark mess at a drop. Two spectrally compatible tracks in conflicting keys will produce audible dissonance. Only when both axes are satisfied does a transition have a high probability of working.

What the research actually says

Academic research on harmonic similarity in music retrieval substantially predates the Camelot Wheel. The foundational work by Krumhansl and Kessler (1982) established the empirical basis for key detection profiles through probe-tone rating experiments: listeners rated how well each of 12 pitch classes completes a musical phrase, producing stability profiles for each key.

These profiles became the reference for virtually all chroma-based key detection algorithms. More recent work has moved toward machine learning approaches. ISMIR papers from 2015–2024 show a clear trend: early systems used rule-based chroma matching; later systems use CNNs and transformers trained on annotated datasets (MagnaTagATune, Million Song Dataset, GiantSteps). The key detection accuracy on clean music has improved substantially. The open problem remains electronic music with the failure modes described above, which is the genre that most DJs are actually working with. Shiu et al. (2014) proposed a tonality-based similarity metric that uses key profiles rather than raw chroma, addressing the problem that two tracks in different keys can have similar harmonic language if their key profiles (major/minor profile weights) are similar.

This is conceptually closer to what DJs actually care about: not whether the tracks share a root note, but whether their harmonic motion feels related. On the recommendation side, the constraint-aware framing from the first editorial maps directly to what the music information retrieval literature calls context-aware playlist generation. The distinguishing constraint in DJ contexts, temporal contiguity (tracks must be mixable in real time), is largely absent from mainstream MIR research, which tends to optimize for retrieval accuracy over playlist coherence. This is the gap that DJ-focused tools like Mixed In Key are filling empirically, not academically.

Toward a more honest mixing framework

The practical implication of all this is not that DJs should abandon key-based mixing.

The Camelot Wheel works as a first-order approximation — it's just that it's an approximation with documented failure modes that most DJs have learned to navigate intuitively. The goal of laying out the theory is to make those failure modes explicit and navigable rather than mysterious. A more honest mixing framework would

When A=438 — Tuning Reference Drift and the Hidden Crisis in Music Analysis

Sun, 07 Jun 2026 00:00:00 GMT

The systematic error that makes key detection unreliable for a large fraction of your library.

In 1939, an international conference in London recommended A=440Hz as the concert pitch reference, and the International Organization for Standardization later adopted it as ISO 16. Before that, different orchestras and cities used different tuning standards: A=435Hz was common in France, some German orchestras used A=466Hz, Baroque ensembles used A=415Hz. The shift to 440Hz was a negotiation, not a discovery. Today, A=440Hz is so deeply embedded in music technology that it rarely gets questioned. Every tuner, every DAW, every DJ key detection tool assumes A=440Hz as the reference. But a significant portion of music, especially music recorded before the 1970s and a surprising amount recorded since, was tuned to something other than A=440. This creates a systematic error in every key detection tool that doesn't account for it.

What Tuning Reference Actually Means

When a tuner says a recording is in A=440Hz, it's saying that the note A above middle C (A4) vibrates at 440 cycles per second. Every other note's frequency is derived from that reference via equal temperament: A♯4 (B♭4) is 440 × 2^(1/12) ≈ 466.16Hz, and so on across all twelve semitones. If a track was recorded with A=438Hz, the actual frequencies are lower by a fraction of a semitone. The "A" in that track is 438Hz, not 440Hz, which means every note in the track is shifted down by about 7.9 cents (a cent is 1/100 of a semitone). This is below the threshold of most tuners and many key detection algorithms to notice, but it compounds: an algorithm that assumes A=440 when analyzing an A=438 recording will report every note as being roughly 7.9 cents flat. In isolation, 7.9 cents is imperceptible to most listeners.
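The arithmetic is short enough to check directly. The offset in cents between two reference pitches is 1200 × log2(f / 440), and any equal-tempered note is the reference multiplied by 2^(n/12). A minimal sketch (the function names are just for illustration):

```python
import math


def cents_offset(reference_hz: float, standard_hz: float = 440.0) -> float:
    """Offset of a tuning reference from the standard, in cents."""
    return 1200.0 * math.log2(reference_hz / standard_hz)


def note_frequency(semitones_from_a4: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency of the note n semitones above (or below) A4."""
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


for a4 in (415.0, 435.0, 438.0, 442.0):
    print(f"A={a4:.0f}Hz is {cents_offset(a4):+.1f} cents from A=440")

print(f"A♯4 at A=440: {note_frequency(1):.2f}Hz")
print(f"A♯4 at A=438: {note_frequency(1, 438.0):.2f}Hz")
```

A=438 comes out about 7.9 cents flat and A=442 about 7.9 cents sharp, while Baroque A=415 lands roughly 101 cents down, just over a full semitone.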

But key detection algorithms work on chroma histograms: they collapse pitch classes across octaves and compare the distribution of pitch energy. A consistent 7.9-cent shift across all notes in a recording will push the algorithm's chroma estimate into adjacent pitch classes at the edges of the detection window, causing systematic misclassification. In practice, this means a track recorded at A=438 that a human ear would confidently identify as being "in A minor" may be reported by key detection tools as a neighboring key such as A♭ minor or B♭ minor, because the chroma peaks don't quite align with the expected template for A minor.

Why This Matters for DJ Library Analysis

The problem is compounded by the fact that DJ key detection happens on processed audio: MP3s, FLACs, WAVs that have been encoded, decoded, and possibly pitch-shifted at some point in the distribution chain.

Even if the original recording was at A=440, sample rate conversion in digital processing can introduce small pitch shifts that accumulate. When you run KeyFinder, Mixed In Key, or any other chroma-based key detector on a large library, you're running it against a mix of recordings with different tuning standards and different processing histories. The results are noisy — not because the algorithm is bad, but because it's applying a single reference frame (A=440) to material that doesn't conform to it. This is why key detection disagreement between tools is so common and so frustrating. Spotify says A minor. Rekordbox says C minor. Mixed In Key says D minor. All three are running chroma-based detection. The difference is the reference pitch and the window length — longer windows are more robust to tuning reference errors but can miss rapid key changes within a track.

The Classical and Jazz Problem The tuning reference problem is particularly severe in classical and jazz recordings. Classical orchestras settled on A=440Hz broadly only in the mid-20th century. Many recordings from the 1950s–1970s used A=442Hz or even A=443Hz — especially German and Austrian orchestras. Baroque recordings frequently use A=415Hz (a half-step below modern pitch), which is the historically appropriate tuning for period instruments. Jazz recordings are all over the place. Small label jazz from the 1950s and 60s often had slightly sharp tuning — A=442Hz was common in some NYC studios. Some modern jazz recordings are tuned to A=438 or even lower, as producers sought a "darker" tonal character that lower tuning references produce.

If you're a DJ mixing across genres (house music with classical samples, or hip-hop with jazz breaks), these tuning reference differences mean that a key detection algorithm running against your sample library will systematically misclassify anything recorded outside of A=440.

How Key Detection Actually Works (and Why It Fails Here)

Most modern key detection tools use a variation of the Harmonic Pitch Class Profile (HPCP) algorithm:

1. Compute the Short-Time Fourier Transform (STFT) of the audio to get a time-frequency representation
2. Map the frequency bins to 360 pitch class bins (30 bins per semitone for precision)
3. Accumulate the energy in each pitch class across the entire track (or a selected region)
4. Compare the resulting chroma vector against a key profile template (major/minor)
5. Return the best-matching key

The algorithm assumes A=440Hz when mapping frequency bins to pitch classes.

If the actual tuning reference is different, every frequency-to-pitch mapping is systematically off. The fix — in principle — is to add a tuning estimation step before the chroma analysis. The algorithm estimates the tuning reference by finding the most prominent pitch in the signal and computing its deviation from the nearest semitone at A=440. If the deviation is consistent across the track, it adjusts the reference frame before computing the chroma profile. This is what KeyFinder does, according to its documentation: it estimates the tuning of the input signal before computing the chroma profile. But the estimation window and the tolerance threshold determine how well it handles non-standard tunings — and for recordings that are close to A=440 but not exactly, the algorithm may still produce errors.
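The frequency-to-pitch-class mapping and the tuning estimation step described above can be sketched as follows. This is a simplified illustration, not KeyFinder's actual code: it assumes the spectral peak frequencies have already been extracted, and it uses a plain average of the deviations, which only behaves well when the true reference is within a few tens of cents of the assumed one.

```python
import math


def pitch_class_and_deviation(freq_hz: float, a4_hz: float = 440.0):
    """Return (pitch class with 0 = C, deviation in cents) for a frequency."""
    semitones = 12.0 * math.log2(freq_hz / a4_hz)  # signed distance from A4
    nearest = round(semitones)
    deviation_cents = 100.0 * (semitones - nearest)
    pitch_class = (nearest + 9) % 12  # A is pitch class 9 when C is 0
    return pitch_class, deviation_cents


def estimate_a4(peak_freqs_hz, assumed_a4_hz: float = 440.0) -> float:
    """Estimate the tuning reference from the mean deviation of spectral peaks.

    Assumes the peaks are real notes and the deviation is roughly constant
    across them; a circular mean would be safer near the +/-50 cent boundary.
    """
    deviations = [pitch_class_and_deviation(f, assumed_a4_hz)[1] for f in peak_freqs_hz]
    mean_cents = sum(deviations) / len(deviations)
    return assumed_a4_hz * 2.0 ** (mean_cents / 1200.0)


# Hypothetical peaks from a recording tuned to A=438: A4, C5, E5, A3, A5.
peaks = [438.0 * 2.0 ** (n / 12.0) for n in (0, 3, 7, -12, 12)]

estimated = estimate_a4(peaks)
print(f"estimated reference: {estimated:.2f}Hz")
print([pitch_class_and_deviation(f, estimated)[0] for f in peaks])
```

Once the reference has been re-estimated, the deviations collapse to zero and each peak falls in the center of its pitch-class bin instead of leaking toward a neighbor.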

The Practical Implication for DJ Library Management

If you're building a DJ library with tracks from multiple eras, labels, and regions, a non-trivial percentage of them will have non-standard tuning references. Key detection tools will misclassify some fraction of these, and the error will be systematic: the same direction of misclassification for the same recording source. The practical fixes are:

- Verify manually: The only reliable method is to check the detected key against a known phrase in the track. If the track has a clear melodic hook or bassline, hum it against a tuner in the key it was recorded in (or use your ear). If the detected key doesn't match the actual key, manually override it.
- Use longer samples: Most key detection tools let you specify the analysis region. Longer, more representative sections (30–60 seconds rather than 10–15) produce more stable chroma profiles that are more robust to tuning reference errors.
- Be skeptical of edge cases: If the algorithm reports a key that's on the edge of the Camelot wheel (B, E♭, A♭ major; G♭, C♭, F# minor), double-check the detection. These are the pitch classes most likely to be misclassified when tuning reference is off.
- Consider the source: Tracks from small independent labels with lo-fi production are more likely to have non-standard tuning. Classical, jazz, and world music imports are higher-risk. Major label electronic music recorded in professional studios is lower-risk.

The Deeper Problem The tuning reference problem is a symptom of a broader issue in music analysis: the assumption that recorded music is a stable, uniform signal when it actually carries the fingerprints of its recording environment. A recording is not a clean data source. It's a physical artifact — affected by the room it was recorded in, the instruments that produced it, the tape machines and digital converters that captured it, and the processing it's been through since. Key detection algorithms treat it as a clean signal and apply a single reference frame. In reality, each recording has its own reference frame that may not match the standard.

This is why the field of music information retrieval (MIR) is increasingly moving toward learned representations (deep neural networks trained on large labeled datasets) that can implicitly learn tuning reference variation rather than hard-coding a single standard. Chroma front ends that estimate tuning before binning, such as the NNLS Chroma plugin, and analysis libraries such as Essentia, which offers tuning-frequency estimation, also aim to improve robustness to tuning variation. But even these tools aren't perfect. The fundamental challenge is that the "correct" key of a recording is partly a cultural and perceptual judgment, not just a physical measurement. A track recorded at A=438 by a producer who tuned their instruments to A=438 is in the key of A, not A♭. An algorithm that reports A♭ because it assumes A=440 is wrong, even if the physics says so. This is the same argument as "key detection is an opinion," but with a more specific mechanism.

The opinion isn't just about major vs. minor ambiguity. It's about what reference frame you apply when measuring pitch in the first place. Until music analysis tools become sophisticated enough to estimate and correct for tuning reference on a per-track basis, the best approach is to treat key detection as a starting point, not a ground truth — and to verify with your ears before you build a set around a particular key compatibility hypothesis.