Building the Saylists Algorithm

In 2021 I worked on a project called Saylists, built in partnership with Warner Music and Apple Music. The idea was to use music as a tool in speech therapy: specifically, to find songs whose lyrics contain the sounds a patient needs to practice.

The output was a playlist generator. Given a target sound chosen by a speech therapist, let’s say the letter S, the algorithm returns a ranked list of songs whose lyrics are densely packed with that sound in the right patterns. The patient practices the therapy exercise alongside a song they might actually enjoy, which makes repetition less of a chore.

The interesting part was the algorithm underneath.

Why repetition matters in speech therapy

Speech therapy for certain conditions (stuttering, articulation disorders, post-stroke recovery) often relies on repetitive exposure to specific sounds. The goal is to build muscle memory and improve the motor patterns involved in producing those sounds correctly.

Repetition is effective, but it’s also boring. Music changes that: the rhythm, melody, and emotional engagement of a song can make the same practice feel less like a chore. The hypothesis behind Saylists was that if we could identify songs that naturally contain high densities of the right phonetic patterns, we could make that repetition more sustainable.

The tricky part was defining “the right patterns” in a way that was both clinically valid and computationally tractable.

Getting the criteria right

Before writing any code, we worked with professional speech therapists to understand what they were actually looking for in a useful song.

It turned out that not all instances of a sound are equal. A sound at the start of a word (the onset position) behaves differently from the same sound in the middle or at the end. Double letters (like the “ss” in “success”) represent a different kind of repetition again. Therapists have specific views on which of these are most therapeutically valuable, and those views vary depending on what the patient is working on.

What we ended up with was a set of weighted categories:

Start of word — the sound appears at the beginning
End of word — the sound appears at the end (excluding double-end cases)
Double letter — the sound appears twice consecutively
Orphan sounds — instances that don’t fit the other patterns

Each category contributes a base score multiplier. Double letters score higher because they represent a more concentrated form of practice.
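As a rough illustration (not the production code), the categories could be represented in Python as a table of weights. The weight values and the names CATEGORY_WEIGHTS and base_score are invented for this example; the only constraint taken from the text is that double letters score highest.

```python
# Hypothetical weights: the numbers are placeholders for illustration.
# The only grounded constraint is that "double_letter" carries the
# highest weight.
CATEGORY_WEIGHTS = {
    "start_of_word": 2.0,
    "end_of_word": 1.5,
    "double_letter": 3.0,
    "orphan": 1.0,
}


def base_score(counts, weights=CATEGORY_WEIGHTS):
    """Combine per-category match counts into a single weighted score.

    `counts` maps a category name to how many times it matched.
    """
    return sum(weights[category] * n for category, n in counts.items())


# One double letter plus two orphan matches: 3.0 + 2 * 1.0
example = base_score({"double_letter": 1, "orphan": 2})
```

Keeping the weights in a plain table like this makes it easy to adjust them after each round of therapist feedback, or to swap in a different table for a different therapy goal.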

Once we had scored songs against those criteria, we brought the results back to the therapists, who could tell us whether our algorithm was heading in the right direction. That feedback loop shaped several rounds of changes to the weighting.

How the algorithm works

The algorithm runs in three phases.

Phase 1 — Sound detection

For a given song and target sound, the algorithm reads through every word in the lyrics and checks each word against four regular expressions: one for start-of-word matches, one for end-of-word matches, one for double letters, and one for everything else. Each match adds to a loopScore using the pattern’s weight multiplied by the square of the word’s position index.

Words that contain at least one match are pushed into a positionArray with their index, their loopScore, and a placeholder for distance.
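A minimal Python sketch of this phase might look like the following. It is an illustration under several assumptions, not the original implementation: the target is a single letter, the regular expressions and weights are my own guesses at the four patterns, the tokenizer is deliberately crude, and the word index is 1-based so the first word does not score zero. The names loop_score and position_array mirror the loopScore and positionArray mentioned above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; only "double scores highest" comes from the text.
WEIGHTS = {"start": 2.0, "end": 1.5, "double": 3.0, "orphan": 1.0}


@dataclass
class Position:
    index: int                      # 1-based word position (assumption)
    loop_score: float               # weighted score for this word
    distance: Optional[int] = None  # placeholder, filled in by phase 2


def build_patterns(sound):
    """Build the four illustrative patterns for a single target letter."""
    if len(sound) != 1 or not sound.isalpha():
        raise ValueError("this sketch only handles a single letter")
    s = re.escape(sound.lower())
    return {
        "start": re.compile(rf"^{s}"),
        # End of word, excluding words that end in a double letter.
        "end": re.compile(rf"(?<!{s}){s}$"),
        "double": re.compile(rf"{s}{s}"),
        # Everything else: not first, not last, not part of a double.
        "orphan": re.compile(rf"(?<!^)(?<!{s}){s}(?!{s})(?!$)"),
    }


def detect(lyrics, sound):
    """Return a position_array of the words that contain the target sound."""
    patterns = build_patterns(sound)
    words = re.findall(r"[a-z']+", lyrics.lower())
    position_array = []
    for index, word in enumerate(words, start=1):
        loop_score = 0.0
        for name, pattern in patterns.items():
            matches = len(pattern.findall(word))
            # Weight multiplied by the square of the word's position
            # index, following the description in the text literally.
            loop_score += matches * WEIGHTS[name] * index ** 2
        if loop_score > 0:
            position_array.append(Position(index, loop_score))
    return position_array
```

Calling detect("She sells sea shells", "s") returns one entry per word, since every word contains the letter, while words without the target letter are skipped entirely.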

Phase 2 — Distance calculation

Once the full list of matching words is built, the algorithm walks through adjacent pairs and calculates the gap between them, measured in word positions: how far apart are these two instances of the target sound in the lyric?

This distance is what determines how useful those two instances are in combination. A song that repeats the target sound every few words is more valuable than one where the same sound appears twice but fifty words apart.
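The pairwise walk can be sketched in a few lines of Python. The function name gaps_to_previous is hypothetical, and so is the choice to attach each gap to the later word of the pair and leave the first match with no distance; the text does not say which word of a pair receives the value.

```python
def gaps_to_previous(indices):
    """For each matched word position, return the gap to the match before it.

    `indices` is the ordered list of word positions that contained the
    target sound. The first match has no predecessor, so its gap is None
    (an assumption made for this sketch).
    """
    gaps = []
    previous = None
    for index in indices:
        gaps.append(None if previous is None else index - previous)
        previous = index
    return gaps


# Matches at words 2, 4 and 5 are close together; the one at 55 is not.
example_gaps = gaps_to_previous([2, 4, 5, 55])
```

Here example_gaps is [None, 2, 1, 50]: two tight repetitions followed by one that is fifty words away from its neighbour.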

Phase 3 — Inverse-square scoring

The final score for each matched word is calculated using the inverse-square law. If the distance between two instances is d, then the boost applied to the score is 1 / d².

Short distances produce a large boost. Long distances produce almost none. The total score for a song is the sum of these boosted word scores across the entire lyric.

This felt like the right model intuitively. In speech therapy, the benefit of close repetition is not just additive; it compounds. Practicing a sound twice in quick succession is more valuable than practicing it twice with a long gap in between. The inverse-square law captures that relationship without requiring us to invent a falloff curve by hand.
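To make the falloff concrete, here is a small Python sketch of the scoring step. It assumes the boost multiplies the word's score, and that a match with no preceding match contributes nothing; both are my reading of the description rather than confirmed details, and the function names are hypothetical.

```python
def inverse_square_boost(distance):
    """Return 1 / d^2 for a gap of `distance` word positions."""
    if distance <= 0:
        raise ValueError("distance must be a positive number of words")
    return 1.0 / distance ** 2


def song_score(matches):
    """Sum the boosted scores for one song.

    `matches` is a sequence of (loop_score, distance) pairs. A distance
    of None means the word has no earlier match to pair with; this
    sketch skips it (an assumption).
    """
    total = 0.0
    for loop_score, distance in matches:
        if distance is None:
            continue
        total += loop_score * inverse_square_boost(distance)
    return total


# Gap of 1 keeps the full score, gap of 2 keeps a quarter,
# gap of 50 keeps 1/2500 of it.
boosts = [inverse_square_boost(d) for d in (1, 2, 50)]
```

The drop from a gap of 1 to a gap of 2 already removes three quarters of the boost, which is why songs with tightly clustered repetitions rank so far ahead of songs where the sound is merely frequent.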

The scale problem

Songs are short. Lyrics are not a large data structure. The algorithm itself is fast.

The problem was that we were running it against over 500,000 songs.

Each song needed to be scored for multiple target sounds. The naive approach (score everything upfront and store the results) was too slow at first, and making batch processing viable required careful thinking about how to structure the data pipeline.

We built the system to scale horizontally so that scoring jobs could be distributed across multiple workers. The scoring itself is stateless and embarrassingly parallel: each song can be scored independently, which makes parallelisation straightforward in principle. The harder problems were managing the queue, handling failures gracefully, and not re-scoring songs that hadn’t changed.
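The shape of that batch step can be sketched in Python on a single machine. This is a stand-in, not the real pipeline: a thread pool plays the role of the distributed workers, an in-memory dict plays the role of the results store, and the names score_batch and lyrics_fingerprint are hypothetical. Hashing the lyrics is one possible way to detect unchanged songs; the original mechanism is not described.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed


def lyrics_fingerprint(lyrics):
    """Stable hash of the lyrics, used to detect unchanged songs."""
    return hashlib.sha256(lyrics.encode("utf-8")).hexdigest()


def score_batch(songs, score_fn, cache, max_workers=4):
    """Score songs in parallel, skipping any whose lyrics are unchanged.

    songs:    dict of song_id -> lyrics
    score_fn: stateless function taking lyrics and returning a score
    cache:    dict of song_id -> (fingerprint, score), updated in place
    Returns the sorted ids of songs whose scoring raised an exception.
    """
    pending = {}
    for song_id, lyrics in songs.items():
        fingerprint = lyrics_fingerprint(lyrics)
        cached = cache.get(song_id)
        if cached is not None and cached[0] == fingerprint:
            continue  # lyrics unchanged since the last run
        pending[song_id] = (fingerprint, lyrics)

    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(score_fn, lyrics): (song_id, fingerprint)
            for song_id, (fingerprint, lyrics) in pending.items()
        }
        for future in as_completed(futures):
            song_id, fingerprint = futures[future]
            try:
                cache[song_id] = (fingerprint, future.result())
            except Exception:
                # One bad song must not take down the whole batch.
                failed.append(song_id)
    return sorted(failed)
```

Because a failed song never reaches the cache, rerunning the batch retries it automatically, while songs that scored successfully are skipped until their lyrics change.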

What I took from it

The most interesting part of this project wasn’t the code. It was the collaboration with the speech therapists.

I came in thinking the algorithm was the hard bit. It turned out the hard bit was understanding the domain well enough to know what the algorithm should be measuring. The therapists didn’t think in terms of regex patterns or inverse-square functions. They thought in terms of which exercises were effective and why. Translating that clinical knowledge into something computable took real back-and-forth.

The algorithm we ended up with is not complex. It’s a few hundred lines of fairly straightforward logic. But it took multiple rounds of rethinking to get to something that the therapists felt reflected what they knew about how speech therapy works.

That’s often how it goes. The code is rarely the hard part.
