

I’ve had a pipeline in mind for exactly this purpose that I want to build when I get around to it:
- Download the audio file from RSS feed
- Self-hosted AI transcription model (with output that includes timestamps)
- Self-hosted LLM to recognise ad sections and return the start and end timestamps as JSON
- ffmpeg to slice those timestamps out and stitch the rest back together
In theory, this should be able to remove ad and sponsor sections of any length completely automatically, and there’s nothing to stop it working on videos too.
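
For the last step, here is a rough Python sketch of how the slicing could work. It is only an illustration under assumptions: the JSON shape (`start`/`end` in seconds), the file names, the episode duration and the helper names `keep_segments` and `ffmpeg_command` are all made up for the example. It inverts the ad timestamps into the ranges to keep, then builds an ffmpeg command using the `atrim`, `asetpts` and `concat` filters. It only builds the command and does not run it.

```python
import json


def keep_segments(ad_segments, duration):
    """Invert (start, end) ad intervals into the intervals to keep."""
    keep = []
    cursor = 0.0
    for start, end in sorted(ad_segments):
        # Clamp to the episode bounds so bad LLM output can't run past the end.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            keep.append((cursor, start))
        # max() handles overlapping or nested ad intervals.
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


def ffmpeg_command(src, dst, keep):
    """Build an ffmpeg argument list that trims each kept range and joins them."""
    if not keep:
        raise ValueError("nothing left to keep")
    parts = []
    labels = []
    for i, (start, end) in enumerate(keep):
        # atrim cuts the range; asetpts resets timestamps so concat lines up.
        parts.append(
            f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]"
        )
        labels.append(f"[a{i}]")
    parts.append(f"{''.join(labels)}concat=n={len(keep)}:v=0:a=1[out]")
    return ["ffmpeg", "-i", src, "-filter_complex", ";".join(parts),
            "-map", "[out]", dst]


# Hypothetical LLM output: ad sections with timestamps in seconds.
llm_output = '[{"start": 0.0, "end": 30.5}, {"start": 600.0, "end": 660.0}]'
ads = [(seg["start"], seg["end"]) for seg in json.loads(llm_output)]

# Assumed episode length of 30 minutes; in practice this would come from the file.
keep = keep_segments(ads, duration=1800.0)
cmd = ffmpeg_command("episode.mp3", "episode_noads.mp3", keep)
print(keep)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`. Re-encoding through a filter graph is slower than a stream copy, but it should avoid the cut-point glitches you can get when slicing compressed audio at arbitrary timestamps.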
This is the way.
Zen’s UI is great, and it has better privacy defaults than Firefox does, as well as having its own mods repository on top of all the standard Firefox WebExtensions. (Try “better find bar”, “floating history” and “floating status bar” if you haven’t already; they make the last few bits of the UI look consistent.)