• 0 Posts
  • 218 Comments
Joined 4 months ago
Cake day: February 13th, 2025

  • An RSS file is a plain-text, computer-readable file that you add to your website, containing a list of all the recent posts that you want to promote.

    Anytime I add a post to my blog, I update my RSS file. (Well, a piece of automation does; I could hand-edit it, but I’m a lazy programmer. There’s a rough sketch of that kind of automation at the end of this comment.) Then a service I registered with shares any new posts (posts with today’s date) to services like Mastodon or Lemmy through bot accounts that I set up.

    People can also subscribe directly to the RSS feed (file), using various news reading apps. (But I think following RSS through Mastodon and Lemmy bots is becoming more popular, lately?)

    You can learn a lot more about RSS from the RSS Specification, but you may not need to.

    I find that WordPress and other blog solutions mostly just make good default assumptions whenever I have turned on the RSS feature or plugin.
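
    Here’s a rough sketch (in Python, with made-up blog details) of what that kind of automation could look like: rebuild a minimal RSS 2.0 file from a list of recent posts. The blog title, URLs, post data, and output filename are all placeholders, not a real setup.

```python
# A minimal sketch: rebuild a simple RSS 2.0 feed file from recent posts.
# All names, URLs, and post data below are illustrative placeholders.
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape

# Placeholder post data standing in for real blog posts: (title, link, published).
posts = [
    ("Hello world", "https://example.com/hello-world",
     datetime(2025, 2, 13, tzinfo=timezone.utc)),
]

# One <item> element per post, with the standard RSS fields escaped.
items = "\n".join(
    f"""    <item>
      <title>{escape(title)}</title>
      <link>{escape(link)}</link>
      <pubDate>{format_datetime(published)}</pubDate>
    </item>"""
    for title, link, published in posts
)

# Wrap the items in the channel/rss boilerplate and write the feed file.
rss = f"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Recent posts from Example Blog</description>
{items}
  </channel>
</rss>
"""

with open("feed.xml", "w", encoding="utf-8") as f:
    f.write(rss)
```

    Feed readers and the relay/bot services then just re-read that file and pick up any new <item> entries.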


  • When detecting duplicates gets expensive, the secret is to process them anyway, but in a way that de-duplicates the result of processing them.

    Usually, that means writing the output of the next processing step into a (new) table whose primary key contains every detail that could make a record a duplicate. (There’s a sketch of the whole pattern at the end of this comment.)

    Then, as all the records are processed, just let each duplicate overwrite that same row.

    The resulting table is a list of keys containing no duplicates.

    (Tip: This can be a good process to run overnight.)

    (Tip: Be sure the job also marks each original record as processed/de-duped, so the overnight job only ever has to look at new, unprocessed records.)

    Then we drive all future processing steps from that new de-duplicated table, joining back to whichever of the duplicate records was processed last for the other record details. (Since they’re duplicates anyway, we don’t care which one wins, as long as only one does.)

    This tends to mean one pass through the full data set to create the de-duplicated list, and then a second pass through the de-duplicated list for all remaining steps. So roughly 2n processing time.

    (But the first n can be a long running background job, and the second n can be optimized by indexes supporting the needs of each future processing step.)
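
    A minimal sketch of the pattern, using SQLite through Python’s sqlite3 module. The table names, columns, the processed flag, and the sample data are all invented for illustration, not a real schema.

```python
# A sketch of the de-duplicate-by-overwriting pattern described above.
# Table names, columns, and sample data are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE raw_records (
    id        INTEGER PRIMARY KEY,
    email     TEXT,
    name      TEXT,
    processed INTEGER DEFAULT 0      -- lets the overnight job skip already-handled rows
);

-- The primary key holds every detail that could make two records duplicates,
-- so writing a duplicate simply overwrites the earlier row.
CREATE TABLE deduped (
    email     TEXT,
    name      TEXT,
    source_id INTEGER,               -- whichever duplicate was processed last wins
    PRIMARY KEY (email, name)
);
""")

con.executemany(
    "INSERT INTO raw_records (email, name) VALUES (?, ?)",
    [("a@example.com", "Alice"), ("a@example.com", "Alice"), ("b@example.com", "Bob")],
)

# First pass (the long-running overnight job): copy every unprocessed record
# into the keyed table, letting duplicates overwrite each other, then mark
# the originals as processed so the next run only sees new records.
con.executescript("""
INSERT OR REPLACE INTO deduped (email, name, source_id)
    SELECT email, name, id FROM raw_records WHERE processed = 0;

UPDATE raw_records SET processed = 1 WHERE processed = 0;
""")

# Second pass: drive the remaining steps from the de-duplicated table,
# joining back to the one surviving source row for any other details.
for row in con.execute("""
    SELECT d.email, d.name, r.id
    FROM deduped d
    JOIN raw_records r ON r.id = d.source_id
"""):
    print(row)   # e.g. ('a@example.com', 'Alice', 2) and ('b@example.com', 'Bob', 3)
```

    The INSERT OR REPLACE is what lets each duplicate silently overwrite the earlier row with the same key, and the final query is the “join back to whichever duplicate won” step that all later processing drives from.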