• Angry_Autist (he/him)@lemmy.world
    1 day ago

    Not all LLMs are being trained on API data. It is a lot more resource-efficient to just give them a copy of the comment database, which stores a record of every edit.

    • sad_detective_man@leminal.space
      24 hours ago

      Would an LLM operator pay for that, or care about unedited data? The redditor in me assumes they would prefer it from the source and are aware of Redact, but my better judgement says they’re probably just average capitalists with a shiny toy who don’t really care what it reads. Hence why they are using it on Reddit.

      • Angry_Autist (he/him)@lemmy.world
        6 hours ago

        Reddit was once considered the highest-quality user-generated data on current topics, and that archived data is still valuable even if the last few years are unsellable.

        • sad_detective_man@leminal.space
          2 hours ago

          Perhaps you are right. My recent years on Reddit have tainted the whole experience, but maybe an LLM owner doesn’t feel the same way and can simply omit the brainrot. I like your username btw. Have a nice week.