Reddit Signs AI Content Licensing Deal Ahead of IPO

Reddit Inc. has signed a contract allowing a company to train its artificial intelligence models on the social media platform’s content, according to people familiar with the matter, as it nears the potential launch of its long-awaited initial public offering.

  • 800XL@lemmy.world · English · 99 upvotes · 7 months ago

    Long-awaited, said no one. Is AI going to fabricate even more of the bullshit on Reddit, then?

    • electricprism@lemmy.ml · English · 16 upvotes · 7 months ago

      The problem is that Reddit content and votes aren’t all human, so unless they kept a record of which parts are just chatbots and which votes were faked, it’s not exactly useful to train on in a pure sense.

      Considering the disinformation wars and botnets between the big countries, it’s hard to even get an idea of what people really think, what is bullshit, and what isn’t.

      In any case, I’m glad Reddit has fucked itself. This small corner of sanity is a bastion in a shit blizzard.

  • BlueÆther@no.lastname.nz · English · 43 upvotes · 7 months ago

    I’ve been on Reddit, and I don’t know that I would like to use an LLM trained on much of the content there (excluding the tech/DIY space).

    • muntedcrocodile@lemmy.world · English · 6 upvotes, 14 downvotes · 7 months ago

      Reddit is actually pretty decent for training llms. Funny enough an ai finetuned on 4chan does better in intelegence benchmarks.

        • Lvxferre@mander.xyz · English · 4 upvotes, 4 downvotes · edited 7 months ago

          “Finetuned”, “Intelegence”. Oh the irony.

          Focus on what is being said, not how it is said. The comment is silly, but its usage of non-standard spelling has jack shit to do with it; the issue is the content.

            • Lvxferre@mander.xyz · English · 8 upvotes, 3 downvotes · edited 7 months ago

              No thanks. Im going to go ahead and focus on what I choose. But thanks for your input.

              Translation: “No thanks. I’m going to keep irrationally associating lack of literacy with stupidity, even if both things are orthogonal.”

              That’s the real irony, isn’t it? Actually two instances of irony, as it shows that you have both traits that you’re incorrectly associating together.

              Then, second request: could you please be a dead weight elsewhere? You’ll probably find more suitable company for your lack of basic rationality on Reddit.

        • muntedcrocodile@lemmy.world · English · 1 upvote, 1 downvote · 7 months ago

          Your unwarranted fixation on spelling in an online forum blatantly exposes your glaring dearth of insight beyond superficiality, a trait that most likely mirrors the shallowness dwelling within you.

  • General_Effort@lemmy.world · English · 10 upvotes · 7 months ago

    They say it’s $60 million on an annualized basis. I wonder who’d pay that, given that you can probably scrape it for free.

    Maybe it’s the AI act in the EU. That might cause trouble in that regard. The US is seeing a lot of rent-seeker PR, too, of course. That might cause some to hedge their bets.

    Maybe some people have not realized this yet, but limiting fair use does not just benefit the traditional media corporations but also the likes of Reddit, Facebook, Apple, etc. Making “robots.txt” legally binding would only benefit the tech companies.

  • deadlyduplicate@lemmy.world · English · 4 upvotes · 7 months ago

    Hmmm anyone remember when Andrew Yang was running for president and said that data was the new oil and that people should own the content they put on social media?

  • HeavyDogFeet@lemmy.world · English · 3 upvotes · 7 months ago

    I deleted all my posts before closing my accounts back when they were breaking third-party apps, although I’m sure they probably kept a private log of all posts specifically for this purpose.

    To be honest, I expect AI companies are scraping Lemmy and other places for training data anyway, but I’d rather Reddit specifically not make any money off my posts.

    • Grimy@lemmy.world · English · 3 upvotes · 7 months ago

      They keep a history of all your comments and edits. Deleting them does keep them away from the companies that scrape Reddit for free, but it also drives up the value of Reddit’s private database.

    • Brewchin@lemmy.world · English · 1 upvote · 7 months ago

      I’ve been using Power Delete Suite for years. It runs as a browser bookmarklet, so it doesn’t need the API, etc. I’ve got it deleting everything older than 3 months each time I run it.

  • kandoh@reddthat.com · English · 1 upvote · 7 months ago

    I made enough Reddit comments that they could probably make a solid imitation of me.