At a Senate hearing on AI’s impact on journalism, lawmakers backed media industry calls to make OpenAI and other tech companies pay to license news articles and other data used to train algorithms.

  • Grimy@lemmy.world
    English · 58 upvotes · 5 downvotes · edited · 10 months ago

    “What would that even look like?” asks Sarah Kreps, who directs the Tech Policy Institute at Cornell University. “Requiring licensing data will be impractical, favor the big firms like OpenAI and Microsoft that have the resources to pay for these licenses, and create enormous costs for startup AI firms that could diversify the marketplace and guard against hegemonic domination and potential antitrust behavior of the big firms.”

    As our economy becomes more and more driven by AI, legislation like this will guarantee Microsoft and Google get to own it.

    • Motavader@lemmy.world
      English · 29 upvotes · 1 downvote · edited · 10 months ago

      Yes, and they’ll use legislation to pull up the ladder behind them. It’s a form of regulatory capture, and it will absolutely lock out small players.

      There are open source AI training datasets, but the question is whether LLMs can be trained as accurately with them.

      • General_Effort@lemmy.world
        English · 3 upvotes · 1 downvote · 10 months ago

        These open datasets are used to fine-tune LLMs for specific tasks. But first, LLMs have to learn the basics by being trained on vast amounts of text. At present, there is no chance of doing that with open source data.

        If fair use is cut down, you can forget about it. It would arguably be unconstitutional, though.

        That’s not even considering the dystopian wishes to expand copyright even further. Some people demand that the model owner should also own the output. Well, some of these open datasets are made with LLMs like ChatGPT.

        • wewbull@iusearchlinux.fyi
          English · 1 upvote · 1 downvote · 10 months ago

          If fair use is cut down…

          It’s not a case of cutting down fair use. It’s a case of enforcing current fair use limits.

          • General_Effort@lemmy.world
            English · 2 upvotes · 1 downvote · 10 months ago

            Can you give an example of something that is outside fair use?

            Just in case there is confusion here: obviously there is no precedent for exactly these new circumstances, but that does not put new technologies outside the law. E.g., the freedom of speech and of the press apply to the internet, even though there is no printing press involved.

      • Dran@lemmy.world
        English · 11 upvotes · 6 downvotes · 10 months ago

        They’re going to get fucked either way, so we may as well live in the world where smaller AI companies have a chance. It’s already bad enough that OpenAI got to slurp Reddit and Twitter for free and nobody else can.

        • burliman@lemmy.world
          English · 3 upvotes · 10 downvotes · 10 months ago

          They won’t be fucked. They can use the AI tools as well, to make novel content and to improve the quality and quantity of what they produce.

    • TORFdot0@lemmy.world
      English · 11 upvotes · 4 downvotes · 10 months ago

      And what about the authors whose works were ingested without compensation? What should we do for them? I don’t think that these commercial AI models should get to infringe on their copyrights for nothing. If I pay for a ChatGPT subscription and ask it to tell me about the war in the Middle East and it basically regurgitates and plagiarizes information it learned from a journalist, then ChatGPT has essentially stolen the copyrighted work from that journalist and the revenue that my click would have earned them.

      I don’t see a problem with using publicly posted copyrighted data for non-commercial use, such as training local language models, but I don’t think it’s fair to allow copyright infringement for commercial use.

      • General_Effort@lemmy.world
        English · 6 upvotes · 2 downvotes · 10 months ago

        You’re repeating some talking points which are simply misinformation. An author who makes something “for hire”, like an employed journalist, does not own the copyright. Do you believe that construction workers benefit when rents go up?

        Copyrights are called intellectual property, because they work a lot like physical property. Employees create them and employers own them. They are bought and sold. A disproportionate share of property belongs to rich people, which is how they are rich.

        This is about funneling more wealth to property owners. The idea that this would benefit anyone else is simply the good old trickle-down. It will not happen.

      • Grimy@lemmy.world
        English · 4 upvotes · 1 downvote · edited · 10 months ago

        I think it’s better to be pragmatic than to give everything to the big corporations.

        OpenAI isn’t going to take its tool offline, so the loss of revenue isn’t going away. Payments won’t end up in the pockets of any individual journalist. The money the few journalistic sites receive will be used to pay the subscription fee for the next big model while they cut staff, since that will net them more money.

        If this goes through, Google and Microsoft will spend the next few years buying data or the companies that have it. The walls will be raised and we will be fucked; legislation will only help them.

        And there is simply not enough public domain data to build a competitive product. Better to tax and redistribute through UBI while keeping the field competitive and avoiding monopolies imo.