Federico Viticci and John Voorhees of MacStories have released an open letter to EU and US lawmakers and regulators setting out their concerns about the way most Large Language Models have been trained. In brief, they point out an obvious and undeniable truth: any model trained on in-copyright text scraped from the open web is built on intellectual property theft.
What they have written is self-evidently true and it is time for those who can act to act.
Recent comments by Mustafa Suleyman, Microsoft's CEO of AI, demonstrate that companies, in their rush to get to market, have not even begun to think through the implications of what they are doing.
While I remain hopeful for what AI will be able to do in the future, it is clear that we have a lot of work to do in the present.
There is no question in my mind that those who have scraped copyrighted material need to either license that material or rebuild their models from the ground up using licensed and out-of-copyright material.
The smartest companies will get out ahead of this. The least ethical will try to weather the storm or find a buyer and walk away before it hits.