Coinsurges provides coverage of fintech, blockchain, and Bitcoin, delivering the most recent news and analyses on the future of money. Stay up-to-date with live prices, charts, and trading options for the top exchanges. Keep track of the day's top cryptocurrency gainers and losers, as well as which coins have experienced gains and losses in the past 24 hours.
Trust Coinsurges as your go-to source for all news and updates in the industry.

AI training dataset used by tech giants allegedly created by scraping YouTube videos in violation of terms

Non-profit AI research group EleutherAI scraped YouTube subtitles to create a dataset in violation of YouTube’s terms of service, ProofNews said on July 16.

The dataset, called the Pile, allegedly includes subtitles of 173,536 YouTube videos from over 48,000 channels. About 12,000 deleted videos are part of the dataset.

Several top tech and AI firms, including Anthropic, have since used the Pile for training. Anthropic spokesperson Jennifer Martinez said the dataset includes “a very small subset of YouTube subtitles” but declined to comment on possible violations of YouTube’s terms of service.

Business software firm Salesforce also used the dataset. Salesforce VP of AI research Caiming Xiong said the dataset was “publicly available” and that Salesforce used it for academic and research purposes. ProofNews said Salesforce eventually released the same dataset publicly.

Apple used the Pile to train OpenELM, an efficient language model for on-device AI. Nvidia, Bloomberg, and Databricks also used the Pile for AI training.

ProofNews said its list of companies that used the dataset is not comprehensive, as companies do not always disclose which datasets they use in AI training.

Dataset contains crypto channels, more

ProofNews’ search tool indicates that the Pile includes videos from crypto channels and creators, including Coinbase, Cointelegraph, Bitcoin Magazine, BitBoy Crypto, 99Bitcoins, Ivan On Tech, and Andreas Antonopoulos.

ProofNews highlighted that the dataset includes transcripts from major news channels, education channels, late-night shows, popular YouTube hosts, and other categories. The Pile dataset extends beyond YouTube to other websites and online content.

ProofNews noted an earlier report from the New York Times, which said OpenAI and Google had previously harvested YouTube text. Google, which owns YouTube, said the action was permissible due to its agreement with users. OpenAI did not confirm or deny the report.

AI copyright disputes are far-reaching. Law firm BakerHostetler lists at least fifteen lawsuits involving tech firms such as Anthropic, Meta, GitHub, Stability AI, Nvidia, and Google. OpenAI faces high-profile lawsuits from Mother Jones’ parent company and The New York Times.

The post AI training dataset used by tech giants allegedly created by scraping YouTube videos in violation of terms appeared first on CryptoSlate.

