OpenAI Transcribes Over A Million YouTube Hours: Navigating The Gray Area Of AI Data Use



OpenAI reportedly used its Whisper audio transcription model to transcribe over a million hours of Alphabet Inc.'s (NASDAQ:GOOGL) (NASDAQ:GOOG) YouTube videos to train GPT-4.

The effort, described as a way around the limited supply of training data, has raised questions about the legality and ethics of acquiring data this way, The New York Times reported.

The newspaper reported that OpenAI was aware of the legal uncertainty surrounding this method but considered it to be fair use. Greg Brockman, president of OpenAI, was personally involved in selecting videos for transcription.

In response to inquiries, OpenAI spokesperson Lindsay Held told The Verge that the company builds “unique” datasets for its models to improve their “understanding of the world” and to stay competitive in global research.

Held said OpenAI gathers data in several ways, including using publicly available data, forming partnerships for access to non-public data and exploring the generation of synthetic data.

This development came amid growing concerns within the AI industry over the availability of quality training data.

The Wall Street Journal reported earlier that AI companies could exhaust new content sources by 2028, and it pointed to synthetic data creation and curriculum learning as possible solutions.

The practice of using large amounts of internet content, including YouTube videos, without explicit permission has prompted multiple legal and ethical debates, underscoring the precarious balance AI developers must strike between innovation and copyright compliance.

Photos: Shutterstock



Posted In: News, Tech, AI, Artificial Intelligence, Consumer Tech, Data, GPT-4, OpenAI, YouTube