Nvidia’s AI: Learning from a Non-Stop YouTube and Netflix Marathon

Leaked internal documents from NVIDIA indicate that the company has been scraping videos from YouTube, Netflix, and other sources to gather training data for its AI products. According to the documents, NVIDIA was downloading the equivalent of 80 years’ worth of video every day for this purpose.

Former NVIDIA employees report that they were instructed to scrape video from platforms such as Netflix and YouTube to build training data for several AI applications, including NVIDIA’s Omniverse 3D world generator, self-driving car systems, and “digital human” projects. The goal was to build a foundation model comparable to Gemini 1.5, GPT-4, or Llama 3.1 that would integrate light simulation, physics, and intelligence to support a range of applications critical to NVIDIA. According to the leaked information, the effort, referred to as Project Cosmos, allegedly used an open-source video downloader and machine learning techniques to get around YouTube’s blocking efforts, and there were discussions about running up to 30 virtual machines on Amazon Web Services to download videos daily.

NVIDIA denies any wrongdoing, with a spokesperson stating, “We respect the rights of all content creators and are confident that our models and research efforts comply fully with both the letter and the spirit of copyright law.”
