Nvidia Introduces AI Model Fugatto: Revolutionizing Audio and Music Creation with Generative Technology
Nvidia unveiled a new artificial intelligence model on Monday that generates music and audio and can modify voices and create novel sounds. The technology, called Fugatto (short for Foundational Generative Audio Transformer Opus 1), is aimed at music producers, filmmakers, and video game developers. Nvidia, the world’s largest supplier of chips and software for AI systems, has no immediate plans to release the model publicly. Fugatto joins comparable technologies from startups like Runway and major players like Meta, which can generate audio and video from text prompts.
Fugatto, developed by Santa Clara, California-based Nvidia, can generate sound effects and music from text descriptions, including unusual sounds such as a trumpet that barks like a dog. What sets Fugatto apart is its ability to modify existing audio, such as transforming a piano melody into a human voice or altering the accent and mood of a spoken-word recording.
Bryan Catanzaro, Nvidia’s vice president of applied deep learning research, said that generative AI could transform music, video games, and personal creativity, much as synthesizers changed music production decades ago. However, the technology raises concerns about potential misuse, such as generating misinformation or infringing copyright.
While companies like OpenAI are discussing AI’s role in Hollywood, tensions between the tech industry and the entertainment business have arisen, particularly after actress Scarlett Johansson accused OpenAI of mimicking her voice. Nvidia’s model was trained on open-source data, and the company, mindful of the risks associated with generative technologies, is still weighing whether and how to release it publicly.