In a recording studio in Seoul, South Korea, the K-pop music label HYBE is using artificial intelligence (AI) technology to combine the voice of a South Korean singer with recordings of native speakers of five other languages. This approach allowed HYBE to release a track by singer MIDNATT simultaneously in six languages: Korean, English, Spanish, Chinese, Japanese, and Vietnamese. While some K-pop singers have previously released songs in English and Japanese, this marks the first time a six-language release has been achieved using AI technology. If the approach succeeds, more popular acts could adopt it in the future.
The process involves recording MIDNATT singing the song “Masquerade” in each language, then combining his vocals with those of native speakers reading out the lyrics. HYBE’s in-house AI music technology seamlessly unites and enhances the vocals in each language. By dividing the sound into components such as pronunciation, timbre, pitch, and volume, the technology can produce a natural-sounding result. For example, an elongated vowel sound was added to the English lyrics to improve naturalness while leaving the singer’s voice otherwise unchanged.
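The analysis-and-synthesis idea described above can be illustrated with a toy sketch: decompose a signal into interpretable components (here, just fundamental pitch and volume), then rebuild it from those components. This is not HYBE's or Supertone's actual pipeline, which uses deep learning over far richer features; it is only a minimal NumPy illustration of the decompose-and-resynthesize concept, with all function names invented for the example.

```python
# Toy analysis-and-synthesis sketch: split a mono signal into a pitch
# component (fundamental frequency via autocorrelation) and a volume
# component (RMS amplitude), then resynthesize a tone from them.
# Hypothetical illustration only -- not the NANSY framework.
import numpy as np

SR = 16_000  # sample rate in Hz (assumed for this example)

def analyze(signal: np.ndarray) -> dict:
    """Extract simple 'components' from a mono signal."""
    # Volume: root-mean-square amplitude.
    rms = float(np.sqrt(np.mean(signal ** 2)))
    # Pitch: the lag of the strongest autocorrelation peak gives the period.
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    # Restrict the search to a plausible vocal range, 80-1000 Hz.
    lo, hi = SR // 1000, SR // 80
    lag = lo + int(np.argmax(ac[lo:hi]))
    return {"f0": SR / lag, "rms": rms}

def synthesize(f0: float, rms: float, duration: float) -> np.ndarray:
    """Rebuild a sine tone from the extracted components."""
    t = np.arange(int(SR * duration)) / SR
    tone = np.sin(2 * np.pi * f0 * t)
    # Scale the tone so its RMS matches the analyzed volume.
    return tone * (rms / np.sqrt(np.mean(tone ** 2)))

# Round trip: a 220 Hz tone survives analysis -> synthesis.
original = 0.5 * np.sin(2 * np.pi * 220 * np.arange(SR // 4) / SR)
parts = analyze(original)
rebuilt = synthesize(parts["f0"], parts["rms"], 0.25)
```

Editing one component (say, stretching a vowel's duration) while holding the others fixed is, conceptually, what lets the English lyrics be adjusted without altering the singer's voice.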
Supertone, a company acquired by HYBE for 45 billion won ($36 million), developed the Neural Analysis and Synthesis (NANSY) framework, the deep-learning framework behind the AI technology. According to Choi Hee-doo, Chief Operating Officer of Supertone, this AI-powered approach produces a more natural sound than non-AI software. HYBE plans to make some of the AI technology used in MIDNATT’s song accessible to creators and the public, although it is unclear whether there will be any associated fees.
MIDNATT said that using AI technology has expanded his artistic expression and lifted the language barrier, making it easier for global fans to have an immersive experience with his music. While AI in music is not new, this multilingual release represents an innovative application of it in the industry. Valerio Velardo, director of The Sound of AI, believes that AI music technology will benefit not only professional musicians but also a wider population, making music creation more accessible in much the way Instagram made photo sharing accessible.
Currently, HYBE’s pronunciation-correction technology takes weeks or months to complete. As the process becomes faster, it could serve a broader range of purposes, such as interpretation in video conferences, according to Choi Jin-woo, the producer of MIDNATT’s “Masquerade,” who is also known as Hitchhiker.