AI Creates Precise Street Images by Analyzing Their Soundtrack

Janani R | December 28, 2023 | 1:30 PM | Technology

While existing AI systems can generate sound effects to accompany silent images of city streets and other environments, a new technology works in the opposite direction: it generates images from audio recordings of streets, and in tests the results closely matched the real scenes.

Created by Assistant Professor Yuhao Kang and his team at the University of Texas at Austin, the "Soundscape-to-Image Diffusion Model" was trained using a dataset of 10-second audio-visual clips.

Figure 1. AI Creates Precise Street Images

The training dataset included still images and ambient sounds extracted from YouTube videos of urban and rural streets across North America, Asia, and Europe. Using deep learning algorithms, the system learned to associate specific sounds with corresponding items in the images, and to relate different sound qualities to different visual environments. Figure 1 illustrates street images created by the AI.

After completing its training, the system was challenged to generate images solely from the recorded ambient sound of 100 different street-view videos, producing one image for each video.

A panel of human judges was then shown each generated image alongside two other street images while listening to the corresponding video soundtrack. The judges were asked to identify which of the three images matched the soundtrack, and they did so with an average accuracy of 80%, well above the roughly 33% expected from random guessing among three options.
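To make the comparison with guessing concrete, here is a minimal Python sketch of how a three-way matching test like this could be scored. The function names and the ten trials of data are hypothetical and are not taken from the study; the article does not describe the researchers' own analysis code. The sketch computes a judge's accuracy and the binomial probability of doing at least that well by picking one of three images at random.

```python
from math import comb

def accuracy(choices, answers):
    """Fraction of trials in which the judge picked the generated image."""
    if len(choices) != len(answers):
        raise ValueError("choices and answers must have the same length")
    hits = sum(c == a for c, a in zip(choices, answers))
    return hits / len(answers)

def guessing_tail(hits, trials, chance=1 / 3):
    """Probability of at least `hits` correct picks under pure guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Hypothetical data (not from the study): the position (0, 1 or 2) of the
# generated image in each trial, and the position the judge chose.
answers = [0, 2, 1, 1, 0, 2, 0, 1, 2, 0]
choices = [0, 2, 1, 0, 0, 2, 0, 1, 1, 0]

acc = accuracy(choices, answers)
hits = sum(c == a for c, a in zip(choices, answers))
tail = guessing_tail(hits, len(answers))
print(f"accuracy: {acc:.2f}")
print(f"P(at least {hits} of {len(answers)} by guessing): {tail:.4f}")
```

In this toy example the judge is right in 8 of 10 trials. The tail probability shows how unlikely that score would be if the judge were simply guessing.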

Additionally, when the images were analyzed by a computer, the proportions of open sky, greenery, and buildings in the generated images were found to strongly correlate with those in the original videos.
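A comparison of this kind can be sketched in a few lines of Python. The sketch assumes each image has already been turned into a per-pixel label map by some segmentation step, which the article does not describe. The label ids, function names, and numbers are hypothetical. It measures the share of pixels in one class, then computes the Pearson correlation between that share in the original frames and in the generated images.

```python
from math import sqrt

# Hypothetical label ids for a per-pixel label map (an assumption, not the
# study's actual labeling scheme).
SKY, GREENERY, BUILDING, OTHER = 0, 1, 2, 3

def class_proportion(label_map, cls):
    """Share of pixels in `label_map` (a list of rows) labeled `cls`."""
    pixels = [p for row in label_map for p in row]
    return sum(p == cls for p in pixels) / len(pixels)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# A tiny 2x4 label map: half of its pixels are sky.
toy_map = [[SKY, SKY, SKY, SKY],
           [GREENERY, BUILDING, BUILDING, OTHER]]
print("sky share:", class_proportion(toy_map, SKY))

# Hypothetical sky shares for four source frames and their generated images.
original_sky = [0.10, 0.25, 0.40, 0.55]
generated_sky = [0.15, 0.20, 0.45, 0.50]
print("correlation:", round(pearson(original_sky, generated_sky), 3))
```

A coefficient close to 1 would mean that scenes with more sky in the source video also tend to show more sky in the generated image. The same calculation applies to greenery and buildings.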

In many cases, the generated images also accurately reflected the lighting conditions of the source videos, such as sunny, cloudy, or nighttime skies. The system may have inferred these conditions from audio cues such as reduced traffic noise at night or the sound of nocturnal insects.

While the technology has potential forensic applications, such as estimating the location of an audio recording, the primary focus of the study is to explore how sound influences our sense of place.

"The results could enhance our understanding of how visual and auditory perceptions impact human mental health, guide urban design for place-making, and improve the overall quality of life in communities," the scientists write in their paper, recently published in Nature.

Source: University of Texas at Austin

Cite this article:

Janani R (2024), AI Creates Precise Street Images by Analyzing Their Soundtrack, AnaTechMaz, pp. 290
