album cover generation from audio

[python,js,html,css]

Final project for the Postgraduate Course in AI with Deep Learning
completed in collaboration with David Solano and Jordi Martínez
Advisor: Txus Bach

This project generates album covers from audio data using deep learning. Working from a dataset of over 23,000 audio files and their corresponding album covers from Spotify, we aimed to merge music and visual art by creating personalized, aesthetically appealing album covers.

Approach and Model Development

We developed several models, starting with a basic DCGAN and progressing to a conditional DCGAN (cDCGAN) conditioned on genre labels and more complex prompts, so that covers could be tailored to specific attributes of the music. The most successful approach was Stable Diffusion fine-tuned with DreamBooth, which significantly improved the quality and creativity of the generated images while making efficient use of compute.
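As a rough illustration of the genre conditioning described above, the sketch below builds the input vector a conditional generator might receive: a random noise vector concatenated with a one-hot genre label. This is only one common way to condition a GAN, not necessarily the one used in the project. The genre list, the latent size of 100, and the function names are all assumptions made for the example.

```python
import random

# Hypothetical genre vocabulary; the project's actual label set is not given.
GENRES = ["rock", "jazz", "electronic", "hip-hop", "classical"]
LATENT_DIM = 100  # assumed noise size, a common DCGAN default


def one_hot(genre, genres=GENRES):
    """Encode a genre name as a one-hot list of floats."""
    if genre not in genres:
        raise ValueError(f"unknown genre: {genre!r}")
    return [1.0 if g == genre else 0.0 for g in genres]


def conditioned_input(genre, rng, latent_dim=LATENT_DIM):
    """Concatenate Gaussian noise with the one-hot genre label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return noise + one_hot(genre)


rng = random.Random(0)  # fixed seed so the noise is reproducible
z = conditioned_input("jazz", rng)
print(len(z))  # 105: 100 noise values plus 5 genre slots
```

In a real cDCGAN this vector would be turned into a tensor and fed to the generator's first layer, and the discriminator would receive the same label alongside each image.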

Deployment and Results

The final system was deployed with Flask and Docker, with model inference running on a GPU. Users interact with the platform by uploading a song and its metadata, and the system generates a custom album cover from music features extracted with the "musicnn" library.
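The step from extracted music features to an image prompt could look roughly like the sketch below. It assumes the features are musicnn's predicted tags, obtained through the library's `top_tags` function with its `MTT_musicnn` model. The prompt template, the helper names `extract_tags` and `build_prompt`, and the sample song are hypothetical and not taken from the project.

```python
def extract_tags(audio_path, top_n=3):
    """Return the top musicnn tags for an audio file (needs musicnn installed)."""
    # Imported lazily so the rest of the example runs without musicnn.
    from musicnn.tagger import top_tags
    return top_tags(audio_path, model="MTT_musicnn", topN=top_n)


def build_prompt(title, genre, tags):
    """Combine user metadata and audio tags into one text prompt."""
    cleaned = []
    for tag in tags:
        tag = tag.strip().lower()
        if tag and tag not in cleaned:  # drop blanks and duplicates
            cleaned.append(tag)
    parts = [f"album cover for '{title.strip()}'", f"{genre.strip().lower()} music"]
    if cleaned:
        parts.append(", ".join(cleaned))
    return ", ".join(parts)


# Tags are passed in directly here; in the deployed flow they would come
# from extract_tags(path_to_uploaded_song).
prompt = build_prompt("Night Drive", "Electronic", ["techno", "fast", "Techno "])
print(prompt)
```

The resulting string would then be passed to the image model as its text prompt.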

This project led us to discover new ways to connect audio to images using AI. It was fascinating to see how the AI captured the essence of the music and transformed it into matching visuals.

Here are some of the outputs:

cover 1 cover 2

sourcecode