CNN for music genres

A CNN for music genre classification and TSC in general

This post presents a CNN for music genre classification. Over the last few weeks, I received many positive reactions to my implementations of a CNN and an LSTM for time-series classification (TSC). With this post, we stretch the TSC domain to long signals. Music typically has a sample frequency of 44.1 kHz, and for basic classification you need at least 1 or 2 seconds of data. That implies a signal length of roughly 50,000 samples. This post implements a CNN that reaches accuracies around 90%.

Downsampling architecture

The implementation downsamples in two stages. First, extract_music.m loads the data and applies a hard-coded downsampling by a factor of 30, which brings the sample frequency down to 1470 Hz. The second downsampling occurs after the first conv layer in CNN_music_main.py. Downsampling by a factor of 90 in one step would discard useful information; instead, the first conv layer in the CNN graph extracts features from the 1470 Hz signal, and a strided max-pooling then condenses the information by a factor of 3. By combining convolution and max-pooling, we allow valuable information to pass on to the second conv layer.
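To make the two-stage downsampling concrete, here is a minimal sketch in Python. The decimation step mirrors what extract_music.m does in MATLAB, and the filter counts, kernel widths, and number of genres are illustrative assumptions rather than the exact values in CNN_music_main.py.

```python
import numpy as np
from scipy.signal import decimate
import tensorflow as tf

# Stage 1: downsample the raw audio by a factor of 30 (44.1 kHz -> 1470 Hz).
# In the project this happens in extract_music.m; decimate() applies an
# anti-aliasing filter before subsampling. For factors this large, decimating
# in smaller stages (e.g. 5 then 6) is often preferred.
def downsample_chunk(audio_44k1):
    return decimate(audio_44k1, 30)

# Stage 2: let the first conv layer extract features at 1470 Hz, then condense
# with strided max-pooling by a factor of 3. Layer sizes are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1470, 1)),            # ~1 second of audio at 1470 Hz
    tf.keras.layers.Conv1D(32, kernel_size=7, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=3, strides=3),  # second downsampling stage
    tf.keras.layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(10, activation="softmax"),    # e.g. 10 genres
])
```

Placing the max-pooling after the first convolution means the downsampling by 3 happens on learned feature maps rather than on the raw waveform, which is why the combined factor of 90 does not have to be applied up front.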

Your own use

The goal is to show that even long, sparse signals like music allow for time-series classification. In this project, the MATLAB code in extract_music.m extracts chunks of signal from the music in a specified directory. You can cut down the project at any layer for your own use.
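extract_music.m handles the chunking in MATLAB. For readers who prefer to stay in Python, a rough equivalent could look like the sketch below; the directory layout (a folder of .wav files), the chunk length, and the naive strided subsampling are assumptions for illustration, not the script's exact behavior.

```python
import os
import numpy as np
from scipy.io import wavfile

def extract_chunks(directory, chunk_seconds=1.0, downsample=30):
    """Cut fixed-length chunks from every .wav file in a directory.

    The chunk length and downsample factor are illustrative; extract_music.m
    defines the real setup.
    """
    chunks = []
    for fname in os.listdir(directory):
        if not fname.endswith(".wav"):
            continue
        rate, audio = wavfile.read(os.path.join(directory, fname))
        if audio.ndim > 1:                  # mix stereo down to mono
            audio = audio.mean(axis=1)
        audio = audio[::downsample]         # crude downsample; a filtered decimate is cleaner
        chunk_len = int(chunk_seconds * rate / downsample)
        n_chunks = len(audio) // chunk_len
        for i in range(n_chunks):
            chunks.append(audio[i * chunk_len:(i + 1) * chunk_len])
    return np.array(chunks)
```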

Output

With the settings in the .py file, you can expect two outputs: a plot of the evolution of the loss function and the accuracies, and the computational graph in TensorBoard.
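The loss and accuracy curves, as well as the graph view, come from TensorFlow summaries written during training. Assuming the Keras-style sketch above and chunks extracted as training data, the logging could be wired up as follows; the log directory and training settings are placeholders, not the repo's exact configuration.

```python
import tensorflow as tf

# Assumes `model` from the sketch above and (X_train, y_train) built from the
# extracted chunks; log_dir is a placeholder.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="./log_tb", write_graph=True)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train,
          validation_split=0.2,
          epochs=20,
          callbacks=[tensorboard_cb])

# Inspect the curves and the graph with:  tensorboard --logdir ./log_tb
```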

As always, I am curious to hear any comments and questions. Reach me at romijndersrob@gmail.com.