WEEK 5 — MUSIC GENRE CLASSIFICATION

tunahanpinar
3 min read · May 22, 2021


Hello, this is the fifth blog post for our Fundamentals of Machine Learning course project. This week, we analyzed different features with different types of models. Let’s begin by talking about these features.

Chroma Frequencies: Chroma features are an interesting and powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave. They can be visualized as a 12-row chromagram over time.

RMSE: The root mean square energy provides a normalized measure of the signal energy in a time window.

Spectral Centroid: Basically, it indicates where the center of mass for a sound is located. It is calculated as the weighted mean of the frequencies present in the sound data.

Spectral Bandwidth: The spectral bandwidth is defined as the extent of the power transfer function around the center frequency.

Spectral Rolloff: Spectral rolloff is the frequency at which a specified percentage of the total spectral energy lies.

Zero-Crossing Rate: The zero-crossing rate is the rate of sign changes along a signal, i.e., the rate at which the signal changes from positive to negative or back. It is a commonly used feature in music information retrieval and also in speech recognition.

MFCC: The Mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound. It is based on a linear cosine transform of a log power spectrum on a nonlinear Mel scale of frequency. The Mel-frequency cepstral coefficients (MFCCs) of a signal are the small set of features that collectively make up an MFC. This small set of coefficients concisely describes the overall shape of the spectral envelope.
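To make a few of these definitions concrete, here is a minimal NumPy sketch of RMSE, zero-crossing rate, and spectral centroid for a single time window (in our project a library such as librosa computes these; this hand-rolled version only illustrates the formulas above):

```python
import numpy as np

def rmse(frame):
    # Root mean square of the samples in one time window
    return np.sqrt(np.mean(frame ** 2))

def zero_crossing_rate(frame):
    # Fraction of consecutive sample pairs whose sign differs
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def spectral_centroid(frame, sr):
    # Weighted mean of the frequencies present in the window,
    # weighted by the magnitude spectrum
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * spectrum) / np.sum(spectrum)

# Example: one second of a pure 440 Hz tone sampled at 22050 Hz
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
print(spectral_centroid(tone, sr))  # close to 440 Hz, as expected for a pure tone
```

For a pure tone, the centroid sits at the tone's frequency, and the zero-crossing rate is about twice the frequency divided by the sample rate (two crossings per period).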

We use 1 column for each feature described above, 20 columns for the MFCCs, and 1 column for the genre label. Then, as we mentioned in the previous blog post, we started training a neural network model. There are 4 layers in our network: the first is an input layer, then there are 2 hidden layers, and the last is an output layer. The value passed to Dense is the dimension of the layer’s output space, so the first layer has 256 neurons, and the output layer has 10 neurons since we are classifying into 10 genres. We used ReLU as the activation function and the Adam optimizer, which is widely used in deep learning. Our model’s goal is to minimize the sparse categorical cross-entropy loss function. We called the fit function with 20 epochs and a batch size of 128. With these properties, the accuracy with these features is 73%.
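The network described above can be sketched in Keras roughly as follows. This is a hedged reconstruction, not our exact code: the input width of 26 (6 scalar features plus 20 MFCCs) and the second hidden layer's size of 128 are assumptions, since only the first layer's 256 neurons and the 10-neuron output are stated above.

```python
import numpy as np
from tensorflow import keras

n_features = 26  # assumption: 6 scalar feature columns + 20 MFCC columns
n_genres = 10    # 10 genre classes, as in the post

model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(256, activation="relu"),  # first hidden layer (256 neurons)
    keras.layers.Dense(128, activation="relu"),  # second hidden layer (size assumed)
    keras.layers.Dense(n_genres, activation="softmax"),  # one neuron per genre
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training call with the hyperparameters from the post:
# model.fit(X_train, y_train, epochs=20, batch_size=128)
```

Sparse categorical cross-entropy lets the genre labels stay as integers (0–9) instead of one-hot vectors, which matches the single label column in our feature table.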

We also tried the same features with different models (logistic regression, linear discriminant analysis, k-nearest neighbors classifier, decision tree classifier, Gaussian naive Bayes, and support vector machine). Our accuracies are shown below.

NEXT WEEK

We can improve accuracy through feature selection and extraction, and, for the neural network, by changing the activation function, the number of epochs, or the batch size, or by adding more hidden layers. Next week we will try to improve accuracy with these approaches.

WRITERS’ MEDIUM PAGES

Ecem Varma

Tunahan Pınar

Pelin Azizoğlu
