Table of Contents




I. Vatolkin, P. Ginsel, G. Rudolph: Advancements in the Music Information Retrieval Framework AMUSE over the Last Decade. Accepted for Proc. SIGIR

AMUSE (Advanced MUSic Explorer) was created 2006 as an open-source Java framework for various music information retrieval tasks like feature extraction, feature processing, classification, and evaluation. In contrast to toolboxes which focus on individual MIR-related algorithms, it is possible with AMUSE, for instance, to extract features with Librosa, process them based on events estimated by MIRtoolbox, classify with WEKA or Keras, and validate the models with own classification performance measures. We present several substantial contributions to AMUSE since its first presentation at ISMIR 2010. They include the annotation editor for single and multiple tracks, the support of multi-label and multi-class classification, and new plugins which operate with Keras, Librosa, and Sonic Annotator. Other integrated methods are the structural complexity processing, chord vector feature, aggregation of features around estimated onset events, and evaluation of time event extractors. Further advancements are a more flexible feature extraction with different parameters like frame sizes, possibility to integrate additional tasks beyond algorithms related to supervised classification, marking of features which can be ignored for a classification task, extension of algorithm parameters with external code (e.g., a structure of a Keras neural net), etc.


I. Vatolkin, F. Ostermann, M. Müller: An Evolutionary Multi-Objective Feature Selection Approach for Detecting Music Segment Boundaries of Specific Types. Accepted for Proc. GECCO

The goal of music segmentation is to identify boundaries between parts of music pieces which are perceived as entities. Segment boundaries often go along with a change in musical properties including instrumentation, key, and tempo (or a combination thereof). One can consider different types (or classes) of boundaries according to these musical properties. In contrast to existing datasets with missing specifications which of changes apply for which annotated boundaries, we have created a set of artificial music tracks with precise annotations for boundaries of different types. This allows for a profound analysis and interpretation of annotated and predicted boundaries and a more exhaustive comparison of different segmentation algorithms. For this scenario, we formulate a novel multi-objective optimisation task that identifies boundaries of only a specific type. The optimisation is conducted by means of evolutionary multi-objective feature selection and a novelty-based segmentation approach. Furthermore, we provide lists of audio features from non-dominated fronts which most significantly contribute to the estimation of given boundaries (the first objective) and most significantly reduce the performance of the prediction of other boundaries (the second objective).


I. Vatolkin, B. Adrian, J. Kuzmic: A Fusion of Deep and Shallow Learning to Predict Genres Based on Instrument and Timbre Features. Accepted for Proc. EvoMUSART

Deep neural networks have recently received a lot of attention and have very successfully contributed to many music classification tasks. However, they have also drawbacks compared to the traditional methods: a very high number of parameters, a decreased performance for small training sets, lack of model interpretability, long training time, and hence a larger environmental impact with regard to computing resources. Therefore, it can still be a better choice to apply shallow classifiers for a particular application scenario with specific evaluation criteria, like the size of the training set or a required interpretability of models. In this work, we propose an approach based on both deep and shallow classifiers for music genre classification: The convolutional neural networks are trained once to predict instruments, and their outputs are used as features to predict music genres with a shallow classifier. The results show that the individual performance of such descriptors is comparable to other instrument-related features and they are even better for more than half of 19 genre categories.


I. Vatolkin, M. Koch, M. Müller: A Multi-Objective Evolutionary Approach to Identify Relevant Audio Features for Music Segmentation. Accepted for Proc. EvoMUSART

The goal of automatic music segmentation is to calculate boundaries between musical parts or sections that are perceived as semantic entities. Such sections are often characterized by specific musical properties such as instrumentation, dynamics, tempo, or rhythm. Recent data-driven approaches often phrase music segmentation as a binary classification problem, where musical cues for identifying boundaries are learned implicitly. Complementary to such methods, we present in this paper an approach for identifying relevant audio features that explain the presence of musical boundaries. In particular, we describe a multi-objective evolutionary feature selection strategy, which simultaneously optimizes two objectives. In a first setting, we reduce the number of features while maximizing an F-measure. In a second setting, we jointly maximize precision and recall values. Furthermore, we present extensive experiments based on six different feature sets covering different musical aspects. We show that feature selection allows for reducing the overall dimensionality while increasing the segmentation quality compared to full feature sets, with timbre-related features performing best.



I. Vatolkin, D. Stoller: Evolutionary Multi-Objective Training Set Selection of Data Instances and Augmentations for Vocal Detection. Proc. EvoMUSART

The size of publicly available music data sets has grown significantly in recent years, which allows training better classification models. However, training on large data sets is time-intensive and cumbersome, and some training instances might be unrepresentative and thus hurt classification performance regardless of the used model. On the other hand, it is often beneficial to extend the original training data with augmentations, but only if they are carefully chosen. Therefore, identifying a ``smart'' selection of training instances should improve performance. In this paper, we introduce a novel, multi-objective framework for training set selection with the target to simultaneously minimise the number of training instances and the classification error. Experimentally, we apply our method to vocal activity detection on a multi-track database extended with various audio augmentations for accompaniment and vocals. Results show that our approach is very effective at reducing classification error on a separate validation set, and that the resulting training set selections either reduce classification error or require only a small fraction of training instances for comparable performance.



I. Vatolkin, G. Rudolph: Comparison of Audio Features for Recognition of Western and Ethnic Instruments in Polyphonic Mixtures. Proc. ISMIR

Studies on instrument recognition are almost always restricted to either Western or ethnic music. Only little work has been done to compare both musical worlds. In this paper, we analyse the performance of various audio features for recognition of Western and ethnic instruments in chords. The feature selection is done with the help of a minimum redundancy - maximum relevance strategy and a multi-objective evolutionary algorithm. We compare the features found to be the best for individual categories and propose a novel strategy based on non-dominated sorting to evaluate and select trade-off features which may contribute as best as possible to the recognition of individual and all instruments.

Last modified: 2021-04-25 21:03 by Igor Vatolkin