Catalogue Search | MBRL
Explore the vast range of titles available.
28 result(s) for "Maass, Marco"
A machine learning study highlighting the challenges of fidgety movement recognition using vision and inertial sensors
by Hölzl, Hannes; Mansow-Model, Sebastian; Stein, Anne
in 631/114/1305; 692/617/375; 692/700/1720/3187
2026
Past medical research has shown that infantile movement and early neurological development are closely linked. Fidgety Movements, reflex-like movements occurring in healthy infants less than 20 weeks of age, have proven to be especially important, as past studies have highlighted that their absence is strongly correlated with the later development of neurological disorders such as Cerebral Palsy. To enable timely intervention, the General Movement Assessment was proposed as a medical screening procedure carried out by clinical personnel specifically trained to recognize Fidgety Movements. Because of its high cost in time and resources, several initiatives to automate the General Movement Assessment using machine learning techniques have been proposed in the literature. However, none has emerged as the state of the art so far. To investigate this problem, we conducted a study using deep learning approaches to learn disentangled feature representations for the recognition of Fidgety Movements using RGB-D video and Inertial Measurement Unit data acquired from 95 infants (average age: weeks). Our results show that while it is possible to learn features that characterize movement independently of subject information, obtaining feature representations that consistently generalize to subjects unseen during training remains challenging. More specifically, we observe that both the vision- and sensor-based modalities pose specific challenges that must be addressed for the recognition of Fidgety Movements. We discuss these challenges and provide recommendations for researchers interested in investigating this problem in the future.
Journal Article
Equilibrium Model with Anisotropy for Model-Based Reconstruction in Magnetic Particle Imaging
by Knopp, Tobias; Droigk, Christine; Mertins, Alfred
in Anisotropy; Bessel functions; Computation
2024
Magnetic particle imaging is a tracer-based tomographic imaging technique that allows the concentration of magnetic nanoparticles to be determined with high spatio-temporal resolution. To reconstruct an image of the tracer concentration, the magnetization dynamics of the particles must be accurately modeled. A popular ensemble model is based on solving the Fokker-Planck equation, taking into account either Brownian or Néel dynamics. The disadvantage of this model is that it is computationally expensive due to an underlying stiff differential equation. A simplified model is the equilibrium model, which can be evaluated directly but, in most relevant cases, suffers from a non-negligible modeling error. In the present work, we investigate an extended version of the equilibrium model that can account for particle anisotropy. We show that this model can be expressed as a series of Bessel functions, which can be truncated to a predefined accuracy, leading to very short computation times, about three orders of magnitude lower than equivalent Fokker-Planck computation times. We investigate the accuracy of the model for 2D Lissajous magnetic particle imaging sequences and show that the difference between the Fokker-Planck model and the equilibrium model with anisotropy is sufficiently small that the latter can be used for image reconstruction on experimental data with only marginal loss of image quality, even compared to a system matrix-based reconstruction.
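The basic equilibrium model referred to above is commonly written in terms of the Langevin function of paramagnetism. As a rough illustration only (not the anisotropy-extended Bessel series from the paper), a numerically stable Langevin evaluation might look like this; the scaling factor `beta` is a hypothetical placeholder for μ/(k_B·T):

```python
import numpy as np

def langevin(x):
    """Langevin function L(x) = coth(x) - 1/x, the classic equilibrium
    magnetization model for superparamagnetic particles."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    small = np.abs(x) < 1e-4
    # Series expansion near 0 avoids the coth singularity: L(x) ~ x/3 - x^3/45
    out[small] = x[small] / 3.0 - x[small] ** 3 / 45.0
    xl = x[~small]
    out[~small] = 1.0 / np.tanh(xl) - 1.0 / xl
    return out

# Normalized mean magnetization of an ensemble in field H: m(H) = L(beta * H)
beta = 1.0  # hypothetical stand-in for mu / (kB * T)
H = np.array([0.0, 0.5, 2.0, 50.0])
m = langevin(beta * H)
```

The curve is zero at zero field, increases monotonically, and saturates toward 1, which is why the model can be evaluated directly without solving a differential equation.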
Scattering Transform for Auditory Attention Decoding
2026
The use of hearing aids will increase in the coming years due to demographic change. One open problem that remains to be solved by a new generation of hearing aids is the cocktail party problem. A possible solution is electroencephalography-based auditory attention decoding. This has been the subject of several studies in recent years, which have in common that they mostly use the same preprocessing methods. In this work, the use of a scattering transform is proposed as an alternative to these preprocessing methods in order to achieve an advantage. The two-layer scattering transform is compared with a regular filterbank, the synchrosqueezing short-time Fourier transform, and the common preprocessing. To demonstrate the performance, the known and the proposed preprocessing methods are compared for different classification tasks on two widely used datasets, provided by KU Leuven (KUL) and the Technical University of Denmark (DTU). Both established and newer neural-network-based models (CNNs, LSTMs, and recent Transformer- and graph-based models) are used for classification. Various evaluation strategies were compared, with a focus on the task of classifying speakers unseen during training. We show that the two-layer scattering transform can significantly improve the performance for subject-related conditions, especially on the KUL dataset. However, on the DTU dataset, this only applies to some of the models, or when larger amounts of training data are provided, as in 10-fold cross-validation. This suggests that the scattering transform is capable of extracting additional relevant information.
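To illustrate the structure of a two-layer scattering transform (cascaded band-pass filtering, complex modulus, and averaging), here is a minimal toy sketch using Gabor filters as stand-ins for proper wavelets; the filter centers, widths, and pooling size are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def gabor_bank(n, centers, width):
    """Bank of complex Gabor band-pass filters of length n (a hypothetical
    stand-in for the wavelet filterbank of a real scattering transform)."""
    t = np.arange(n) - n // 2
    return [np.exp(-0.5 * (t / width) ** 2) * np.exp(2j * np.pi * f * t)
            for f in centers]

def scatter2(x, bank1, bank2, pool):
    """Two-layer scattering: filter, take the modulus, filter again, take the
    modulus, and average-pool each layer to get translation-stable features."""
    feats = []
    for p1 in bank1:
        u1 = np.abs(np.convolve(x, p1, mode="same"))
        feats.append(u1.reshape(-1, pool).mean(axis=1))      # 1st-order path
        for p2 in bank2:
            u2 = np.abs(np.convolve(u1, p2, mode="same"))
            feats.append(u2.reshape(-1, pool).mean(axis=1))  # 2nd-order path
    return np.stack(feats)

rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 0.05 * np.arange(1024)) + 0.1 * rng.standard_normal(1024)
bank1 = gabor_bank(64, [0.02, 0.05, 0.1], width=8.0)
bank2 = gabor_bank(64, [0.01, 0.03], width=16.0)
S = scatter2(x, bank1, bank2, pool=64)
# One row per scattering path: 3 first-order + 3*2 second-order = 9 rows
```

The second layer captures amplitude-modulation structure that the first-layer averaging would otherwise discard, which is the motivation for using it as an EEG preprocessing step.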
Efficient Chebyshev Reconstruction for the Anisotropic Equilibrium Model in Magnetic Particle Imaging
by Knopp, Tobias; Droigk, Christine; Hernández Durán, Daniel
in Accuracy; Anisotropy; Chebyshev approximation
2025
Magnetic Particle Imaging (MPI) is a tomographic imaging modality capable of real-time, high-sensitivity mapping of superparamagnetic iron oxide nanoparticles. Model-based image reconstruction provides an alternative to conventional methods that rely on a measured system matrix, eliminating the need for laborious calibration measurements. Nevertheless, model-based approaches must account for the complexities of the imaging chain to maintain high image quality. A recently proposed direct reconstruction method leverages weighted Chebyshev polynomials in the frequency domain, removing the need for a simulated system matrix. However, the underlying model neglects key physical effects, such as nanoparticle anisotropy, leading to distortions in reconstructed images. To mitigate these artifacts, an adapted direct Chebyshev reconstruction (DCR) method incorporates a spatially variant deconvolution step, significantly improving reconstruction accuracy at the cost of increased computational demands. In this work, we evaluate the adapted DCR on six experimental phantoms, demonstrating enhanced reconstruction quality in real measurements and achieving image fidelity comparable to or exceeding that of simulated system matrix reconstruction. Furthermore, we introduce an efficient approximation for the spatially variant deconvolution, reducing both runtime and memory consumption while maintaining accuracy. This method achieves a computational complexity of O(N log N), making it particularly beneficial for high-resolution and three-dimensional imaging. Our results highlight the potential of the adapted DCR approach for improving model-based MPI reconstruction in practical applications.
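As a generic illustration of the truncation idea behind Chebyshev-based methods (approximating a smooth function by a Chebyshev series cut off at a predefined accuracy; this is not the paper's frequency-domain DCR), one might write:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_truncated(func, deg, tol):
    """Chebyshev interpolant of `func` on [-1, 1], truncated where the
    coefficient magnitudes fall below `tol` (generic illustration only)."""
    coeffs = C.chebinterpolate(func, deg)
    keep = np.nonzero(np.abs(coeffs) >= tol)[0]
    cutoff = keep[-1] + 1 if keep.size else 1
    return coeffs[:cutoff]

f = lambda x: np.exp(-x ** 2)               # smooth toy function
coeffs = cheb_truncated(f, deg=40, tol=1e-10)
xs = np.linspace(-1.0, 1.0, 101)
err = np.max(np.abs(C.chebval(xs, coeffs) - f(xs)))
# Coefficients of smooth functions decay fast, so few terms are needed
```

Here `chebinterpolate` samples the function at Chebyshev points; in the paper's setting the coefficients instead arise from the MPI signal model, but the rapid coefficient decay that makes truncation cheap is the same phenomenon.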
Label Tree Embeddings for Acoustic Scene Classification
2016
We present in this paper an efficient approach for acoustic scene classification by exploring the structure of class labels. Given a set of class labels, a category taxonomy is automatically learned by collectively optimizing a clustering of the labels into multiple meta-classes in a tree structure. An acoustic scene instance is then embedded into a low-dimensional feature representation consisting of the likelihoods that it belongs to the meta-classes. We demonstrate state-of-the-art results for the acoustic scene classification task on two different datasets, DCASE 2013 and LITIS Rouen.
What Makes Audio Event Detection Harder than Classification?
by Mazur, Radoslaw; McLoughlin, Ian; Mertins, Alfred
in Classification; Classifiers; False alarms
2018
There is a common observation that audio event classification is easier to deal with than detection. So far, this observation has been accepted as a fact, and a careful analysis has been lacking. In this paper, we analyze the rationale behind this observation and, more importantly, leverage it to benefit the audio event detection task. We present an improved detection pipeline in which a verification step is appended to augment a detection system. This step employs a high-quality event classifier to postprocess the benign event hypotheses output by the detection system and reject false alarms. To demonstrate the effectiveness of the proposed pipeline, we implement and pair up different event detectors based on the most common detection schemes and various event classifiers, ranging from the standard bag-of-words model to the state-of-the-art bank-of-regressors one. Experimental results on the ITC-Irst dataset show significant improvements in detection performance. More importantly, these improvements are consistent across all detector-classifier combinations.
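The verification step can be sketched abstractly: a detector emits event hypotheses, and a stronger classifier re-scores each one, rejecting those below a confidence threshold. The `Hypothesis` type and the toy classifier below are hypothetical stand-ins, not the paper's detectors or classifiers:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    onset: float    # event start time in seconds
    offset: float   # event end time in seconds
    label: str

def verify(hypotheses, classifier, threshold):
    """Post-process detector hypotheses with a stronger classifier and
    reject likely false alarms; `classifier` maps a hypothesis to a
    class-confidence score in [0, 1]."""
    return [h for h in hypotheses if classifier(h) >= threshold]

# Hypothetical toy classifier: longer events get higher confidence.
toy_clf = lambda h: min(1.0, (h.offset - h.onset) / 2.0)
hyps = [Hypothesis(0.0, 3.0, "door"), Hypothesis(5.0, 5.2, "door")]
kept = verify(hyps, toy_clf, threshold=0.5)
# The 3-second hypothesis survives; the 0.2-second one is rejected
```

The design point is that the detector can stay recall-oriented, since the appended classifier handles precision by filtering false alarms.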
Learning Compact Structural Representations for Audio Events Using Regressor Banks
by Phan, Huy; Mazur, Radoslaw; McLoughlin, Ian
in Audio signals; Classification; Representations
2016
We introduce a new learned descriptor for audio signals which provides an efficient event representation. The entries of the descriptor are produced by evaluating a set of regressors on the input signal. The regressors are class-specific and trained using the random regression forests framework. Given an input signal, each regressor estimates the onset and offset positions of the target event. The estimation confidence scores output by a regressor are then used to quantify how well the target event aligns with the temporal structure of the corresponding category. Our proposed descriptor has two advantages. First, it is compact, i.e. the dimensionality of the descriptor equals the number of event classes. Second, we show that even simple linear classification models, trained on our descriptor, yield better accuracies on the audio event classification task than not only nonlinear baselines but also the state-of-the-art results.
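The descriptor construction can be sketched as follows: one regressor per event class is evaluated on the input, and its confidence score becomes one descriptor entry, so the descriptor dimensionality equals the number of classes. The toy regressors below are hypothetical stand-ins for the random regression forests used in the paper:

```python
import numpy as np

def regressor_bank_descriptor(signal, regressors):
    """Build a C-dimensional descriptor: entry c is the confidence with
    which the class-c regressor's onset/offset estimate fits the signal."""
    return np.array([reg(signal)[2] for reg in regressors])

# Hypothetical stand-in regressors: each returns (onset, offset, confidence).
def make_toy_regressor(target_energy):
    def reg(sig):
        energy = float(np.mean(sig ** 2))
        conf = np.exp(-abs(energy - target_energy))  # closer energy -> higher conf
        return 0.0, float(len(sig)), conf
    return reg

regressors = [make_toy_regressor(e) for e in (0.1, 1.0, 4.0)]
sig = np.ones(100)                  # toy segment with energy exactly 1.0
desc = regressor_bank_descriptor(sig, regressors)
# desc has one entry per class; the class tuned to energy 1.0 scores highest
```

A linear classifier trained on such descriptors only has to separate points in a space whose dimensionality equals the number of event classes, which is what makes the representation compact.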
Measurement of Sound Fields Using Moving Microphones
by Mazur, Radoslaw; Mertins, Alfred; Maass, Marco
in Algorithms; Interpolation; Linear equations
2016
The sampling of sound fields involves the measurement of spatially dependent room impulse responses, where the Nyquist-Shannon sampling theorem applies in both the temporal and the spatial domain. Therefore, sampling inside a volume of interest requires a huge number of sampling points in space, which comes with further difficulties such as exact microphone positioning and the calibration of multiple microphones. In this paper, we present a method for measuring sound fields using moving microphones whose trajectories are known to the algorithm. Here, the number of microphones can be chosen freely by trading measurement effort against sampling time. Through spatial interpolation of the dynamic measurements, a system of linear equations is set up which allows for the reconstruction of the entire sound field inside the volume of interest.
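The reconstruction idea can be sketched in one dimension: each sample recorded along the known trajectory is a spatial interpolation of the unknown field values on a grid, contributing one row to a linear system that is then solved in the least-squares sense. Grid size, trajectory, and the linear-interpolation kernel below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def interpolation_row(pos, n_grid):
    """Linear-interpolation weights for a microphone position in
    [0, n_grid - 1]; this is one row of the system matrix A."""
    row = np.zeros(n_grid)
    i = int(np.floor(pos))
    frac = pos - i
    row[i] = 1.0 - frac
    if i + 1 < n_grid:
        row[i + 1] = frac
    return row

n_grid = 16
rng = np.random.default_rng(1)
field = rng.standard_normal(n_grid)                 # toy ground-truth field
trajectory = np.linspace(0.0, n_grid - 1.001, 80)   # known microphone path
A = np.stack([interpolation_row(p, n_grid) for p in trajectory])
m = A @ field                                       # noiseless measurements
c, *_ = np.linalg.lstsq(A, m, rcond=None)
# With enough samples along the trajectory, c recovers the field
```

Trading measurement effort against sampling time then amounts to choosing how many rows (dynamic samples) are collected per microphone before the system becomes well-conditioned.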
CaR-FOREST: Joint Classification-Regression Decision Forests for Overlapping Audio Event Detection
2016
This report describes our submissions to Task2 and Task3 of the DCASE 2016 challenge. The systems address the detection of overlapping audio events in continuous streams, where the detectors are based on random decision forests. The proposed forests are trained jointly for classification and regression. Initially, the training is classification-oriented to encourage the trees to select discriminative features from overlapping mixtures to separate positive audio segments from negative ones. The regression phase is then carried out to let the positive audio segments vote for the event onsets and offsets, thereby modeling the temporal structure of audio events. One random decision forest is trained specifically for each event category of interest. Experimental results on the development data show that our systems significantly outperform the baseline on the Task2 evaluation, while they fall short of the baseline on the Task3 evaluation.
CNN-LTE: a Class of 1-X Pooling Convolutional Neural Networks on Label Tree Embeddings for Audio Scene Recognition
by Mertins, Alfred; Phan, Huy; Hertel, Lars
in Artificial neural networks; Feature extraction; Feature recognition
2016
We describe in this report our audio scene recognition system submitted to the DCASE 2016 challenge. Firstly, given the label set of the scenes, a label tree is automatically constructed. This category taxonomy is then used in the feature extraction step in which an audio scene instance is represented by a label tree embedding image. Different convolutional neural networks, which are tailored for the task at hand, are finally learned on top of the image features for scene recognition. Our system reaches an overall recognition accuracy of 81.2% and 83.3% and outperforms the DCASE 2016 baseline with absolute improvements of 8.7% and 6.1% on the development and test data, respectively.