Mitigating the Effects of Label Noise in Time-Series Sensor Data
Date
2023-05
Authors
Atkinson, Gentry M.
Abstract
Uncertainty in training datasets can negatively affect the ability of machine-learning models to learn solutions to real-world problems. Label noise, defined as inaccurate annotations, is one particularly detrimental source of uncertainty. The problem is more pronounced in time-series data, which are inherently difficult to interpret. A variety of real-world applications, such as medical diagnostics, human activity recognition, weather forecasting, and emotion recognition, depend on data obtained from one or more sensors. Consequently, better methods for identifying and relabeling inaccurately labeled instances in sensor datasets are important across many domains.
This dissertation presents a comprehensive set of approaches to improve the usability of incorrectly labeled datasets: detecting incorrect labels, correcting labels, and generating new instances of data from noisy examples. We demonstrate that self-supervised deep feature extraction can effectively learn representations of time-series data, irrespective of the uncertainties caused by incorrect labels. These deep features are then used with a novel adaptation of the k-nearest neighbors (KNN) algorithm, together with an estimated label transition matrix, to relabel data instances with a high degree of precision.
Finally, this research introduces an adaptation of generative diffusion that produces new data instances from examples with noisy labels. This data generation method is shown to adhere more closely to the distribution of the example data than existing techniques. The proposed approaches have significant potential to improve the performance and reliability of machine-learning models in fields that rely on sensor data.
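The relabeling step described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's novel KNN adaptation, which the abstract does not specify; it assumes a plain majority vote over the k nearest neighbors in feature space, and it assumes the feature vectors have already been produced by some self-supervised extractor. The function names `knn_relabel` and `empirical_transition_matrix`, and the use of NumPy, are illustrative choices rather than anything taken from the source.

```python
import numpy as np


def knn_relabel(features, noisy_labels, k=7, num_classes=None):
    """Relabel each instance by majority vote of its k nearest neighbors.

    Assumption: plain Euclidean KNN voting, used here as a stand-in for the
    dissertation's (unspecified) KNN adaptation.
    """
    features = np.asarray(features, dtype=float)
    noisy_labels = np.asarray(noisy_labels, dtype=int)
    if num_classes is None:
        num_classes = int(noisy_labels.max()) + 1

    # Pairwise squared Euclidean distances between all feature vectors.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dists, np.inf)  # an instance does not vote for itself

    # Indices and (noisy) labels of the k nearest neighbors of each instance.
    neighbor_idx = np.argsort(dists, axis=1)[:, :k]
    neighbor_labels = noisy_labels[neighbor_idx]

    # Count votes per class; ties resolve to the lowest class index.
    votes = np.stack(
        [(neighbor_labels == c).sum(axis=1) for c in range(num_classes)],
        axis=1,
    )
    return votes.argmax(axis=1)


def empirical_transition_matrix(estimated_clean, noisy_labels, num_classes):
    """Estimate T[i, j] = P(noisy label = j | estimated clean label = i).

    Assumption: a simple row-normalized count matrix, not the estimator
    used in the dissertation.
    """
    estimated_clean = np.asarray(estimated_clean, dtype=int)
    noisy_labels = np.asarray(noisy_labels, dtype=int)
    counts = np.zeros((num_classes, num_classes))
    np.add.at(counts, (estimated_clean, noisy_labels), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no instances stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

In this sketch, `knn_relabel` takes an array of feature vectors of shape (n, d) and the observed labels, and returns one voted label per instance; comparing those votes with the observed labels through `empirical_transition_matrix` gives a rough picture of which classes tend to be confused with which.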
Keywords
label noise, machine learning, noisy data, data processing, feature learning, contrastive learning, self-supervised learning, time series, visualization
Citation
Atkinson, G. M. (2023). Mitigating the effects of label noise in time-series sensor data (Unpublished dissertation). Texas State University, San Marcos, Texas.