Mitigating the Effects of Label Noise in Time-Series Sensor Data

dc.contributor.advisor: Metsis, Vangelis
dc.contributor.author: Atkinson, Gentry M.
dc.contributor.committeeMember: Zong, Ziliang
dc.contributor.committeeMember: Gao, Byron
dc.contributor.committeeMember: Athitsos, Vassilis
dc.date.accessioned: 2024-05-15T20:11:26Z
dc.date.available: 2024-05-15T20:11:26Z
dc.date.issued: 2023-05
dc.description.abstract: Uncertainty in training datasets can negatively affect the ability of machine-learning models to learn solutions to real-world problems. Label noise, defined as inaccurate annotations, is one particularly detrimental uncertainty. This issue is more pronounced in time series data, where interpretability is inherently limited. A variety of real-world applications, such as medical diagnostics, human activity recognition, weather forecasting, and emotion recognition, depend on data obtained from one or more sensors. Consequently, the development of enhanced methodologies for identifying and relabeling inaccurately labeled instances in sensor datasets is of paramount importance across various domains. This dissertation presents a comprehensive set of approaches to improve the usability of incorrectly labeled datasets: detecting incorrect labels, correcting labels, and generating new instances of data from noisy examples. We demonstrate that self-supervised deep feature extraction can effectively learn representations of time series data, irrespective of the uncertainties caused by incorrect labels. These deep features are then employed in conjunction with a novel adaptation of the k-nearest neighbors (KNN) algorithm, as well as an estimated label transition matrix, to relabel data instances with a high degree of precision. Finally, this research introduces an innovative adaptation of generative diffusion for the generation of new data instances derived from examples with noisy labels. This data generation method is shown to adhere more closely to the distribution of the example data compared to existing techniques. The proposed approaches hold significant potential to improve the performance and reliability of machine learning models in various fields that rely on sensor data.
dc.description.department: Computer Science
dc.format: Text
dc.format.extent: 115 pages
dc.format.medium: 1 file (.pdf)
dc.identifier.citation: Atkinson, G. M. (2023). Mitigating the effects of label noise in time-series sensor data (Unpublished dissertation). Texas State University, San Marcos, Texas.
dc.identifier.uri: https://hdl.handle.net/10877/18720
dc.language.iso: en
dc.subject: label noise
dc.subject: machine learning
dc.subject: noisy data
dc.subject: data processing
dc.subject: feature learning
dc.subject: contrastive learning
dc.subject: self-supervised learning
dc.subject: time series
dc.subject: visualization
dc.title: Mitigating the Effects of Label Noise in Time-Series Sensor Data
dc.type: Dissertation
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas State University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: ATKINSON-DISSERTATION-2023.pdf
Size: 1.97 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission