Mitigating the Effects of Label Noise in Time-Series Sensor Data
dc.contributor.advisor | Metsis, Vangelis | |
dc.contributor.author | Atkinson, Gentry M. | |
dc.contributor.committeeMember | Zong, Ziliang | |
dc.contributor.committeeMember | Gao, Byron | |
dc.contributor.committeeMember | Athitsos, Vassilis | |
dc.date.accessioned | 2024-05-15T20:11:26Z | |
dc.date.available | 2024-05-15T20:11:26Z | |
dc.date.issued | 2023-05 | |
dc.description.abstract | Uncertainty in training datasets can negatively affect the ability of machine-learning models to learn solutions to real-world problems. Label noise, defined as inaccurate annotations, is one particularly detrimental uncertainty. This issue is more pronounced in time series data, where interpretability is inherently limited. A variety of real-world applications, such as medical diagnostics, human activity recognition, weather forecasting, and emotion recognition, depend on data obtained from one or more sensors. Consequently, the development of enhanced methodologies for identifying and relabeling inaccurately labeled instances in sensor datasets is of paramount importance across various domains. This dissertation presents a comprehensive set of approaches to improve the usability of incorrectly labeled datasets: detecting incorrect labels, correcting labels, and generating new instances of data from noisy examples. We demonstrate that self-supervised deep feature extraction can effectively learn representations of time series data, irrespective of the uncertainties caused by incorrect labels. These deep features are then employed in conjunction with a novel adaptation of the k-nearest neighbors (KNN) algorithm, as well as an estimated label transition matrix, to relabel data instances with a high degree of precision. Finally, this research introduces an innovative adaptation of generative diffusion for the generation of new data instances derived from examples with noisy labels. This data generation method is shown to adhere more closely to the distribution of the example data compared to existing techniques. The proposed approaches hold significant potential to improve the performance and reliability of machine learning models in various fields that rely on sensor data. | |
dc.description.department | Computer Science | |
dc.format | Text | |
dc.format.extent | 115 pages | |
dc.format.medium | 1 file (.pdf) | |
dc.identifier.citation | Atkinson, G. M. (2023). Mitigating the effects of label noise in time-series sensor data (Unpublished dissertation). Texas State University, San Marcos, Texas. | |
dc.identifier.uri | https://hdl.handle.net/10877/18720 | |
dc.language.iso | en | |
dc.subject | label noise | |
dc.subject | machine learning | |
dc.subject | noisy data | |
dc.subject | data processing | |
dc.subject | feature learning | |
dc.subject | contrastive learning | |
dc.subject | self-supervised learning | |
dc.subject | time series | |
dc.subject | visualization | |
dc.title | Mitigating the Effects of Label Noise in Time-Series Sensor Data | |
dc.type | Dissertation | |
thesis.degree.department | Computer Science | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Texas State University | |
thesis.degree.level | Doctoral | |
thesis.degree.name | Doctor of Philosophy |