Ngu, Anne H.H.
Phillips, Clark Raymond
2015-06-26
2015-06-26
2015-05
Phillips, C. R. (2015). <i>Employing an efficient and scalable implementation of the Cost Sensitive Alternating Decision Tree algorithm to efficiently link person records</i> (Unpublished thesis). Texas State University, San Marcos, Texas.
https://hdl.handle.net/10877/5576
When collecting person records for a census, identifying individuals accurately is paramount. Over time, people change their phone numbers, their addresses, even their names. Without a universal identifier such as a social security number or a fingerprint, it is difficult to know whether two distinct person records represent the same individual. The Cost Sensitive Alternating Decision Tree (CSADT) algorithm, a supervised learning algorithm, is employed as a Record Linkage solution to the problem of resolving whether two person records refer to the same individual. A person record consists of several attributes such as a name, a phone number, an address, etc. The number of person-record-pairs grows exponentially as the number of records increases. In order to accommodate this exponential growth, a scalable implementation of the CSADT algorithm was employed. A thorough investigation and evaluation are presented demonstrating the effectiveness of this implementation of the CSADT algorithm on linking person records.
Text
99 pages
1 file (.pdf)
en
Decision trees
Machine learning
Alternating decision tree
Computer science--Mathematics
Combinatorial analysis
Employing an Efficient and Scalable Implementation of the Cost Sensitive Alternating Decision Tree algorithm to Efficiently Link Person Records
Thesis