Data Mining Methods Applied to Chromosome Aberrations in Squamous Cell Carcinoma Karyotypes




Slatton, Jeremy C.




This analysis used three karyotype parsing systems (Karyo Reader, Progenetix ISCN2matrix, and CyDAS) to convert published squamous cell carcinoma karyotypes from the Mitelman Database of Chromosome Aberrations in Cancer into statistical data for mining procedures. The goal of the study was to examine the input requirements and output options of each system in order to determine its usability and accuracy for potential mining experiments. Each karyotype parsing system was used to pinpoint high-frequency recurrent chromosome aberrations that potentially influence the development of squamous cell carcinoma. CyDAS output was deemed best suited for database storage of karyotype data and for producing graphical representations of chromosome aberrations, while Progenetix proved useful only for examining a summary of the structural chromosome gains and losses in the data. Karyo Reader output provided data for binary statistical analyses as well as for analysis of structural and numerical chromosome gains and losses. From the Mitelman Database, 574 cases were extracted, representing 92 literature references from 25 journals. Karyo Reader was able to parse 85.44% of structural aberrations, similar to the 85.71% of cases parsed by CyDAS but much lower than the 94.95% parsed by Progenetix. However, Karyo Reader identified more than three times as many aberrations as Progenetix or CyDAS. High-frequency recurrent chromosome aberrations identified by Karyo Reader and CyDAS were consistent with the literature, whereas results from Progenetix ISCN2matrix were not. As a result, Karyo Reader provided the only accurate, suitably formatted output for statistical analysis. The binary chromosome aberration data extracted from Karyo Reader were then used to perform a principal component analysis (PCA).
For the analysis of early evolutionary mutagenic pathways, aberrations present in fewer than 30% of cases were excluded from the PCA. The top 19 chromosome aberrations in the squamous cell carcinoma Karyo Reader binary data were deletions of the following chromosomes and chromosome regions: Y, 8p22, 8p23, 10, 13, 14, 15, 18, 21, 22, 4, 3p13, 3p14, 3p21, 3p22, 3p23, 3p24, 3p25, and 3p26. The number of imbalances per tumor (NIPT) distributions for these chromosome bands indicated that dY, d8p22, d8p23, and d3p13 are early aberrations; that d10, d14, d18, d13, d22, d3p14, d3p24, d3p25, d3p23, and d3p26 are moderate-stage aberrations; and that d21, d15, d3p22, and d3p21 appear as later-stage aberrations in squamous cell carcinoma development. PCA of the statistical output yielded a concise set of nine potential evolutionary mutagenic pathways for squamous cell carcinoma development. Two principal components were extracted from the data, representing two separate early mutagenic pathways occurring in squamous cell carcinoma cases. The analysis identified deletions of chromosome Y, 8p22, 8p23, and 3p13 as early chromosome aberrations involved in squamous cell carcinoma development. PCA showed a strongly divergent path of mutagenesis for later-stage aberrations, resulting in either entire chromosome deletions or further deletions of bands within chromosome segment 3p. Because these aberrations potentially play a predominant role in the development of squamous cell carcinoma, the corresponding chromosome regions can be targeted for further research into the biological pathways affected by squamous cell carcinoma, and as candidate regions for gene therapy or interventional treatments.
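The frequency filter and staging step described above can be illustrated with a minimal Python sketch. It is not the thesis's code: the function names, the toy cases, and the way staging is summarised (the mean number of aberrations per case among cases carrying a given aberration, used here as a stand-in for the NIPT distribution) are all assumptions made for illustration. Only the 30% cutoff comes from the abstract.

```python
from statistics import mean

def filter_by_frequency(cases, min_fraction=0.30):
    """Keep aberrations found in at least `min_fraction` of cases.

    `cases` is a list of sets, one set of aberration labels per karyotype.
    Returns a dict mapping each retained aberration to its frequency.
    """
    n_cases = len(cases)
    counts = {}
    for aberrations in cases:
        for label in aberrations:
            counts[label] = counts.get(label, 0) + 1
    return {
        label: count / n_cases
        for label, count in counts.items()
        if count / n_cases >= min_fraction
    }

def mean_nipt(cases, aberration):
    """Mean aberration count per case, over cases carrying `aberration`.

    Assumption: a low value suggests the aberration tends to occur in
    simple (early) karyotypes, a high value in complex (late) ones.
    """
    sizes = [len(c) for c in cases if aberration in c]
    return mean(sizes) if sizes else None

# Hypothetical toy data, not taken from the Mitelman Database.
cases = [
    {"dY"},
    {"dY", "d8p22"},
    {"dY", "d8p22", "d21"},
    {"d8p22", "d21", "d15", "d3p21"},
    {"d21", "d15", "d3p21", "d3p22", "d10"},
]

kept = filter_by_frequency(cases)  # drops d3p22 and d10 (1 of 5 cases each)
ranked = sorted(kept, key=lambda label: mean_nipt(cases, label))
```

In this toy example, `ranked` places dY and d8p22 ahead of d21, d15, and d3p21, mirroring the early-versus-late ordering the abstract reports. The retained columns would then form the binary matrix passed to PCA.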



squamous cell carcinoma, karyotypes, data mining, statistical methods


Slatton, J. (2005). Data mining methods applied to chromosome aberrations in squamous cell carcinoma karyotypes (Unpublished thesis). Texas State University-San Marcos, San Marcos, Texas.

