Random Forest Classification of Active Galactic Nuclei with Optical and Infrared Data
This project develops and evaluates a pipeline for the automatic classification of active galactic nuclei (AGN) using the Random Forest algorithm. AGN are galaxies in which accretion of gas and dust onto the disk around the central supermassive black hole generates enough energy to outshine the galaxy's stars. Here we use Random Forest, a supervised, decision-tree-based algorithm, to classify AGN using only optical and infrared photometry from the Gaia and WISE space telescope datasets. We train and test the algorithm on five classes of AGN using two feature sets, magnitudes and colors, each with both the original data and data resampled with the Synthetic Minority Over-sampling Technique (SMOTE), giving four models. These four models reach overall classification accuracies of 90-93%, but the per-class F1 scores range from 0.44 to 0.97 across all models, and only quasars are classified with high accuracy, with F1 scores of 0.96-0.97. This may be due to an overabundance of quasars in the sample. The method can be explored further with photometric data from other telescopes in other wavelength ranges, additional algorithm parameters, other classification algorithms, and the inclusion of other classes.
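The gap between overall accuracy and per-class F1 score described above can be illustrated with a small worked example. The sketch below uses a made-up two-class confusion matrix (the counts, the class names, and the helper functions `accuracy` and `f1_per_class` are all hypothetical and are not taken from the thesis) to show how a dominant class can keep accuracy high while a minority class scores poorly.

```python
# Illustrative only: these counts are invented, not results from the thesis.
# Rows are true classes, columns are predicted classes.
labels = ["quasar", "other_agn"]  # hypothetical class names
confusion = [
    [900, 10],  # true quasar: 900 predicted quasar, 10 predicted other_agn
    [60, 30],   # true other_agn: 60 predicted quasar, 30 predicted other_agn
]


def accuracy(cm):
    """Fraction of all objects whose predicted class equals the true class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def f1_per_class(cm):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    scores = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(len(cm))) - tp  # column sum minus TP
        fn = sum(cm[k]) - tp                             # row sum minus TP
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


acc = accuracy(confusion)
f1 = f1_per_class(confusion)

print(f"accuracy: {acc:.2f}")
for name, score in zip(labels, f1):
    print(f"F1 for {name}: {score:.2f}")
```

Because the majority class contributes most of the correct predictions, accuracy alone says little about the rarer classes, which is why per-class F1 scores are reported alongside it.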
astronomy, astrophysics, machine learning, Physics, Honors College
McKee, J. M. (2022). Random forest classification of active galactic nuclei with optical and infrared data (Unpublished thesis). Texas State University, San Marcos, Texas