A Novel Workflow for Large Scale Thesis Digitization




Peters, Todd C.
Moore, Jeremy D.
Long, Jason




Texas State University recently began digitizing approximately 6,000 theses to create digital preservation copies and electronic versions that may eventually be used for patron access. This presentation discusses our novel workflow, which allows student workers to rapidly scan, process, and perform quality control on the images while managing the metadata needed for future ingest into our institutional repository. In brief, students debind and scan the theses, download MARC records with MarcEdit, and use an in-house web application to sort the images by content. They then process the images with a combination of Bash scripts, ImageMagick, and Adobe Photoshop, performing quality control and fixing any errors they find. The resulting preservation TIFFs are OCR’d and combined into PDFs using ABBYY FineReader 12. The Digital Media Specialist then performs a final quality control step, which completes the electronic conversion. The workflow allows a student to process approximately 50 theses in a 20-hour work week.
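
The throughput figures above imply a rough project timeline. The following is a minimal sketch of that arithmetic in Python. It uses only the approximate numbers stated in the abstract (6,000 theses, 50 theses per student per 20-hour week). The function names and the number of students working in parallel are hypothetical and are not taken from the presentation.

```python
import math


def weeks_needed(total_theses: int, theses_per_student_week: int, students: int = 1) -> int:
    """Estimate the weeks required to finish, rounding up to whole weeks.

    `students` is a hypothetical staffing level; the abstract does not state
    how many students work on the project at once.
    """
    if theses_per_student_week <= 0 or students <= 0:
        raise ValueError("rates and staffing must be positive")
    return math.ceil(total_theses / (theses_per_student_week * students))


def minutes_per_thesis(hours_per_week: float, theses_per_week: int) -> float:
    """Average student time spent on one thesis, in minutes."""
    if theses_per_week <= 0:
        raise ValueError("theses_per_week must be positive")
    return hours_per_week * 60 / theses_per_week


if __name__ == "__main__":
    # Approximate figures from the abstract.
    total = 6000
    per_week = 50
    hours = 20

    print(weeks_needed(total, per_week))              # one student working alone
    print(weeks_needed(total, per_week, students=4))  # assumed team of four
    print(minutes_per_thesis(hours, per_week))        # average minutes per thesis
```

Under these assumptions, a single student would need about 120 work weeks to finish the collection, and each thesis takes about 24 minutes of student time on average. Because the source figures are approximate, these results are estimates rather than reported outcomes.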



theses, dissertations, ETDs, digitization, metadata, project management, scanning


Peters, T., Moore, J., & Long, J. (2016). A novel workflow for large scale thesis digitization. Presented at the Texas Conference on Digital Libraries, Austin, Texas.

