Optical Character Recognition (OCR) Data Extraction
Illinois Natural History Survey
Prairie Research Institute
JOB DESCRIPTION
At the Illinois Natural History Survey (INHS), part of the Prairie Research Institute, we believe we can unleash the true potential of data to provide valuable insights and drive innovation. As an OCR specialist, the candidate will develop solutions and applications to digitize historical field survey data collected in past years. This digitization will enable the team to analyze the data and support several kinds of research across the state of Illinois. The duties also include pre-processing the extracted and digitized data and occasionally working on site at the INHS office on Oak St. In addition to the interview, there will be a task-based assessment. The OCR position is with the Illinois Natural History Survey, Biological Survey Assessment Program, at the Prairie Research Institute.
Duties and Responsibilities
- Research and design custom handwriting recognition techniques in Python and OpenCV.
- Adapt the OCR solution to a variety of field data samples with different templates; one OCR application will not suffice for all data samples.
- Under supervision, separate and categorize the recognized data and store it in CSV and Excel files and, eventually, in databases.
- Scan vial samples/field data sheets for OCR.
QUALIFICATIONS
- Strong command of the Python programming language and the OpenCV library
- Proven experience in OCR development
- Experience working with image-processing concepts such as contours, kernels, and gradients
- Strong command of programming concepts in Python: file operations, the os and sys libraries, modularization, encapsulation, etc.
Preferred Qualifications
- Familiarity with other computer-vision-related Python packages, products, and tools
- Ability to fine-tune existing models or OCR tools for handwriting recognition (English letters and digits)
- Fundamental understanding of CSV and MS Excel
- Knowledge of AWS or GCP services relevant to handwriting/text recognition is a bonus (e.g., GCP Vision API, AWS Textract)
Education
Currently enrolled in an undergraduate or graduate degree program in computer science, information science, or a similar field. Equivalent experience and alternate degree fields will be considered depending on the nature and depth of the experience.
Salary: $18-$20 per hour, approximately 10-20 hours per week with flexible scheduling.
Work type/location: Hybrid. Many tasks can be performed online/remotely. Other tasks, meetings, and presentations require the successful candidate to be in person on campus at the Forbes Natural History Building, 1816 S. Oak St., Champaign, IL 61820.
Available: Fall 2023 semester with possible extension into Spring 2024, dependent on funding and project need.
To Apply: Email a cover letter outlining relevant experience, a resume/CV, and contact information for three professional references to Dhananjay Pandey (dpandey2@illinois.edu). Reference "OCR data extraction" in the subject line. For full consideration, applications should be received by 8/28/2023, but the position may be filled sooner if a suitable candidate is found.