Optical Character Recognition software: A Comparative Evaluation Study for Information Retrieval

Document Type : Original Article

Author

Department of Libraries, Benha University

Abstract

In recent years, many Arab projects have digitized their information
resources, such as the digital repository of Egyptian university theses, the
digitization of information resources at Dar Al-Kutub in Egypt, and other
Arabic projects. However, these resources were digitized as images, so their
full text cannot be searched; they can be retrieved only through keywords,
which are very limited. The Arabic text therefore remains locked in these
images and cannot be used in search and retrieval, which blocks access to a
great deal of usable information.
This study aims to identify Arabic OCR software, its characteristics, and its
accuracy, as a possible solution to this problem. The most important problems
OCR software faces with Arabic characters stem from the complexity of these
characters, both in their structure and in the way they are written. The study
relied on an evaluation methodology, using a checklist as its data collection
tool.
The study found that Arabic OCR software is scarce and is divided into
commercial, open source, and free on the Internet. The study also showed that
only one program (Google Drive OCR) reached an accuracy rate of 100% in the
recognition of Arabic texts, and that four programs reached an accuracy rate
of 90% in the recognition of Arabic texts.
The study recommended that Arab information institutions develop software
that recognizes Arabic characters with high accuracy, and that scientific
research bodies and research centers provide budgets for the development of
Arabic OCR techniques.


Keywords