CedArabic is a versatile system for analyzing scanned, handwritten documents and for searching repositories of scanned documents.

The intended user of CedArabic

The system is primarily designed for Questioned Document Examination. It also has a number of functions that make it useful for analyzing documents and for searching handwritten notes and historical manuscripts. To use all of these functions, an image must be processed after it is opened, which is done by clicking the image processing button on the toolbar. The image enhancement functions, however, can be used without processing the image.

Principal functions of CedArabic:

Text Search

Handwritten documents can be searched with a word as a query.  The query word is typed in a manner similar to searching for documents using a search engine such as Google.  Search results are images and the documents that they are contained in.  The search can be performed using one of the following approaches:
         a. Using Word Truth: the query is matched against the manually entered text (word truth) associated with the word images in the document.
         b. Using Image: the specified document is searched for word images that match the image corresponding to the query.
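
The word-truth approach can be illustrated with a minimal Python sketch. The `WordImage` record, its fields, and the file names are hypothetical and are not part of CedArabic; the sketch only assumes that each segmented word image carries a manually entered transcription and that matching is case-insensitive.

```python
from dataclasses import dataclass


@dataclass
class WordImage:
    document: str  # hypothetical document identifier
    index: int     # position of the word image within the document
    truth: str     # manually entered transcription (the "word truth")


def search_by_truth(words, query):
    """Return the word images whose truth equals the query, ignoring case."""
    wanted = query.strip().casefold()
    return [w for w in words if w.truth.strip().casefold() == wanted]


# Illustrative data only.
words = [
    WordImage("letter1.png", 0, "Dear"),
    WordImage("letter1.png", 1, "John"),
    WordImage("note2.png", 4, "john"),
]
hits = search_by_truth(words, "John")
print([(w.document, w.index) for w in hits])
```

Each hit identifies both the word image and the document that contains it, which mirrors the form of the search results described above.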

Word Spotting

CedArabic allows the user to select a word image as a query with the goal of finding similar word images in a specified document. The results are displayed in rank order.
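
Rank-ordered retrieval of this kind can be sketched as follows. The feature vectors and the Euclidean distance are assumptions made for illustration; the features and the similarity measure that CedArabic actually uses are not stated here.

```python
import math


def rank_by_similarity(query_features, candidates):
    """Sort candidate word images by distance to the query, closest first.

    candidates maps a (hypothetical) word-image id to its feature vector.
    """
    scored = [
        (math.dist(query_features, features), word_id)
        for word_id, features in candidates.items()
    ]
    scored.sort()
    return [(word_id, distance) for distance, word_id in scored]


# Illustrative feature vectors only.
query = [0.2, 0.8, 0.5]
candidates = {
    "w1": [0.2, 0.8, 0.5],
    "w2": [0.9, 0.1, 0.4],
    "w3": [0.25, 0.75, 0.5],
}
ranked = rank_by_similarity(query, candidates)
for word_id, distance in ranked:
    print(word_id, round(distance, 3))
```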

Writer Verification / Identification

When two documents are opened, one as a Questioned (Q) and another as a known (K) document, CedArabic obtains a score (log-likelihood ratio, or LLR) as to whether these two documents were written by the same individual or by different individuals.
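
To make the idea of a log-likelihood ratio concrete, the sketch below computes one from a single dissimilarity value between the Q and K documents. It assumes, purely for illustration, that this distance follows one Gaussian distribution for same-writer pairs and another for different-writer pairs; the distributions, their parameters, and the function names are hypothetical and do not describe CedArabic's actual model.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log of the normal density with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_likelihood_ratio(distance, same=(0.2, 0.1), different=(0.6, 0.2)):
    """LLR = log p(distance | same writer) - log p(distance | different writers).

    The (mean, std) pairs are made-up values for this example.
    """
    return gaussian_log_pdf(distance, *same) - gaussian_log_pdf(distance, *different)


print(log_likelihood_ratio(0.2))  # small distance: positive, favors same writer
print(log_likelihood_ratio(0.8))  # large distance: negative, favors different writers
```

A positive LLR supports the same-writer hypothesis and a negative one supports the different-writer hypothesis; the magnitude indicates how strongly.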

When there are multiple known documents from the same writer, the system can learn from the samples. At least four samples are required to use this option. CedArabic enrolls the known writer by taking scanned handwritten training documents as input and training the system with respect to that writer. A questioned document can then be checked against the trained writer. Testing can be performed on a single questioned document or on multiple questioned documents.
CedArabic can be used to retrieve the closest document to a questioned document from a given set of known documents. The batch-processing feature of the system is used for this purpose.

Signature Verification

The system has in-built capabilities for signature verification. There are two modes for signature verification. In the first mode a pair of signatures (known and questioned) is compared and a score is obtained as to whether the questioned signature is genuine.

In the second mode, a set of multiple known signatures is used to determine whether a questioned signature is genuine. Here the system learns from a set of signature samples; at least four learning samples are recommended. Testing can be performed on several questioned signatures simultaneously. The probability that each questioned signature belongs to the learnt signature is displayed.

Image Input

Document images can be directly acquired by CedArabic (with a scanner attached to the computer) using the scan operation in the File menu.  Preferably the scanning should be done at about 300 dpi gray-scale. Even if scanning is done at 600 dpi, internally it is converted into 300 dpi format. Alternatively, previously scanned documents can be opened. At present, the system supports TIFF and PNG formats.
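
How the internal conversion from 600 dpi to 300 dpi is carried out is not described. One simple way to halve the resolution of a gray-scale image, shown here only as an assumed illustration, is to average each 2x2 block of pixels. The function name and the list-of-rows image representation are hypothetical.

```python
def halve_resolution(image):
    """Average each 2x2 block of gray levels (0-255) into one pixel.

    image is a list of equal-length rows; a trailing odd row or column is dropped.
    """
    height = len(image) - len(image) % 2
    width = len(image[0]) - len(image[0]) % 2
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) // 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


# A 4x4 image standing in for a 600 dpi scan becomes 2x2 at half the resolution.
scan = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [100, 200, 50, 50],
    [100, 200, 50, 50],
]
print(halve_resolution(scan))  # [[0, 255], [150, 50]]
```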

Image Enhancement

The system has several tools for enhancing the readability and visualization of a document.  Some of these tools are: stroke thickening, underline removal, rule line removal, adaptive thresholding (to extract the foreground image or writing, from the background image or paper), contour display, etc.
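
Adaptive thresholding can be sketched in a few lines. The version below is a generic local-mean method, not necessarily the one CedArabic implements: a pixel is marked as foreground (writing) when it is clearly darker than the average of its neighborhood. The `radius` and `offset` parameters are assumptions chosen for this example.

```python
def adaptive_threshold(image, radius=1, offset=10):
    """Return a binary mask: 1 where a pixel is darker than its local mean minus offset.

    image is a list of equal-length rows of gray levels (0 = black, 255 = white).
    """
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            rows = range(max(0, r - radius), min(height, r + radius + 1))
            cols = range(max(0, c - radius), min(width, c + radius + 1))
            window = [image[i][j] for i in rows for j in cols]
            local_mean = sum(window) / len(window)
            if image[r][c] < local_mean - offset:
                mask[r][c] = 1
    return mask


# A light page (200) with a short dark stroke (60) in the middle column.
page = [
    [200, 200, 200, 200, 200],
    [200, 200, 60, 200, 200],
    [200, 200, 60, 200, 200],
    [200, 200, 200, 200, 200],
]
for row in adaptive_threshold(page):
    print(row)
```

Because the threshold is computed per neighborhood rather than once for the whole page, the same rule can separate writing from paper even when the paper's brightness varies across the scan.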

Image Display

A document image can be displayed at several levels of magnification: (i) original gray-scale, (ii) with segmented words in the binary image, and (iii) segmented and word truthed, where the word-truth is obtained using one of the options described below.

A windowing operation allows two documents to be simultaneously displayed, either side-by-side or one below the other, when their writership is to be compared.

Image Selection

A sub-image can be selected and considered to be a document for the purpose of analysis. This is done by using a cropping tool which allows the placement of a variable-size rectangular box or an arbitrary polygon around the region of interest (ROI). This operation is useful when the document is complex and consists of many different types of material such as printing, handwriting, logos, signature, etc.

For the purpose of analyzing character shapes, a mouse can be used to manually extract individual characters by clicking a set of points around a character.

Document Metrics

When a document is opened, the system automatically computes a set of properties (PR) that characterize the document. They include the distribution of light intensity on the paper, number of handwritten words, distribution of word spacing, average height of words, connectivity of words, etc. Some of these properties can be displayed as graphs.
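
As an illustration of how a few such properties could be derived, the sketch below computes the word count, average word height, and inter-word gaps for one line of text from word bounding boxes. The `(left, top, right, bottom)` box format and the function name are assumptions; the source does not say how CedArabic computes these properties.

```python
from statistics import mean


def line_metrics(boxes):
    """Compute simple metrics from word bounding boxes on a single text line.

    Each box is a (left, top, right, bottom) tuple in pixels.
    """
    ordered = sorted(boxes)  # left-to-right order by the left edge
    gaps = [nxt[0] - cur[2] for cur, nxt in zip(ordered, ordered[1:])]
    heights = [bottom - top for _, top, _, bottom in ordered]
    return {
        "word_count": len(ordered),
        "mean_height": mean(heights),
        "gaps": gaps,
    }


# Illustrative boxes only.
boxes = [(80, 8, 150, 32), (10, 5, 60, 35), (175, 5, 230, 35)]
metrics = line_metrics(boxes)
print(metrics)
```

The list of gaps is the raw material for a word-spacing distribution, which could then be plotted as a graph.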

Word Segmentation

CedArabic can automatically separate words from a scanned handwritten document and represent adjacent word images in different colors. Since there might be errors in the segmentation, the system allows the user to correct the errors by using the Correct Word Segmentation option.

Word Truthing

CedArabic allows the user to enter the “ground truth” for every segmented word. In this mode, the system presents a window under the segmented word where the truth can be entered.  The purpose of word truthing is to increase the confidence in subsequent operations such as writer verification (described above).

To minimize typing, word recognition (WR) with a pre-specified lexicon can be run first, and its results can then be corrected manually using the truthing method.

Character Truthing

Users can manually segment and truth individual characters. This is a time-consuming operation but can result in a high accuracy rate for writer verification.

Legibility and Readability Analysis

Word gap analysis and comparison with the Palmer writing system is possible. In order to compare with the Palmer style, the user has to manually segment the characters.

Saving the Output

Results of CedArabic can be saved in several ways. Many of the outputs have a PRINT option.  Some of them allow the results to be transferred to a spreadsheet.

DOWNLOAD Version 1.00 [18 MB]: Trial version (released on March 30, 2007).

The download is password protected; a user name and password must be requested.

System Requirements: Windows platform (XP, 2000, NT); RAM: 128 MB (256 MB preferred). A user setting of 600 MB of virtual memory is recommended.