...

Calfa OCR
Dedicated to oriental languages and manuscripts

Process your scanned documents, archives, and books at scale, extract their data, and unlock their full digital potential.

Powerful generic models for text recognition in Arabic and Armenian

To process printed or simple handwritten documents in Arabic or Armenian script, our ready-to-use general AI models offer you an economical solution with excellent performance.

Creation of custom AI models able to read the most complex corpora

For any corpus in a non-Western language or dialect, or written in a complex hand, get unmatched text recognition accuracy by using a custom AI model.

Our offers

General models

Direct OCR/HTR

Immediate processing of regular documents by our general AI models

Processing

Per page
TXT and XML

Custom

Research Plan


Bring your own data or create it online

Development

1 model customized on your corpus

Processing

3500 p. included
TXT and XML

Custom

Custom project

Data creation

By our experts according to your requirements

Development

Customized models for each of your needs

Processing

Large page volumes
Custom data formatting

Overcome usual recognition challenges

Text recognition on the most complex handwriting

95% to 99% accuracy on average

Page layout detection and analysis

Titles, subtitles, notes, etc. are detected and labelled

Curved or vertical lines of text

All line orientations are natively supported

Noisy scans, damaged pages

Handled through customized training

Mixed alphabets and multiple languages

Processed even when mixed within the text

Left-to-right, right-to-left, top-to-bottom...

Effective whatever the reading direction

Use Cases

Try it

Send us a sample of the document you would like to digitize to get a Calfa OCR demonstration.

Contact us

Technical features

Supported languages

Arabic languages, Armenian, Chinese, Hebrew, Georgian, Syriac, Ancient Greek, Modern Greek ...

Other languages on demand.

Features

  • Text recognition on very complex writing (handwritten and printed)
  • Page layout analysis
  • Auto keywording and semantic classification
  • Curved text, vertical lines, damaged pages, noisy scans
  • Mixed alphabets and multiple languages

Data input

  • PDF
  • Image file (JPG, PNG, TIFF...)
  • Color and B&W
  • IIIF server

Data output

  • TXT, DOC, ODT, PDF
  • IIIF server
  • PDF with text overlay
  • ALTO
  • PageXML
  • Others on demand
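
ALTO is an XML format in which each text line is a TextLine element and each word is a String element carrying its text in a CONTENT attribute. As a minimal sketch of how such an export could be turned back into plain text, the Python below reads the lines from an ALTO document. The function name `alto_lines` and the sample document are illustrative assumptions; the exact structure of files exported by Calfa OCR is not described here and may differ.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the XML namespace so the sketch works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return the text of each TextLine in an ALTO document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            # Each word is a String child; its text is in the CONTENT attribute.
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


# Hand-written sample for illustration only, not real Calfa OCR output.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Calfa"/><SP/><String CONTENT="OCR"/></TextLine>
    <TextLine><String CONTENT="sample"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_lines(SAMPLE))
```

Matching on local element names rather than a fixed namespace keeps the sketch usable with files that declare a different ALTO schema version.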

Frequently Asked Questions

    1

    Can I use Calfa OCR for handwritten pages?

    Yes, Calfa OCR is specially developed to recognize manuscripts. The oldest manuscript we have processed dates from the 9th century, the most recent from the 20th.

    2

    Does it work with all kinds of handwriting?

    Calfa OCR can be run on many writing styles. When necessary, for very unusual handwriting, we include a training phase in the project to adapt the recognition.

    3

    What is the recognition rate of Calfa OCR?

    The recognition rate is the percentage of text recognized correctly compared with the source document. It varies depending on the handwriting style, font, layout and scan quality. Feel free to request a demo to see the recognition rate Calfa OCR can reach on your documents.

    4

    Does the OCR also work with typed documents?

    Yes, Calfa OCR also recognizes typed documents such as newspaper pages, typewritten letters, etc., in the supported languages.
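
The recognition rate discussed in question 3 can be made concrete with a small calculation. The page does not say how Calfa OCR measures it; the sketch below assumes one common convention, character-level accuracy, computed as 100 × (1 − edit distance / reference length). The function names `edit_distance` and `recognition_rate` are illustrative, not part of any Calfa tool.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def recognition_rate(reference: str, recognized: str) -> float:
    """Character-level accuracy in percent (assumed definition, see lead-in)."""
    if not reference:
        return 100.0 if not recognized else 0.0
    errors = edit_distance(reference, recognized)
    # Clamp at zero: very noisy output can contain more errors than characters.
    return 100.0 * max(0, len(reference) - errors) / len(reference)


# One wrong character out of ten gives a 90% recognition rate.
print(recognition_rate("manuscript", "manuscrlpt"))
```

Under this definition, the reference is a manually verified transcription of the page, so the rate can only be measured on documents for which such a transcription exists.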