...

Calfa OCR
Dedicated to oriental languages and manuscripts

Process your scanned documents, archives and books at scale, extract their data, and unlock their full digital potential.

Supported languages

  • Arabic languages
  • Armenian
  • Chinese
  • Hebrew
  • Georgian
  • Syriac
  • Ancient Greek
  • Modern Greek

Other languages on demand.

Features

  • Text recognition on very complex writing (handwritten and printed)
  • Page layout analysis
  • Automatic keywording and semantic classification
  • Curved text, vertical lines, damaged pages, noisy scans
  • Mixed alphabets and multilingual documents

Data input

  • PDF
  • Image files (JPG, PNG, TIFF...)
  • Color and black-and-white
  • IIIF server

Data output

  • TXT, DOC, ODT, PDF
  • IIIF server
  • PDF with text overlay
  • ALTO
  • PageXML
  • Others on demand

Specificities

Powerful and unique features

  • Text recognition on the most complex handwriting: 95% to 99% accuracy on average
  • Page layout detection and analysis: titles, subtitles, notes, etc. are detected and labelled
  • Curved or vertical lines of text: all line orientations are natively supported
  • Noisy scans, damaged pages: handled through customized training
  • Mixed alphabets and multiple languages: processed even when mixed within the same text
  • Left-to-right, right-to-left, top-to-bottom...: effective whatever the reading direction

Our offers

General models: Direct OCR/HTR

Immediate processing of regular documents by our general AI models

  • Processing: per page, TXT and XML

Professional: Research Plan

Bring your own data or create it online

  • Development: 1 model customized on your corpus
  • Processing: 3,500 pages included, TXT and XML

Professional: Custom project

  • Data creation: created by our experts according to your requirements
  • Development: customized models for each of your needs
  • Processing: large page volumes, custom data formatting

Use Cases

Give it a try

Send us a sample of the documents you would like to digitize to get a demonstration of Calfa OCR.

Contact us

Frequently Asked Questions

1. Can I use Calfa OCR for handwritten pages?

Yes, Calfa OCR is specifically developed to recognize manuscripts. The oldest manuscript we have processed dates from the 9th century, the most recent from the 20th.

2. Does it work with all kinds of handwriting?

Calfa OCR can be run on many writing styles. When necessary, for highly unusual handwriting, we include a training phase in the project to adapt the recognition to it.

3. What is the recognition rate of Calfa OCR?

The recognition rate is the percentage of the document's text that is correctly recognized. It varies with the handwriting style, font, layout and scan quality. Feel free to request a demo to see the recognition rate Calfa OCR can reach on your documents.
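The page does not state exactly how the recognition rate is computed. As an illustration only, and assuming a character-level metric, a common choice is character accuracy, 100 × (1 − CER), where the character error rate (CER) is the Levenshtein edit distance between the OCR output and a reference transcription, divided by the reference length. The helper names `levenshtein` and `recognition_rate` below are hypothetical and are not part of any Calfa API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def recognition_rate(reference: str, ocr_output: str) -> float:
    """Hypothetical metric: character accuracy in percent, 100 * (1 - CER).
    Assumes a character-level comparison against a ground-truth transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    cer = levenshtein(reference, ocr_output) / len(reference)
    return max(0.0, 100.0 * (1.0 - cer))  # CER can exceed 1, so clamp at 0


# One substituted character ("O" read as "0") out of 9 reference characters.
print(round(recognition_rate("Calfa OCR", "Calfa 0CR"), 1))  # 88.9
```

This sketch counts Unicode code points, so for scripts with combining diacritics it may be worth normalizing both strings first (for example with `unicodedata.normalize("NFC", text)`) so that visually identical text compares as equal.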

4. Does the OCR also work with typed documents?

Yes, Calfa OCR also recognizes typed documents such as newspaper pages, typewritten letters, etc., in the supported languages.