Improved degraded document recognition with hybrid modeling techniques and character n-grams

Anja Brakensiek, Daniel Willett, Gerhard Rigoll

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

This paper describes a robust multifont character recognition system for degraded documents such as photocopies or faxes. The system is based on Hidden Markov Models (HMMs) using discrete and hybrid modeling techniques, where the latter makes use of an information theory-based neural network. The presented recognition results refer to the SEDAL database of English documents and were obtained without a dictionary. It is also demonstrated that using a language model consisting of character n-grams yields significantly better recognition results. The resulting system clearly outperforms commercial systems and achieves further error rate reductions compared to previous results on this database.

Original language: English
Pages (from-to): 438-441
Number of pages: 4
Journal: Proceedings - International Conference on Pattern Recognition
Volume: 15
Issue number: 4
State: Published - 2000
Externally published: Yes
