| ABBYY Mobile OCR Engine 4 |
The ABBYY Mobile OCR Engine regular expression alphabet is described in the following table:
| Item name | Conventional regular expression sign | Usage examples and explanations |
|---|---|---|
| Any character | . | c.t — denotes words like "cat", "cot" |
| Character from a character range | [] | [b-d]ell — denotes words like "bell", "cell", "dell"; [ty]ell — denotes words "tell" and "yell" |
| Character out of a character range | [^] | [^y]ell — denotes words like "dell", "cell", "tell", but forbids "yell"; [^n-s]ell — denotes words like "bell", "cell", but forbids "nell", "oell", "pell", "qell", "rell" and "sell" |
| Or | \| | c(a\|u)t — denotes words "cat" and "cut" |
| 0 or more occurrences in a row | * | 10* — denotes numbers 1, 10, 100, 1000 etc. |
| 1 or more occurrences in a row | + | 10+ — allows numbers 10, 100, 1000 etc., but forbids 1. |
| Letter or digit | [0-9a-zA-Z] | [0-9a-zA-Z] — allows a single character; [0-9a-zA-Z]+ — allows any word |
| Capital Latin letter | [A-Z] | |
| Small Latin letter | [a-z] | |
| Capital Cyrillic letter | [А-Я] | |
| Small Cyrillic letter | [а-я] | |
| Digit | [0-9] | |
| Space | \s | |
| System character | @ | |
| Word from dictionary | @(Dictionary) | The Dictionary parameter sets the path to the user dictionary from which words must be taken. Backslashes in the path must be doubled, for example: @(D:\\MyFolder\\MyDictionary.amd). Note: Some programming languages (such as C++) require you to escape backslashes in string literals. In this case each of the two backslashes must itself be escaped, which results in a quadrupled backslash. The example above looks like this in C++: L"@(D:\\\\MyFolder\\\\MyDictionary.amd)" |
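
The backslash-doubling rule for dictionary paths can be sketched in C++ as follows. This is only an illustration and not part of the ABBYY API: the helper names `doubleBackslashes` and `makeDictionaryExpression` are hypothetical, and the code only builds the expression text without calling the engine.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: doubles every backslash in `path`, as the
// regular expression syntax described above requires.
std::wstring doubleBackslashes(const std::wstring& path) {
    std::wstring out;
    out.reserve(path.size() * 2);
    for (wchar_t ch : path) {
        out.push_back(ch);
        if (ch == L'\\') {
            out.push_back(L'\\');  // second copy of the backslash
        }
    }
    return out;
}

// Hypothetical helper: wraps the escaped path in the @(...) construct.
std::wstring makeDictionaryExpression(const std::wstring& path) {
    assert(!path.empty() && "dictionary path must not be empty");
    return L"@(" + doubleBackslashes(path) + L")";
}
```

Called with the path D:\MyFolder\MyDictionary.amd (written `L"D:\\MyFolder\\MyDictionary.amd"` in source code), the helper returns the text @(D:\\MyFolder\\MyDictionary.amd), so the quadrupled form never has to be typed by hand.
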
Notes:
The number denoting the day may consist of one digit (e.g. 1, 2) or two digits (e.g. 02, 12), but it cannot be zero (0 or 00). The regular expression for the day should therefore look like this: ((|0)[1-9])|([12][0-9])|(30)|(31).
The regular expression for the month should look like this: ((|0)[1-9])|(10)|(11)|(12).
The regular expression for the year should look like this: ((19)[0-9][0-9])|([0-9][0-9])|((20)[0-9][0-9]|([0-9][0-9])).
What is left is to combine all of this and separate the numbers with periods (e.g. 1.03.1999). The period is an auxiliary sign, so it must be preceded by a backslash (\). The regular expression for the full date should then look like this:
(((|0)[1-9])|([12][0-9])|(30)|(31))\.(((|0)[1-9])|(10)|(11)|(12))\.(((19)[0-9][0-9])|([0-9][0-9])|((20)[0-9][0-9]|([0-9][0-9])))
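
The way the three parts are combined can be tried out with a short C++ sketch. It assumes the standard library's `std::regex` (default ECMAScript grammar) as a stand-in for the ABBYY engine, whose syntax may differ in other respects. The names `kDay`, `kMonth`, `kYear`, `makeDatePattern`, `looksLikeDate` and `checkDatePattern` are illustrative only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Sub-expressions copied from the text above.
const std::string kDay   = "((|0)[1-9])|([12][0-9])|(30)|(31)";
const std::string kMonth = "((|0)[1-9])|(10)|(11)|(12)";
const std::string kYear  =
    "((19)[0-9][0-9])|([0-9][0-9])|((20)[0-9][0-9]|([0-9][0-9]))";

// Wraps each part in parentheses and joins the parts with an escaped period.
std::string makeDatePattern() {
    return "(" + kDay + ")\\.(" + kMonth + ")\\.(" + kYear + ")";
}

// Returns true if the whole of `text` has the day.month.year form.
bool looksLikeDate(const std::string& text) {
    static const std::regex re(makeDatePattern());
    return std::regex_match(text, re);
}

// Small self-check that can be called from main() or a test runner.
void checkDatePattern() {
    assert(looksLikeDate("1.03.1999"));   // one-digit day is allowed
    assert(!looksLikeDate("0.03.1999"));  // a zero day is rejected
    assert(!looksLikeDate("00.03.1999"));
}
```

The expression checks only the form of the date: a string such as 31.02.1999 is accepted even though that day does not exist.
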
Regular expression for e-mail addresses
You can easily make a language for denoting e-mail addresses. The regular expression for an e-mail address should look like this:
[a-zA-Z0-9_\-\.]+\@[a-zA-Z0-9\.\-]+\.[a-zA-Z]+
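
The e-mail expression can be exercised in the same way. The sketch below again assumes `std::regex` with its default ECMAScript grammar rather than the ABBYY engine, and the function names `looksLikeEmail` and `checkEmailPattern` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the whole of `text` matches the e-mail expression above.
bool looksLikeEmail(const std::string& text) {
    // Pattern copied from the text above. "\@" is escaped there because "@"
    // is a system character in the ABBYY syntax; std::regex is assumed to
    // read "\@" as a plain "@".
    static const std::regex re(
        R"([a-zA-Z0-9_\-\.]+\@[a-zA-Z0-9\.\-]+\.[a-zA-Z]+)");
    return std::regex_match(text, re);
}

// Small self-check that can be called from main() or a test runner.
void checkEmailPattern() {
    assert(looksLikeEmail("john.doe@example.com"));
    assert(!looksLikeEmail("no-at-sign.example.com"));  // "@" is required
}
```

The expression requires at least one period after the @ sign, followed by letters only, so a string such as user@host is rejected.
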