Leptonica
1.82.0
Image processing and image analysis suite
#include <string.h>
#include "allheaders.h"
Functions | |
static l_int32 | recogGetCharsetSize (l_int32 type) |
static l_int32 | recogAddCharstrLabels (L_RECOG *recog) |
static l_int32 | recogAddAllSamples (L_RECOG **precog, PIXAA *paa, l_int32 debug) |
L_RECOG * | recogCreateFromRecog (L_RECOG *recs, l_int32 scalew, l_int32 scaleh, l_int32 linew, l_int32 threshold, l_int32 maxyshift) |
L_RECOG * | recogCreateFromPixa (PIXA *pixa, l_int32 scalew, l_int32 scaleh, l_int32 linew, l_int32 threshold, l_int32 maxyshift) |
L_RECOG * | recogCreateFromPixaNoFinish (PIXA *pixa, l_int32 scalew, l_int32 scaleh, l_int32 linew, l_int32 threshold, l_int32 maxyshift) |
L_RECOG * | recogCreate (l_int32 scalew, l_int32 scaleh, l_int32 linew, l_int32 threshold, l_int32 maxyshift) |
void | recogDestroy (L_RECOG **precog) |
l_int32 | recogGetCount (L_RECOG *recog) |
l_ok | recogSetParams (L_RECOG *recog, l_int32 type, l_int32 min_nopad, l_float32 max_wh_ratio, l_float32 max_ht_ratio) |
l_int32 | recogGetClassIndex (L_RECOG *recog, l_int32 val, char *text, l_int32 *pindex) |
l_ok | recogStringToIndex (L_RECOG *recog, char *text, l_int32 *pindex) |
l_int32 | recogGetClassString (L_RECOG *recog, l_int32 index, char **pcharstr) |
l_ok | l_convertCharstrToInt (const char *str, l_int32 *pval) |
L_RECOG * | recogRead (const char *filename) |
L_RECOG * | recogReadStream (FILE *fp) |
L_RECOG * | recogReadMem (const l_uint8 *data, size_t size) |
l_ok | recogWrite (const char *filename, L_RECOG *recog) |
l_ok | recogWriteStream (FILE *fp, L_RECOG *recog) |
l_ok | recogWriteMem (l_uint8 **pdata, size_t *psize, L_RECOG *recog) |
PIXA * | recogExtractPixa (L_RECOG *recog) |
Variables | |
static const l_int32 | MaxExamplesInClass = 256 |
static const l_int32 | DefaultCharsetType = L_ARABIC_NUMERALS |
static const l_int32 | DefaultMinNopad = 1 |
static const l_float32 | DefaultMaxWHRatio = 3.0 |
static const l_float32 | DefaultMaxHTRatio = 2.6 |
static const l_int32 | DefaultThreshold = 150 |
static const l_int32 | DefaultMaxYShift = 1 |
Recog creation, destruction and access
  L_RECOG *recogCreateFromRecog()
  L_RECOG *recogCreateFromPixa()
  L_RECOG *recogCreateFromPixaNoFinish()
  L_RECOG *recogCreate()
  void recogDestroy()
Recog accessors
  l_int32 recogGetCount()
  l_int32 recogSetParams()
  static l_int32 recogGetCharsetSize()
Character/index lookup
  l_int32 recogGetClassIndex()
  l_int32 recogStringToIndex()
  l_int32 recogGetClassString()
  l_int32 l_convertCharstrToInt()
Serialization
  L_RECOG *recogRead()
  L_RECOG *recogReadStream()
  L_RECOG *recogReadMem()
  l_int32 recogWrite()
  l_int32 recogWriteStream()
  l_int32 recogWriteMem()
  PIXA *recogExtractPixa()
  static l_int32 recogAddCharstrLabels()
  static l_int32 recogAddAllSamples()
The recognizer functionality is split into four files:
  recogbasic.c: create, destroy, access, serialize
  recogtrain.c: training on labeled and unlabeled data
  recogident.c: running the recognizer(s) on input
  recogdid.c: running the recognizer(s) on input using a document image decoding (DID) hidden Markov model
This is a content-adapted (or book-adapted) recognizer (BAR) application. The recognizers here are typically assembled from data that has been labeled by a generic recognition system, such as Tesseract. The general procedure to create a recognizer (recog) from labeled data is to add the labeled character bitmaps, either one at a time or all together from a pixa with labeled pix.
The suggested use for a BAR that consists of labeled templates drawn from a single source (e.g., a book) is to identify unlabeled samples by using unscaled character templates in the BAR, picking the template closest to the unlabeled sample.
Outliers can be removed from a pixa of labeled pix. This is one of two methods that use averaged templates (the other is greedy splitting of characters). See recogtrain.c for a discussion and the implementation.
A special bootstrap recognizer (BSR) can be used to make a BAR from unlabeled book data. This is done by comparing character images from the book with labeled templates in the BSR, where all images are scaled to h = 40. The templates can be either the scanned images or images consisting of width-normalized strokes derived from the skeleton of the character bitmaps.
Two BARs of labeled character data that have been made by different recognizers can be joined by extracting a pixa of the labeled templates from each, joining the two pixa, and then regenerating a BAR from the joined set of templates. If all the labeled character data is from a single source (e.g., a book), identification can proceed using unscaled templates (either the input image or width-normalized lines). But if the labeled data comes from more than one source (a "hybrid" recognizer), the templates should be scaled, and we recommend scaling to a fixed height.
Suppose it is not possible to generate a BAR with a sufficient number of templates of each class taken from a single source. In that case, templates from the BSR itself can be added. This is the condition described above, where the labeled templates come from multiple sources, and it is necessary to do all character matches using templates that have been scaled to a fixed height (e.g., 40). Likewise, the samples to be identified using this hybrid recognizer must be modified in the same way. See prog/recogtest3.c for an example of the steps that can be taken in the construction of a BAR using a BSR.
For training numeric input, an example set of calls that scales each training input to a fixed height h and uses line templates of width linew to identify unknown characters is:
    L_RECOG *rec = recogCreate(0, h, linew, 128, 1);
    for (i = 0; i < n; i++) {  // read in n training digits
        PIX *pix = ...
        recogTrainLabeled(rec, pix, NULL, text[i], 0);
    }
    recogTrainingFinished(&rec, 1, -1, -1.0);  // required
It is an error to call any function that computes averages, removes outliers or requests identification of an unlabeled character before an explicit call to finish training. Such functions include:
(1) computing the sample averages: recogAverageSamples()
(2) removing outliers: recogRemoveOutliers1() or recogRemoveOutliers2()
(3) requesting identification of an unlabeled character: recogIdentifyPix()
Note that to do further training on a "finished" recognizer, you can set
    recog->train_done = FALSE;
add the new training samples, and again call
    recogTrainingFinished(&rec, 1, -1, -1.0);  // required
If not scaling, using the images directly for identification, and removing outliers, do something like this:
    L_RECOG *rec = recogCreate(0, 0, 0, 128, 1);
    for (i = 0; i < n; i++) {  // read in n training characters
        PIX *pix = ...
        recogTrainLabeled(rec, pix, NULL, text[i], 0);
    }
    recogTrainingFinished(&rec, 1, -1, -1.0);
    if (!rec) ... [return]
    // remove outliers
    recogRemoveOutliers1(&rec, 0.7, 2, NULL, NULL);
You can generate a recognizer from a pixa where the text field in each pix is the character string label for the pix. For example, the following recognizer will store unscaled line images:
    L_RECOG *rec = recogCreateFromPixa(pixa, 0, 0, linew, 128, 1);
and in use, it is fed unscaled line images to identify.
For the following, assume that you have a pixa of labeled templates. If it is likely that some of the input templates are mislabeled, there are several things that can be done to remove them. The first is to put a size and quantity filter on them; e.g.,
    PIXA *pixa2 = recogFilterPixaBySize(pixa1, 10, 15, 2.6);
Then you can remove outliers; e.g.,
    PIXA *pixa3 = pixaRemoveOutliers2(pixa2, -1.0, -1, NULL, NULL);
To this point, all templates are from a single source, so you can make a recognizer that uses the unscaled templates and optionally attempts to split touching characters:
    L_RECOG *recog1 = recogCreateFromPixa(pixa3, ...);
Alternatively, if you need more templates for some of the classes, you can pad with templates from a "bootstrap" recognizer (BSR). If you pad, it is necessary to scale the templates and input samples to a fixed height, and no attempt will be made to split the input sample connected components:
    L_RECOG *recog1 = recogCreateFromPixa(pixa3, 0, 40, 0, 128, 0);
    recogPadDigitTrainingSet(&recog1, 40, 0);
A special case is a pure BSR that contains images scaled to a fixed height (we use 40 in these examples). For this, use either the scanned bitmap:
    L_RECOG *recboot = recogCreateFromPixa(pixa, 0, 40, 0, 128, 1);
or width-normalized lines (use width of 5 here):
    L_RECOG *recboot = recogCreateFromPixa(pixa, 0, 40, 5, 128, 1);
This can be used to train a new book-adapted recognizer (BAR) on unlabeled data from, e.g., a book. To do this, the following is required:
(1) the input images from the book must be scaled in the same way as those in the BSR, and
(2) both the BSR and the input images must be set up to be either input scanned images or width-normalized lines.
Definition in file recogbasic.c.
l_ok l_convertCharstrToInt (const char *str, l_int32 *pval)
[in] | str | input string representing one UTF-8 character; not more than 4 bytes |
[out] | pval | integer value for the input. Think of it as a 1-to-1 hash code. |
Definition at line 783 of file recogbasic.c.
References L_Rdid::size.
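The "1-to-1 hash code" described for pval can be pictured as packing the bytes of the UTF-8 character into a single integer. The following is a minimal standalone sketch of that idea, not the library's code: the name charstr_to_int, the use of uint32_t for the output (the library uses l_int32), the byte order (first byte most significant) and the return codes are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for l_convertCharstrToInt(): packs the 1 to 4
 * bytes of one UTF-8 character into an integer, first byte most
 * significant (byte order is an assumption of this sketch).
 * Returns 0 on success, 1 on error. */
int charstr_to_int(const char *str, uint32_t *pval)
{
    size_t   size, i;
    uint32_t val = 0;

    if (!pval) return 1;
    *pval = 0;
    if (!str) return 1;

    size = strlen(str);
    if (size == 0 || size > 4)  /* empty, or too long for one character */
        return 1;

    /* Go through unsigned char so bytes >= 0x80 do not sign-extend */
    for (i = 0; i < size; i++)
        val = (val << 8) | (unsigned char)str[i];

    *pval = val;
    return 0;
}
```

Because distinct byte strings of the same length give distinct integers, and the length is recoverable from the highest nonzero byte, the mapping in this sketch is invertible for valid UTF-8 input.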
static l_int32 recogAddAllSamples (L_RECOG **precog, PIXAA *paa, l_int32 debug)
[in] | precog | addr of recog |
[in] | paa | pixaa from previously trained recog |
[in] | debug |
Notes:
(1) On error, the input recog is destroyed.
(2) This is used with the serialization routine recogRead(), where each pixa in the pixaa represents a set of characters in a different class. Before calling this function, we have verified that the number of character classes, given by the setsize field in recog, equals the number of pixa in the paa. The character labels for each set are in the sa_text field.
Definition at line 1193 of file recogbasic.c.
References L_CLONE, L_INSERT, L_NOCOPY, lept_stderr(), L_Recog::pixaa_u, pixaaAddPix(), pixaaAddPixa(), pixaaGetCount(), pixaaGetPixa(), pixaCreate(), pixaDestroy(), pixaGetCount(), pixaGetPix(), recogDestroy(), recogTrainingFinished(), L_Recog::sa_text, and sarrayGetString().
static l_int32 recogAddCharstrLabels (L_RECOG *recog)
[in] | recog |
Definition at line 1141 of file recogbasic.c.
References L_CLONE, L_NOCOPY, L_Recog::pixaa_u, pixaaGetCount(), pixaaGetPixa(), pixaDestroy(), pixaGetCount(), pixaGetPix(), pixDestroy(), pixSetText(), L_Recog::sa_text, and sarrayGetString().
Referenced by recogExtractPixa().
L_RECOG* recogCreate (l_int32 scalew, l_int32 scaleh, l_int32 linew, l_int32 threshold, l_int32 maxyshift)
[in] | scalew | scale all widths to this; use 0 otherwise |
[in] | scaleh | scale all heights to this; use 0 otherwise |
[in] | linew | width of normalized strokes; use 0 to skip |
[in] | threshold | for binarization; typically ~128; 0 for default |
[in] | maxyshift | from nominal centroid alignment; default is 1 |
Notes:
(1) If scalew == 0 and scaleh == 0, no scaling is done. If one of these is 0 and the other is > 0, scaling is isotropic to the requested size. We typically do not set both > 0.
(2) Use linew > 0 to convert the templates to images with fixed width strokes. linew == 0 skips the conversion.
(3) The only valid values for maxyshift are 0, 1 and 2. It is recommended to use maxyshift == 1 (default value). Using maxyshift == 0 is much faster than maxyshift == 1, but it is much less likely to find the template with the best correlation. Use of anything but 1 results in a warning.
(4) Scaling is used for finding outliers and for training a book-adapted recognizer (BAR) from a bootstrap recognizer (BSR). Scaling the height to a fixed value and scaling the width accordingly (e.g., scaleh = 40, scalew = 0) is recommended.
(5) The storage for most of the arrays is allocated when training is finished.
Definition at line 411 of file recogbasic.c.
Referenced by recogCreateFromPixaNoFinish().
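Note (1) on scalew and scaleh can be made concrete with a small size calculation. This is a standalone sketch under stated assumptions, not library code: the function name scaled_size, the rounding to the nearest pixel and the clamp to a minimum of 1 are all hypothetical choices; the library's own rounding may differ.

```c
/* Hypothetical helper: computes the template size that results from the
 * scalew/scaleh rule in note (1).  w and h are the input dimensions.
 * Returns 0 on success, 1 on invalid input. */
int scaled_size(int w, int h, int scalew, int scaleh, int *pwd, int *phd)
{
    if (!pwd || !phd || w <= 0 || h <= 0 || scalew < 0 || scaleh < 0)
        return 1;

    if (scalew == 0 && scaleh == 0) {        /* no scaling */
        *pwd = w;
        *phd = h;
    } else if (scalew == 0) {                /* isotropic, height fixed */
        *phd = scaleh;
        *pwd = (int)((double)w * scaleh / h + 0.5);
    } else if (scaleh == 0) {                /* isotropic, width fixed */
        *pwd = scalew;
        *phd = (int)((double)h * scalew / w + 0.5);
    } else {                                 /* both set: anisotropic */
        *pwd = scalew;
        *phd = scaleh;
    }

    /* Assumption of this sketch: never return a zero dimension */
    if (*pwd < 1) *pwd = 1;
    if (*phd < 1) *phd = 1;
    return 0;
}
```

For example, with the recommended setting scaleh = 40, scalew = 0, a 20 x 50 template becomes 16 x 40 in this sketch, preserving its aspect ratio.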
L_RECOG* recogCreateFromPixa (PIXA *pixa, l_int32 scalew, l_int32 scaleh, l_int32 linew, l_int32 threshold, l_int32 maxyshift)
[in] | pixa | of labeled, 1 bpp images |
[in] | scalew | scale all widths to this; use 0 otherwise |
[in] | scaleh | scale all heights to this; use 0 otherwise |
[in] | linew | width of normalized strokes; use 0 to skip |
[in] | threshold | for binarization; typically ~150 |
[in] | maxyshift | from nominal centroid alignment; default is 1 |
Notes:
(1) This is a convenience function for training from labeled data. The pixa can be read from file.
(2) The pixa should contain the unscaled bitmaps used for training.
(3) See recogCreate() for use of scalew, scaleh and linew.
(4) It is recommended to use maxyshift = 1 (the default value).
(5) All examples in the same class (i.e., with the same character label) should be similar. They can be made similar by invoking recogRemoveOutliers[1,2]() on pixa before calling this function.
Definition at line 284 of file recogbasic.c.
References recogCreateFromPixaNoFinish(), and recogTrainingFinished().
Referenced by recogCreateFromRecog(), recogMakeBootDigitRecog(), recogPadDigitTrainingSet(), recogRemoveOutliers1(), and recogRemoveOutliers2().
L_RECOG* recogCreateFromPixaNoFinish (PIXA *pixa, l_int32 scalew, l_int32 scaleh, l_int32 linew, l_int32 threshold, l_int32 maxyshift)
[in] | pixa | of labeled, 1 bpp images |
[in] | scalew | scale all widths to this; use 0 otherwise |
[in] | scaleh | scale all heights to this; use 0 otherwise |
[in] | linew | width of normalized strokes; use 0 to skip |
[in] | threshold | for binarization; typically ~150 |
[in] | maxyshift | from nominal centroid alignment; default is 1 |
Notes: (1) See recogCreateFromPixa() for details. (2) This is also used to generate a pixaa with templates in each class within a pixa. For that, all args except for pixa are ignored.
Definition at line 330 of file recogbasic.c.
References L_CLONE, pixaCountText(), pixaGetCount(), pixaGetPix(), pixaIsFull(), pixaVerifyDepth(), pixDestroy(), pixGetText(), recogCreate(), and recogTrainLabeled().
Referenced by recogCreateFromPixa(), and recogSortPixaByClass().
L_RECOG* recogCreateFromRecog (L_RECOG *recs, l_int32 scalew, l_int32 scaleh, l_int32 linew, l_int32 threshold, l_int32 maxyshift)
[in] | recs | source recog with arbitrary input parameters |
[in] | scalew | scale all widths to this; use 0 otherwise |
[in] | scaleh | scale all heights to this; use 0 otherwise |
[in] | linew | width of normalized strokes; use 0 to skip |
[in] | threshold | for binarization; typically ~128 |
[in] | maxyshift | from nominal centroid alignment; default is 1 |
Notes:
(1) This is a convenience function that generates a recog using the unscaled training data in an existing recog.
(2) It is recommended to use maxyshift = 1 (the default value).
(3) See recogCreate() for use of scalew, scaleh and linew.
Definition at line 237 of file recogbasic.c.
References pixaDestroy(), recogCreateFromPixa(), and recogExtractPixa().
void recogDestroy (L_RECOG **precog)
[in,out] | precog | will be set to null before returning |
Definition at line 480 of file recogbasic.c.
References L_Recog::bmf, bmfDestroy(), L_Recog::centtab, L_Recog::dna_tochar, l_dnaDestroy(), L_Recog::naasum, L_Recog::naasum_u, L_Recog::nasum, L_Recog::nasum_u, numaaDestroy(), numaDestroy(), L_Recog::pixa, L_Recog::pixa_id, L_Recog::pixa_tr, L_Recog::pixa_u, L_Recog::pixaa, L_Recog::pixaa_u, pixaaDestroy(), L_Recog::pixadb_ave, L_Recog::pixadb_boot, L_Recog::pixadb_split, pixaDestroy(), L_Recog::pixdb_ave, L_Recog::pixdb_range, pixDestroy(), L_Recog::pta, L_Recog::pta_u, L_Recog::ptaa, L_Recog::ptaa_u, ptaaDestroy(), ptaDestroy(), L_Recog::rch, L_Recog::rcha, rchaDestroy(), rchDestroy(), recogDestroyDid(), L_Recog::sa_text, sarrayDestroy(), and L_Recog::sumtab.
Referenced by recogAddAllSamples(), recogAverageSamples(), recogPadDigitTrainingSet(), recogRemoveOutliers1(), recogRemoveOutliers2(), recogSortPixaByClass(), and recogTrainingFinished().
PIXA* recogExtractPixa (L_RECOG *recog)
[in] | recog |
Notes: (1) This generates a pixa of all the unscaled images in the recognizer, where each one has its character class label in the pix text field, by flattening pixaa_u to a pixa.
Definition at line 1122 of file recogbasic.c.
References L_CLONE, L_Recog::pixaa_u, pixaaFlattenToPixa(), and recogAddCharstrLabels().
Referenced by recogAddDigitPadTemplates(), recogCreateFromRecog(), recogRemoveOutliers1(), and recogRemoveOutliers2().
static l_int32 recogGetCharsetSize (l_int32 type)
[in] | type | of charset |
Definition at line 602 of file recogbasic.c.
References L_ARABIC_NUMERALS, L_LC_ALPHA, L_LC_ROMAN_NUMERALS, L_UC_ALPHA, L_UC_ROMAN_NUMERALS, and L_UNKNOWN.
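The charset types referenced above suggest a simple mapping from type to the number of classes in that character set. The following sketch illustrates one plausible mapping. The enum CharsetType, its values, the function name charset_size and the sizes themselves (10 digits, 7 Roman numeral letters, 26 letters, 0 for unknown) are assumptions for illustration; this page does not state the values the library returns.

```c
/* Hypothetical mirror of the charset types named on this page; the
 * numeric values here are arbitrary and not taken from recog.h. */
typedef enum {
    CS_UNKNOWN = 0,
    CS_ARABIC_NUMERALS,
    CS_LC_ROMAN_NUMERALS,
    CS_UC_ROMAN_NUMERALS,
    CS_LC_ALPHA,
    CS_UC_ALPHA
} CharsetType;

/* Returns the assumed number of character classes in the set,
 * or 0 if the set is unknown or the type is not recognized. */
int charset_size(CharsetType type)
{
    switch (type) {
    case CS_ARABIC_NUMERALS:
        return 10;              /* 0 through 9 */
    case CS_LC_ROMAN_NUMERALS:
    case CS_UC_ROMAN_NUMERALS:
        return 7;               /* i, v, x, l, c, d, m */
    case CS_LC_ALPHA:
    case CS_UC_ALPHA:
        return 26;              /* a through z */
    case CS_UNKNOWN:
    default:
        return 0;
    }
}
```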
l_int32 recogGetClassIndex (L_RECOG *recog, l_int32 val, char *text, l_int32 *pindex)
[in] | recog | with LUT's pre-computed |
[in] | val | integer value; can be up to 3 bytes for UTF-8 |
[in] | text | text from which val was derived; used if not found |
[out] | pindex | index into dna_tochar |
Notes:
(1) This is used during training. There is one entry in recog->dna_tochar (integer value, e.g., ascii) and one in recog->sa_text (e.g., ascii letter in a string) for each character class.
(2) This searches the dna character array for val. If it is not found, the template represents a character class not already seen: it increments setsize (the number of character classes) by 1, and augments both the index (dna_tochar) and text (sa_text) arrays.
(3) Returns the index in *pindex, except on error.
(4) Caller must check the function return value.
Definition at line 655 of file recogbasic.c.
References L_Recog::dna_tochar, L_COPY, l_dnaAddNumber(), l_dnaGetCount(), l_dnaGetIValue(), L_Recog::sa_text, sarrayAddString(), and L_Recog::setsize.
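The find-or-append behavior in note (2) can be sketched with two parallel arrays standing in for dna_tochar and sa_text. This is a standalone illustration, not the library's implementation: ClassTable, MAX_CLASSES, class_index and the return convention (0 = found, 1 = not found and added, 2 = error) are all assumptions of the sketch, since this page does not list the actual return codes.

```c
#include <string.h>

#define MAX_CLASSES 256   /* hypothetical capacity for this sketch */

/* Parallel arrays standing in for dna_tochar and sa_text */
typedef struct {
    int  setsize;                 /* number of character classes */
    int  tochar[MAX_CLASSES];     /* integer value for each class */
    char text[MAX_CLASSES][5];    /* up to 4 bytes of UTF-8 plus NUL */
} ClassTable;

/* Looks up val; if absent, appends a new class for it.
 * Sketch convention: 0 = found, 1 = added as new class, 2 = error. */
int class_index(ClassTable *t, int val, const char *text, int *pindex)
{
    int i;

    if (!pindex) return 2;
    *pindex = -1;
    if (!t || !text || strlen(text) > 4) return 2;

    /* Search the existing classes for this integer value */
    for (i = 0; i < t->setsize; i++) {
        if (t->tochar[i] == val) {
            *pindex = i;
            return 0;
        }
    }

    /* Not seen before: grow both arrays together */
    if (t->setsize >= MAX_CLASSES) return 2;
    t->tochar[t->setsize] = val;
    strcpy(t->text[t->setsize], text);
    *pindex = t->setsize;
    t->setsize++;
    return 1;
}
```

Keeping the two arrays the same length is what lets a single index identify both the integer code and the text label of a class.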
l_int32 recogGetClassString (L_RECOG *recog, l_int32 index, char **pcharstr)
[in] | recog | |
[in] | index | into array of char types |
[out] | pcharstr | string representation; returns an empty string on error |
Notes: (1) Extracts a copy of the string from sa_text, which the caller must free. (2) Caller must check the function return value.
Definition at line 753 of file recogbasic.c.
References L_COPY, L_Recog::sa_text, sarrayGetString(), L_Recog::setsize, and stringNew().
Referenced by recogShowMatch(), and recogStringToIndex().
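The ownership contract in note (1), where the caller receives a copy and must free it, can be illustrated with a short standalone sketch. The names class_string and copy_string are hypothetical, and the labels array stands in for sa_text. The sketch follows the parameter description above by returning a heap-allocated empty string on error, so the caller can free the result unconditionally; whether the library behaves exactly this way is not confirmed here.

```c
#include <stdlib.h>
#include <string.h>

/* Heap copy of a string; returns NULL only if allocation fails */
static char *copy_string(const char *src)
{
    size_t len = strlen(src) + 1;
    char  *dst = (char *)malloc(len);
    if (dst) memcpy(dst, src, len);
    return dst;
}

/* Hypothetical stand-in for recogGetClassString().
 * labels/setsize play the role of sa_text and setsize.
 * On success *pcharstr is a copy of the label; on error it is an
 * empty string.  Either way the caller frees it.
 * Returns 0 on success, 1 on error. */
int class_string(const char **labels, int setsize, int index,
                 char **pcharstr)
{
    if (!pcharstr) return 1;
    if (!labels || index < 0 || index >= setsize) {
        *pcharstr = copy_string("");
        return 1;
    }
    *pcharstr = copy_string(labels[index]);
    return (*pcharstr) ? 0 : 1;
}
```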
l_int32 recogGetCount (L_RECOG *recog)
[in] | recog |
Definition at line 535 of file recogbasic.c.
References L_Recog::setsize.
L_RECOG* recogRead (const char *filename)
[in] | filename |
Notes: (1) When a recog is serialized, a pixaa of the templates that are actually used for correlation is saved in the pixaa_u array of the recog. These can be different from the templates that were used to generate the recog, because those original templates can be scaled and turned into normalized lines. When recog1 is deserialized to recog2, these templates are put in both the unscaled array (pixaa_u) and the modified array (pixaa) in recog2. Why not put them in only the unscaled array and let recogTrainingFinished() regenerate the modified templates? The reason is that with normalized lines, the operation of thinning to a skeleton and dilating back to a fixed width is not idempotent. Thinning to a skeleton saves pixels at the end of a line segment, and thickening the skeleton puts additional pixels at the end of the lines. This tends to close gaps.
Definition at line 842 of file recogbasic.c.
References fopenReadStream(), and recogReadStream().
L_RECOG* recogReadMem (const l_uint8 *data, size_t size)
[in] | data | serialization of recog (not ascii) |
[in] | size | of data in bytes |
Definition at line 956 of file recogbasic.c.
References fopenReadFromMemory(), recogReadStream(), and L_Rdid::size.
L_RECOG* recogReadStream (FILE *fp)
[in] | fp | file stream |
Definition at line 871 of file recogbasic.c.
Referenced by recogRead(), and recogReadMem().
l_ok recogSetParams (L_RECOG *recog, l_int32 type, l_int32 min_nopad, l_float32 max_wh_ratio, l_float32 max_ht_ratio)
[in] | recog | to be padded, if necessary |
[in] | type | type of char set; -1 for default; see enum in recog.h |
[in] | min_nopad | min number in a class without padding; use -1 for default |
[in] | max_wh_ratio | max width/height ratio allowed for splitting; use -1.0 for default |
[in] | max_ht_ratio | max of max/min averaged template height ratio; use -1.0 for default |
Notes:
(1) This is called when a recog is created.
(2) Default min_nopad value allows for some padding. To disable padding, set min_nopad = 0. To pad only when no samples are available for the class, set min_nopad = 1.
(3) The max_wh_ratio limits the width/height ratio for components that we attempt to split. Splitting long components is expensive.
(4) The max_ht_ratio is a quality requirement on the training data. The recognizer will not run if the averages are computed and the templates do not satisfy it.
Definition at line 573 of file recogbasic.c.
References L_Recog::charset_type.
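The "use -1 for default" convention in the parameter list can be sketched as follows, using the default values listed in the Variables table on this page (DefaultMinNopad = 1, DefaultMaxWHRatio = 3.0, DefaultMaxHTRatio = 2.6, and an Arabic numerals charset). This is a standalone illustration: RecogParams, CHARSET_ARABIC_NUMERALS and set_params are hypothetical names, and treating every negative value as "use the default" is an assumption; the library may apply different validity thresholds.

```c
/* Hypothetical container for the four tunable parameters */
typedef struct {
    int   charset_type;
    int   min_nopad;
    float max_wh_ratio;
    float max_ht_ratio;
} RecogParams;

/* Stand-in for L_ARABIC_NUMERALS; the value is arbitrary here */
enum { CHARSET_ARABIC_NUMERALS = 1 };

/* Stores each parameter, substituting the documented default when the
 * caller passes a negative value.  Returns 0 on success, 1 on error. */
int set_params(RecogParams *p, int type, int min_nopad,
               float max_wh_ratio, float max_ht_ratio)
{
    if (!p) return 1;

    p->charset_type = (type >= 0) ? type : CHARSET_ARABIC_NUMERALS;
    p->min_nopad    = (min_nopad >= 0) ? min_nopad : 1;
    p->max_wh_ratio = (max_wh_ratio >= 0.0f) ? max_wh_ratio : 3.0f;
    p->max_ht_ratio = (max_ht_ratio >= 0.0f) ? max_ht_ratio : 2.6f;
    return 0;
}
```

With this convention a caller can override a single setting, for example set_params(&p, -1, 0, -1.0f, -1.0f) to disable padding, while leaving the rest at their defaults.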
l_ok recogStringToIndex (L_RECOG *recog, char *text, l_int32 *pindex)
[in] | recog | |
[in] | text | text string for some class |
[out] | pindex | index for that class; -1 if not found |
Definition at line 700 of file recogbasic.c.
References recogGetClassString(), and L_Recog::setsize.
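The lookup described by the parameters above, where pindex receives the class index or -1 if the text is not found, can be sketched as a linear search over the class labels. This is a standalone illustration only: string_to_index and the labels array (standing in for the class strings held by the recog) are hypothetical, and the library may use a faster lookup before falling back to string comparison.

```c
#include <string.h>

/* Hypothetical stand-in for recogStringToIndex().
 * labels[0..n-1] are the text strings of the character classes.
 * Sets *pindex to the matching class index, or -1 if none matches.
 * Returns 0 if the search ran, 1 on invalid input (sketch convention). */
int string_to_index(const char **labels, int n, const char *text,
                    int *pindex)
{
    int i;

    if (!pindex) return 1;
    *pindex = -1;
    if (!labels || !text || n < 0) return 1;

    /* Compare the query against each class label in turn */
    for (i = 0; i < n; i++) {
        if (strcmp(labels[i], text) == 0) {
            *pindex = i;
            break;
        }
    }
    return 0;
}
```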
l_ok recogWrite (const char *filename, L_RECOG *recog)
[in] | filename | |
[in] | recog |
Notes: (1) The pixaa of templates that is written is the modified one in the pixaa field. It is the pixaa that is actually used for correlation. This is not the unscaled array of labeled bitmaps, in pixaa_u, that was used to generate the recog in the first place. See the notes in recogRead() for the rationale.
Definition at line 993 of file recogbasic.c.
References fopenWriteStream(), and recogWriteStream().
l_ok recogWriteMem (l_uint8 **pdata, size_t *psize, L_RECOG *recog)
[out] | pdata | data of serialized recog (not ascii) |
[out] | psize | size of returned data |
[in] | recog |
Notes: (1) Serializes a recog in memory and puts the result in a buffer.
Definition at line 1065 of file recogbasic.c.
References fopenWriteWinTempfile(), l_binaryReadStream(), and recogWriteStream().
l_ok recogWriteStream (FILE *fp, L_RECOG *recog)
[in] | fp | file stream opened for "wb" |
[in] | recog |
Definition at line 1024 of file recogbasic.c.
Referenced by recogWrite(), and recogWriteMem().