Leptonica  1.82.0
Image processing and image analysis suite
recogdid.c File Reference
#include <string.h>
#include <math.h>
#include "allheaders.h"

Go to the source code of this file.

Functions

static l_int32 recogPrepareForDecoding (L_RECOG *recog, PIX *pixs, l_int32 debug)
 
static l_int32 recogMakeDecodingArray (L_RECOG *recog, l_int32 index, l_int32 debug)
 
static l_int32 recogRunViterbi (L_RECOG *recog, PIX **ppixdb)
 
static l_int32 recogRescoreDidResult (L_RECOG *recog, PIX **ppixdb)
 
static PIXrecogShowPath (L_RECOG *recog, l_int32 select)
 
static l_int32 recogGetWindowedArea (L_RECOG *recog, l_int32 index, l_int32 x, l_int32 *pdely, l_int32 *pwsum)
 
static l_int32 recogTransferRchToDid (L_RECOG *recog, l_int32 x, l_int32 y)
 
BOXArecogDecode (L_RECOG *recog, PIX *pixs, l_int32 nlevels, PIX **ppixdb)
 
l_ok recogCreateDid (L_RECOG *recog, PIX *pixs)
 
l_ok recogDestroyDid (L_RECOG *recog)
 
l_int32 recogDidExists (L_RECOG *recog)
 
L_RDIDrecogGetDid (L_RECOG *recog)
 
l_ok recogSetChannelParams (L_RECOG *recog, l_int32 nlevels)
 

Variables

static const l_float32 SetwidthFraction = 0.95
 
static const l_int32 MaxYShift = 1
 
static const l_float32 DefaultAlpha2 [] = {0.95f, 0.9f}
 
static const l_float32 DefaultAlpha4 [] = {0.95f, 0.9f, 0.75f, 0.25f}
 

Detailed Description

     Top-level identification
        BOXA             *recogDecode()
     Generate decoding arrays
        static l_int32    recogPrepareForDecoding()
        static l_int32    recogMakeDecodingArray()
     Dynamic programming for best path
        static l_int32    recogRunViterbi()
        static l_int32    recogRescoreDidResult()
        static PIX       *recogShowPath()
     Create/destroy temporary DID data
        l_int32           recogCreateDid()
        l_int32           recogDestroyDid()
     Various helpers
        l_int32           recogDidExists()
        L_RDID           *recogGetDid()
        static l_int32    recogGetWindowedArea()
        l_int32           recogSetChannelParams()
        static l_int32    recogTransferRchToDid()
 See recogbasic.c for examples of training a recognizer, which is
 required before it can be used for document image decoding.
 Gary Kopec pioneered this hidden markov approach to "Document Image
 Decoding" (DID) in the early 1990s.  It is based on estimation
 using a generative model of the image generation process, and
 provides the most likely decoding of an image if the model is correct.
 Given the model, it finds the maximum a posteriori (MAP) "message"
 given the observed image.  The model describes how to generate
 an image from a message, and the MAP message is derived from the
 observed image using Bayes' theorem.  This approach can also be used
 to build the model, using the iterative expectation/maximization
 method from labeled but errorful data.
 In a little more detail: The model comprises three things: the ideal
 printed character templates, the independent bit-flip noise model, and
 the character setwidths.  When a character is printed, the setwidth
 is the distance in pixels that you move forward before being able
 to print the next character.  It is typically slightly less than the
 width of the character template: if too small, an extra character can be
 hallucinated; if too large, it will not be able to match the next
 character template on the line.  The model assumes that the probabilities
 of bit flip depend only on the assignment of the pixel to background
 or template foreground.  The multilevel templates have different
 bit flip probabilities for each level.  Because a character image
 is composed of many pixels, each of which can be independently flipped,
 the actual probability of seeing any rendering is exceedingly small,
 being composed of the product of the probabilities for each pixel.
 The log likelihood is used both to avoid numeric underflow and,
 more importantly, because it results in a summation of independent
 pixel probabilities.  That summation can be shown, in Kopec's
 original paper, to consist of a sum of two terms: (a) the number of
 fg pixels in the bit-and of the observed image with the ideal
 template and (b) the number of fg pixels in the template.  Each
 has a coefficient that depends only on the bit-flip probabilities
 for the fg and bg.  A beautiful result, and computationally simple!
 One nice feature of this approach is that the result of the decoding
 is not very sensitive to the values  used for the bit flip probabilities.
 The procedure for finding the best decoding (MAP) for a given image goes
 under several names: Viterbi, dynamic programming, hidden markov model.
 It is called a "hidden markov model" because the templates are assumed
 to be printed serially and we don't know what they are -- the identity
 of the templates must be inferred from the observed image.
 The possible decodings form a dense trellis over the pixel positions,
 where at each pixel position you have the possibility of having any
 of the characters printed there (with some reference point) or having
 a single pixel wide space inserted there.  Thus, before the trellis
 can be traversed, we must do the work of finding the log probability,
 at each pixel location, that each of the templates was printed there.
 Armed with those arrays of data, the dynamic programming procedure
 moves from left to right, one pixel at a time, recursively finding
 the path with the highest log probability that gets to that pixel
 position (and noting which template was printed to arrive there).
 After reaching the right side of the image, we can simply backtrack
 along the path, jumping over each template that lies on the highest
 scoring path.  This best path thus only goes through a few of the
 pixel positions.
 There are two refinements to the original Kopec paper.  In the first,
 one uses multiple, non-overlapping fg templates, each with its own
 bit flip probability.  This makes sense, because the probability
 that a fg boundary pixel flips to bg is greater than that of a fg
 pixel not on the boundary.  And the flip probability of a fg boundary
 pixel is smaller than that of a bg boundary pixel, which in turn
 is greater than that of a bg pixel not on a boundary (the latter
 is taken to be the true background).  Then the simplest realistic
 multiple template model has three templates that are not background.
 In the second refinement, a heuristic (strict upper bound) is used
 iteratively in the Viterbi process to compute the log probabilities.
 Using the heuristic, you find the best path, and then score all nodes
 on that path with the actual probability, which is guaranteed to
 be a smaller number.  You run this iteratively, rescoring just the best
 found path each time.  After each rescoring, the path may change because
 the local scores have been reduced.  However, the process converges
 rapidly, and when it doesn't change, it must be the best path because
 it is properly scored (even if neighboring paths are heuristically
 scored).  The heuristic score is found column-wise by assuming
 that all the fg pixels in the template are on fg pixels in the image --
 we just take the minimum of the number of pixels in the template
 and image column.  This can easily give a 10-fold reduction in
 computation because the heuristic score can be computed much faster
 than the exact score.
 For reference, the classic paper on the approach by Kopec is:
 * "Document Image Decoding Using Markov Source Models", IEEE Trans.
   PAMI, Vol 16, No. 6, June 1994, pp 602-617.
 A refinement of the method for multilevel templates by Kopec is:
 * "Multilevel Character Templates for Document Image Decoding",
   Proc. SPIE 3027, Document Recognition IV, p. 168ff, 1997.
 Further refinements for more efficient decoding are given in these
 two papers, which are both stored on leptonica.org:
 * "Document Image Decoding using Iterated Complete Path Search", Minka,
   Bloomberg and Popat, Proc. SPIE Vol 4307, p. 250-258, Document
   Recognition and Retrieval VIII, San Jose, CA 2001.
 * "Document Image Decoding using Iterated Complete Path Search with
   Subsampled Heuristic Scoring", Bloomberg, Minka and Popat, ICDAR 2001,
   p. 344-349, Sept. 2001, Seattle.

Definition in file recogdid.c.

Function Documentation

◆ recogCreateDid()

l_ok recogCreateDid ( L_RECOG recog,
PIX pixs 
)

recogCreateDid()

Parameters
[in]recog
[in]pixsof 1 bpp image to match
Returns
0 if OK, 1 on error

Definition at line 751 of file recogdid.c.

References L_Recog::did, L_Rdid::narray, pixClone(), L_Rdid::pixs, recogDestroyDid(), L_Recog::setsize, and L_Rdid::size.

◆ recogDecode()

BOXA* recogDecode ( L_RECOG recog,
PIX pixs,
l_int32  nlevels,
PIX **  ppixdb 
)

recogDecode()

Parameters
[in]recogwith LUT's pre-computed
[in]pixstypically of multiple touching characters, 1 bpp
[in]nlevelsof templates; 2 for now
[out]ppixdb[optional] debug result; can be null
Returns
boxa segmentation of pixs into characters, or NULL on error
Notes:
     (1) The input pixs has been filtered so that it is likely to be
         composed of more than one touching character.  Specifically,
         its height can only slightly exceed that of the tallest
         unscaled template, the width is somewhat larger than the
         width of the widest unscaled template, and the w/h aspect ratio
         is bounded by max_wh_ratio.
     (2) This uses the DID mechanism with labeled templates to
         segment the input pixs.  The resulting segmentation is
         returned.  (It is given by did->boxa).
     (3) In debug mode, the Viterbi path is rescored based on all
         the templates.  In non-debug mode, the same procedure is
         carried out by recogIdentifyPix() on the result of the
         segmentation.

Definition at line 219 of file recogdid.c.

References L_Rdid::pixs.

◆ recogDestroyDid()

l_ok recogDestroyDid ( L_RECOG recog)

◆ recogDidExists()

l_int32 recogDidExists ( L_RECOG recog)

recogDidExists()

Parameters
[in]recog
Returns
1 if recog->did exists; 0 if not or on error.

Definition at line 878 of file recogdid.c.

References L_Recog::did.

◆ recogGetDid()

L_RDID* recogGetDid ( L_RECOG recog)

recogGetDid()

Parameters
[in]recog
Returns
did still owned by the recog, or NULL on error
Notes:
     (1) This also makes sure the arrays are defined.

Definition at line 900 of file recogdid.c.

References L_Rdid::counta, L_Rdid::delya, L_Recog::did, and L_Rdid::narray.

Referenced by recogGetWindowedArea(), recogMakeDecodingArray(), recogRescoreDidResult(), recogRunViterbi(), recogSetChannelParams(), recogShowPath(), and recogTransferRchToDid().

◆ recogGetWindowedArea()

static l_int32 recogGetWindowedArea ( L_RECOG recog,
l_int32  index,
l_int32  x,
l_int32 *  pdely,
l_int32 *  pwsum 
)
static

recogGetWindowedArea()

Parameters
[in]recog
[in]indexof template
[in]xpixel position of left hand edge of template
[out]pdelyy shift of template relative to pix1
[out]pwsumnumber of fg pixels in window of pixs
Returns
0 if OK, 1 on error
Notes:
     (1) This is called after the best path has been found through
         the trellis, in order to produce a correlation that can be used
         to evaluate the confidence we have in the identification.
         The correlation is |1 & 2|^2 / (|1| * |2|).
         |1 & 2| is given by the count array, |2| is found from
         nasum_u[], and |1| is wsum returned from this function.

Definition at line 945 of file recogdid.c.

References L_Rdid::delya, L_CLONE, L_Rdid::narray, PIX_DST, PIX_SRC, L_Recog::pixa_u, pixaGetPix(), pixCountPixels(), pixCreate(), pixDestroy(), pixGetDimensions(), pixRasterop(), L_Rdid::pixs, recogGetDid(), and L_Recog::sumtab.

◆ recogMakeDecodingArray()

static l_int32 recogMakeDecodingArray ( L_RECOG recog,
l_int32  index,
l_int32  debug 
)
static

recogMakeDecodingArray()

Parameters
[in]recog
[in]indexof averaged template
[in]debug1 for debug output; 0 otherwise
Returns
0 if OK, 1 on error
Notes:
     (1) Generates the bit-and sum array for a character template along pixs.
     (2) The values are saved in the scoring arrays at the left edge
         of the template as it is positioned on pixs.

Definition at line 356 of file recogdid.c.

References L_Rdid::counta, L_Rdid::delya, L_CLONE, L_Rdid::namoment, L_Rdid::narray, L_Rdid::nasum, numaGetIArray(), L_Recog::pixa_u, pixaGetPix(), pixCreate(), pixDestroy(), pixGetDimensions(), L_Rdid::pixs, L_Recog::pta_u, ptaGetIPt(), recogGetDid(), and L_Recog::sumtab.

◆ recogPrepareForDecoding()

static l_int32 recogPrepareForDecoding ( L_RECOG recog,
PIX pixs,
l_int32  debug 
)
static

recogPrepareForDecoding()

Parameters
[in]recogwith LUT's pre-computed
[in]pixstypically of multiple touching characters, 1 bpp
[in]debug1 for debug output; 0 otherwise
Returns
0 if OK, 1 on error
Notes:
     (1) Binarizes and crops input pixs.
     (2) Removes previous L_RDID struct and makes a new one.
     (3) Generates the bit-and sum arrays for each character template
         at each pixel position in pixs.  These are used in the
         Viterbi dynamic programming step.
     (4) The values are saved in the scoring arrays at the left edge
         of the template.  They are used in the Viterbi process
         at the setwidth position (which is near the RHS of the template
         as it is positioned on pixs) in the generated trellis.

Definition at line 295 of file recogdid.c.

References L_Rdid::pixs.

◆ recogRescoreDidResult()

static l_int32 recogRescoreDidResult ( L_RECOG recog,
PIX **  ppixdb 
)
static

recogRescoreDidResult()

Parameters
[in]recogwith LUT's pre-computed
[out]ppixdb[optional] debug result; can be null
Returns
0 if OK, 1 on error
Notes:
     (1) This does correlation matching with all unscaled templates,
         using the character segmentation determined by the Viterbi path.

Definition at line 620 of file recogdid.c.

References L_Rdid::boxa, boxaGetBox(), boxDestroy(), boxGetGeometry(), L_Rdid::fullarrays, L_COPY, lept_stderr(), L_Rdid::naxloc, numaGetCount(), pixClipRectangle(), pixDestroy(), L_Rdid::pixs, L_Recog::rch, rchExtract(), recogGetDid(), recogIdentifyPix(), recogShowPath(), and recogTransferRchToDid().

◆ recogRunViterbi()

static l_int32 recogRunViterbi ( L_RECOG recog,
PIX **  ppixdb 
)
static

recogRunViterbi()

Parameters
[in]recogwith LUT's pre-computed
[out]ppixdb[optional] debug result; can be null
Returns
0 if OK, 1 on error
Notes:
     (1) This can be used when the templates are unscaled.  It works by
         matching the average, unscaled templates of each class to
         all positions.
     (2) It is recursive, in that
         (a) we compute the score successively at all pixel positions x,
         (b) to compute the score at x in the trellis, for each
             template we look backwards to (x - setwidth) to get the
             score if that template were to be printed with its
             setwidth location at x.  We save at x the template and
             score that maximizes the sum of the score at (x - setwidth)
             and the log-likelihood for the template to be printed with
             its LHS there.
     (3) The primary output is a boxa of the locations for splitting
         the input image.  These locations are used later to split the
         image and send the pieces individually for recognition.
         This can be done in either recogIdentifyMultiple(), or
         for debugging in recogRescoreDidResult().

Definition at line 481 of file recogdid.c.

References L_Rdid::beta, L_Rdid::counta, L_Rdid::fullarrays, L_Rdid::gamma, L_Rdid::narray, L_Recog::nasum_u, numaGetIArray(), recogGetDid(), L_Rdid::setwidth, L_Rdid::size, L_Rdid::trellisscore, and L_Rdid::trellistempl.

◆ recogSetChannelParams()

l_ok recogSetChannelParams ( L_RECOG recog,
l_int32  nlevels 
)

recogSetChannelParams()

Parameters
[in]recog
[in]nlevels
Returns
0 if OK, 1 on error
Notes:
     (1) This converts the independent bit-flip probabilities in the
         "channel" into log-likelihood coefficients on image sums.
         These coefficients are only defined for the non-background
         template levels.  Thus for nlevels = 2 (one fg, one bg),
         only beta[1] and gamma[1] are used.  For nlevels = 4 (three
         fg templates), we use beta[1-3] and gamma[1-3].

Definition at line 1009 of file recogdid.c.

References recogGetDid().

◆ recogShowPath()

◆ recogTransferRchToDid()

static l_int32 recogTransferRchToDid ( L_RECOG recog,
l_int32  x,
l_int32  y 
)
static

recogTransferRchToDid()

Parameters
[in]recogwith rch and did defined
[in]xleft edge of extracted region, relative to decoded line
[in]ytop edge of extracted region, relative to input image
Returns
0 if OK, 1 on error
Notes:
     (1) This is used to transfer the results for a single character match
         to the rescored did arrays.

Definition at line 1055 of file recogdid.c.

References L_Rch::index, L_Rdid::nadely_r, L_Rdid::nasample_r, L_Rdid::nascore_r, L_Rdid::natempl_r, L_Rdid::nawidth_r, L_Rdid::naxloc_r, numaAddNumber(), L_Recog::rch, recogGetDid(), L_Rch::sample, L_Rch::score, L_Rch::width, L_Rch::xloc, and L_Rch::yloc.

Referenced by recogRescoreDidResult().