    Fix a bug where a node size would be seen as increasing. · a64a1a46
    Jean Chalard authored
    The root cause of this is quite subtle. When a word is a bigram
    of itself, the corresponding chargroup will have a bigram referring
    to itself. When computing bigram offsets, we use cached addresses of
    chargroups, but we compute the size of the node as we go. Hence, a
    discrepancy may happen between the base offset as seen by the bigram
    (which uses the recomputed value) and the target offset (which uses
    the cached value).
    When this happens, the cached address is too large. The relative
    offset is negative, which is expected, since it points to this very
    chargroup, whose start is a few bytes earlier. But since the cached
    address is too large, the computed offset is smaller in magnitude than
    it should be.
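
To make the arithmetic concrete, here is a minimal sketch with made-up addresses. The class and method names are hypothetical and are not taken from the actual dictionary code; the only assumption is that a bigram stores "target address minus the address of the offset field".

```java
final class StaleOffsetSketch {
    // Hypothetical helper: relative offset as a bigram would store it.
    static int relativeOffset(int targetAddress, int fieldAddress) {
        return targetAddress - fieldAddress;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: the chargroup really starts at 1000 in
        // this pass, but the cache still holds 1002 from the previous pass.
        int recomputedGroupStart = 1000;
        int cachedGroupStart = 1002;
        // The bigram field lives inside the same chargroup, 256 bytes in.
        int fieldAddress = recomputedGroupStart + 256;

        int stale = relativeOffset(cachedGroupStart, fieldAddress);
        int correct = relativeOffset(recomputedGroupStart, fieldAddress);
        // Both are negative; the stale one is smaller in magnitude.
        System.out.println("stale=" + stale + " correct=" + correct);
    }
}
```

With these numbers the stale offset is -254 and the correct one is -256.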
    On the next pass, the cache has been refreshed with the newly computed
    size and the seen offset is now correct (or at least, much closer to
    correct). The correct value is larger in magnitude than the previously
    computed offset, which was too small. If it happens to cross the -255 or
    -65535 boundary, the address will be seen as needing 1 more byte than
    previously computed. If this is the only change in the size of this node,
    the node will be seen as having a larger size than previously, which
    is unexpected. Debug code was catching this and crashing the program.
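
The boundary crossing can be sketched as follows. This assumes a variable-width encoding in which magnitudes up to 0xFF take 1 byte, up to 0xFFFF take 2 bytes, and anything larger takes 3; the thresholds and the `byteCount` name are assumptions for illustration, not the real encoder.

```java
final class OffsetSizeSketch {
    // Assumed encoding: 1 byte up to 0xFF, 2 bytes up to 0xFFFF, else 3.
    static int byteCount(int relativeOffset) {
        int magnitude = Math.abs(relativeOffset);
        if (magnitude <= 0xFF) {
            return 1;
        }
        if (magnitude <= 0xFFFF) {
            return 2;
        }
        return 3;
    }

    public static void main(String[] args) {
        int staleOffset = -254;   // computed against the too-large cached address
        int correctOffset = -256; // computed once the cache has been refreshed
        // The field grows by one byte between passes, so the node appears to grow.
        System.out.println(byteCount(staleOffset) + " -> " + byteCount(correctOffset));
    }
}
```

Here the offset field goes from 1 byte to 2 bytes between two passes, which is exactly the "size increasing" symptom.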
    
    This case is very rare, but in an even rarer occurrence, it may
    happen that in the same node, another chargroup happens to decrease
    its size by the same amount. In this case, the node may be seen as
    not having been modified. If, on top of this, no other node has been
    modified, then the file may be seen as complete and the discrepancy
    left as is in the file, leading to a broken file. The probability that
    this happens is vanishingly low, but the bug exists, and the current
    debug code would not have caught it.
    To further catch similar bugs, this change also modifies the test
    that decides whether the node has changed. Since all components
    of a node may only decrease in size with each successive pass, it is
    theoretically safe to assume that the same size means the node
    contents have not changed. But in case of a bug like the one above,
    where a component wrongly grows while another shrinks and both cancel
    each other out, the new code will catch this. This change also adds
    a check against the number of passes, to avoid infinite loops in
    case of a bug in the computation code.
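
One possible shape for these two safeguards is sketched below. It is not the code in this change: the per-component comparison, the `MAX_PASSES` value and all names are hypothetical and only illustrate the idea.

```java
import java.util.Arrays;
import java.util.function.BooleanSupplier;

final class PassLoopSketch {
    static final int MAX_PASSES = 24; // hypothetical limit

    // Treat the node as changed if any component size differs, so that a
    // growth and a shrink that cancel out in the total are still noticed.
    static boolean nodeChanged(int[] oldSizes, int[] newSizes) {
        return !Arrays.equals(oldSizes, newSizes);
    }

    // Runs passes until one reports no change; fails loudly past the limit.
    static int runUntilStable(BooleanSupplier pass) {
        int passes = 0;
        boolean changed;
        do {
            if (passes >= MAX_PASSES) {
                throw new IllegalStateException("Too many passes: " + passes);
            }
            changed = pass.getAsBoolean();
            passes++;
        } while (changed);
        return passes;
    }

    public static void main(String[] args) {
        // Totals are both 17, yet the node is reported as changed.
        System.out.println(nodeChanged(new int[] {10, 7}, new int[] {11, 6}));
    }
}
```

Comparing only the totals would report the two size arrays in `main` as identical, which is the cancellation case described above.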
    
    This change fixes the bug by updating the cached address of each
    chargroup as we go, which eliminates the discrepancy.
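
As a rough illustration of "updating the cached address as we go", assuming a simplified layout where a node is just a sequence of groups with known sizes (the `Group` class and `layOut` method are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

final class CachedAddressSketch {
    static final class Group {
        int cachedAddress; // address remembered from the previous pass
        final int size;    // size in bytes as recomputed in this pass

        Group(int cachedAddress, int size) {
            this.cachedAddress = cachedAddress;
            this.size = size;
        }
    }

    // Walks the groups in order, refreshing each cached address before moving
    // on, so anything that refers to the group sees the same base address as
    // the running computation. Returns the node size.
    static int layOut(List<Group> groups, int nodeStart) {
        int address = nodeStart;
        for (Group group : groups) {
            group.cachedAddress = address;
            address += group.size;
        }
        return address - nodeStart;
    }

    public static void main(String[] args) {
        List<Group> groups = new ArrayList<>();
        groups.add(new Group(105, 20)); // stale cached addresses
        groups.add(new Group(130, 12));
        int nodeSize = layOut(groups, 100);
        System.out.println(groups.get(0).cachedAddress + " "
                + groups.get(1).cachedAddress + " " + nodeSize);
    }
}
```

After the walk, the cached addresses (100 and 120) agree with the addresses implied by the recomputed sizes, so a self-referring offset can no longer mix values from two passes.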
    
    Bug: 6383103
    Change-Id: Ia3f450e22c87c4c193cea8ddb157aebd5f224f01