  21. May 14, 2012
    • Small optimization · 76319c69
      Jean Chalard authored
      Performance gain is < 2%
      
      Bug: 6394357
      Change-Id: I2b7da946788cf11d1a491efd20fb2bd2333c23d1
    • Small optimizations · 4df5b43d
      Jean Chalard authored
      Bug: 6394357
      Change-Id: I00ba1b5ab3d527b3768e28090c758ddd1629f281
    • More optimizations · 3b1b72ac
      Jean Chalard authored
      We don't merge tails anyway, and we can't do it any more
      because that would break the bigram lookup algorithm.
      The speedup is about 20%, and possibly double that if
      there are no bigrams.
      
      Bug: 6394357
      
      Change-Id: I9eec11dda9000451706d280f120404a2acbea304
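The trade-off behind this commit can be sketched with toy structures (hypothetical Python, not the actual LatinIME trie code): once common tails are merged, two words can end at the same terminal node, so a bigram that addresses its target word by node address becomes ambiguous.

```python
# Toy illustration only: hypothetical structures, not the LatinIME format.
class Node:
    def __init__(self, terminal=False):
        self.children = {}      # char -> Node
        self.terminal = terminal

def add(root, word):
    n = root
    for ch in word:
        n = n.children.setdefault(ch, Node())
    n.terminal = True
    return n                    # terminal node stands in for an address

def lookup_end(root, word):
    n = root
    for ch in word:
        n = n.children[ch]
    return n

# Plain trie: "walking" and "talking" end at distinct nodes, so a bigram
# can identify its target word by terminal-node address.
trie = Node()
assert add(trie, "walking") is not add(trie, "talking")

# Merged-tail version: build the shared suffix once and hang it under
# both initial letters. Both words now converge on the SAME node, so a
# node address no longer identifies a unique word and address-based
# bigram lookup breaks.
dawg = Node()
tail = Node(terminal=True)      # shared final node of "...alking"
for first in ("w", "t"):
    cur = dawg.children.setdefault(first, Node())
    for ch in "alkin":
        cur = cur.children.setdefault(ch, Node())
    cur.children["g"] = tail

assert lookup_end(dawg, "walking") is lookup_end(dawg, "talking")
```

This is why keeping tails unmerged, as the commit message says, is a precondition for the bigram lookup algorithm.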
  22. May 11, 2012
    • Write the bigram frequency following the new formula · f7346de9
      Jean Chalard authored
      This also tests the bigram frequency against the unigram frequency.
      
      Bug: 6313806
      Bug: 6028348
      Change-Id: If7faa3559fee9f2496890f0bc0e081279e100854
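The commit message does not reproduce the formula itself. As a hedged sketch of the general idea only, storing a bigram frequency as a small quantized step above the target word's unigram frequency, with all names and constants invented for illustration:

```python
# Hedged sketch: NOT the actual LatinIME formula, which the commit
# message does not give. Assumed idea: a bigram is at least as frequent
# as its target unigram, so it can be stored compactly as a small step
# between the unigram frequency and the ceiling.
MAX_FREQUENCY = 255   # assumed frequency ceiling
BIGRAM_STEPS = 16     # assumed 4-bit bigram frequency field

def encode_bigram_step(unigram_freq, bigram_freq):
    """Quantize bigram_freq into BIGRAM_STEPS levels between
    unigram_freq and MAX_FREQUENCY."""
    # The test the commit message mentions: bigram vs unigram frequency.
    assert bigram_freq >= unigram_freq
    span = MAX_FREQUENCY - unigram_freq
    if span == 0:
        return 0
    return min(BIGRAM_STEPS - 1,
               (bigram_freq - unigram_freq) * BIGRAM_STEPS // span)

def decode_bigram_freq(unigram_freq, step):
    # Reconstruct an approximate frequency from the stored step.
    span = MAX_FREQUENCY - unigram_freq
    return unigram_freq + span * step // BIGRAM_STEPS

step = encode_bigram_step(100, 200)
assert 100 <= decode_bigram_freq(100, step) <= 255
```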
    • Refactor a method · 4455fe2c
      Jean Chalard authored
      Rename it, rename parameters, and add a parameter that will
      be necessary soon.
      Also, rescale the bigram frequency as necessary.
      
      Bug: 6313806
      Change-Id: I192543cfb6ab6bccda4a1a53c8e67fbf50a257b0
  24. Apr 24, 2012
    • Fix binary reading code performance. · 1d80a7f3
      Jean Chalard authored
      This is not the Right fix; the Right fix would be to read
      the file in a buffered way. However, this delivers tolerable
      performance for a minimal amount of code change.
      We may want to skip submitting this patch, but keep it around
      in case we need to use the functionality until we have a good
      patch.
      
      Change-Id: I1ba938f82acfd9436c3701d1078ff981afdbea60
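The "Right fix" the message alludes to can be sketched as follows (a minimal Python illustration, assuming the slow path issued one tiny read per byte; none of these names come from the LatinIME code):

```python
# Hedged sketch, not the actual LatinIME reader.
import os
import tempfile

def read_unbuffered(path):
    # One tiny read at a time: roughly what the slow code did.
    data = bytearray()
    with open(path, "rb", buffering=0) as f:
        while (b := f.read(1)):
            data += b
    return bytes(data)

def read_buffered(path, bufsize=64 * 1024):
    # The "Right fix": pull the file in large chunks through a buffer.
    data = bytearray()
    with open(path, "rb", buffering=bufsize) as f:
        while (chunk := f.read(bufsize)):
            data += chunk
    return bytes(data)

# Tiny demonstration on a throwaway file: both paths agree on content,
# but the buffered one does far fewer underlying reads.
fd, path = tempfile.mkstemp()
os.write(fd, b"binary dictionary contents" * 1000)
os.close(fd)
same = read_unbuffered(path) == read_buffered(path)
os.remove(path)
assert same
```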
    • Fix a bug where a node size would be seen as increasing. · a64a1a46
      Jean Chalard authored
      The core reason for this is quite subtle. When a word is a bigram
      of itself, the corresponding chargroup will have a bigram referring
      to itself. When computing bigram offsets, we use cached addresses of
      chargroups, but we compute the size of the node as we go. Hence, a
      discrepancy may happen between the base offset as seen by the bigram
      (which uses the recomputed value) and the target offset (which uses
      the cached value).
      When this happens, the cached node address is too large. The relative
      offset is negative, which is expected, since it points to this very
      charnode whose start is a few bytes earlier. But since the cached
      address is too large, the offset is computed as smaller than it should
      be.
      On the next pass, the cache has been refreshed with the newly computed
      size and the seen offset is now correct (or at least, much closer to
      correct). The correct value is larger than the previously computed
      offset, which was too small. If it happens that it crosses the -255 or
      -65535 boundary, the address will be seen as needing 1 more byte than
      previously computed. If this is the only change in size of this node,
      the node will be seen as having a larger size than previously, which
      is unexpected. Debug code was catching this and crashing the program.
      
      So this case is very rare, but in an even rarer occurrence, it may
      happen that in the same node, another chargroup happens to decrease
      its size by the same amount. In this case, the node may be seen as
      having not been modified. This is probably extremely rare. If on
      top of this, it happens that no other node has been modified, then
      the file may be seen as complete, and the discrepancy left as is
      in the file, leading to a broken file. The probability that this
      happens is abysmally low, but the bug exists, and the current debug
      code would not have caught this.
      To further catch similar bugs, this change also modifies the test
      that decides if the node has changed. On the grounds that all components
      of a node may only decrease in size with each successive pass, it's
      theoretically safe to assume that the same size means the node
      contents have not changed, but in case of a bug like the bug above
      where a component wrongly grows while another shrinks and both cancel
      each other out, the new code will catch this. Also, this change adds
      a check against the number of passes, to avoid infinite loops in
      case of a bug in the computation code.
      
      This change fixes this bug by updating the cached address of each
      chargroup as we go. This eliminates the discrepancy and fixes the
      bug.
      
      Bug: 6383103
      Change-Id: Ia3f450e22c87c4c193cea8ddb157aebd5f224f01
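The boundary effect described above can be made concrete with a toy encoding (illustrative only, not the actual binary dictionary format): if offsets are stored in 1, 2, or 3 bytes depending on magnitude, then an offset whose true value crosses 255 or 65535 in absolute value between passes changes the size of the field that stores it, so the node can appear to grow.

```python
# Toy variable-width offset encoding; assumed widths, not the real format.
def offset_byte_count(offset):
    magnitude = abs(offset)
    if magnitude <= 255:
        return 1
    if magnitude <= 65535:
        return 2
    return 3

# Pass 1: the stale cached address makes the negative offset look
# smaller in magnitude than it really is, so it seems to fit in 1 byte.
assert offset_byte_count(-250) == 1
# Pass 2: with the refreshed cache the true offset has crossed -255, so
# the field now needs 2 bytes and the node appears to GROW, which the
# debug code treated as impossible.
assert offset_byte_count(-260) == 2
```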