Tika / TIKA-529

IBM420 charset detection's isLamAlef is allocation-happy

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 1.1
    • Component/s: parser
    • Labels: None

    Description

      The two IBM420 charset detectors (rtl and ltr) call isLamAlef() for every byte of the detection buffer.

      Each call allocates and fills a byte array, which makes isLamAlef() responsible for approximately 70% of all object allocations in my current test case (many text files).

      Since the array is identical on every call, and the same check can be done without any array at all, this is wasteful.
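
      To illustrate the point, here is a minimal sketch of the two shapes of the check. It is not the attached patch. The class name LamAlefSketch, the method name isLamAlefAllocating, and the six byte values are assumptions made for illustration; the real values should be taken from the detector source.

```java
// Hypothetical sketch: the class name and the byte values are assumptions,
// not copied from the Tika source or from isLamAlef.diff.
final class LamAlefSketch {

    // Allocation-happy shape: a new byte[] is created and filled on every call.
    static boolean isLamAlefAllocating(byte b) {
        byte[] shapedLamAlef = {
            (byte) 0xb2, (byte) 0xb3, (byte) 0xb4,
            (byte) 0xb5, (byte) 0xb7, (byte) 0xb8
        };
        for (byte candidate : shapedLamAlef) {
            if (b == candidate) {
                return true;
            }
        }
        return false;
    }

    // Array-free shape: the assumed values form the range 0xb2..0xb8 minus 0xb6,
    // so a range test on the unsigned value gives the same answer.
    static boolean isLamAlef(byte b) {
        int v = b & 0xff; // widen to an unsigned int so the comparisons read naturally
        return v >= 0xb2 && v <= 0xb8 && v != 0xb6;
    }

    public static void main(String[] args) {
        // Check that both variants agree on every possible byte value.
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (isLamAlefAllocating(b) != isLamAlef(b)) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(i));
            }
        }
        System.out.println("both variants agree on all 256 byte values");
    }
}
```

      Hoisting the array into a static final field would also remove the per-call allocation. The range test is shown because it needs no array at all.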

      Attachments

        1. isLamAlef.diff
          0.7 kB
          Radek

          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Radek (syskin)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: