LUCENE-2422 (Lucene - Core)

Don't reuse byte[] in IndexInput/Output for read/writeString

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

      Description

      IndexInput currently holds a private "byte[] bytes", which it reuses for reading strings. Likewise, IndexOutput holds a UTF8Result (which holds a "byte[] bytes") that it reuses for writing strings.

      Both are dangerous: after reading or writing an immense string, the buffer stays at its grown size and we never free that storage.

      We don't use read/writeString in very performance-sensitive parts of the code, so I think we should not reuse the byte[] at all.

      I think this is likely the cause of the recent "IndexWriter and memory usage" thread, started by Ross Woolf on java-user@.
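
The retention problem described above can be illustrated with a minimal, self-contained sketch. It does not use Lucene's classes or its on-disk string encoding: `StringBufferSketch`, `ReusingReader`, `FreshReader`, `encode` and `repeat` are hypothetical names invented for this example, and a fixed 4-byte length prefix over a `java.nio.ByteBuffer` stands in for the real format. The point is only the allocation pattern: one reader keeps a scratch buffer that grows to the largest string it has ever seen, the other allocates a buffer per call so it can be garbage collected afterwards.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringBufferSketch {

    /** Hypothetical reader that reuses one scratch buffer; it grows but never shrinks. */
    static final class ReusingReader {
        private byte[] bytes = new byte[0]; // retained for the reader's lifetime

        String readString(ByteBuffer in) {
            int length = in.getInt();
            if (bytes.length < length) {
                bytes = new byte[length]; // grows to the largest string ever read
            }
            in.get(bytes, 0, length);
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }

        int retainedBytes() {
            return bytes.length;
        }
    }

    /** Hypothetical reader that allocates per call; nothing is held between calls. */
    static final class FreshReader {
        String readString(ByteBuffer in) {
            int length = in.getInt();
            byte[] bytes = new byte[length]; // eligible for GC once this method returns
            in.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Encodes a string as a 4-byte length prefix plus UTF-8 bytes (an assumed format). */
    static ByteBuffer encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + utf8.length);
        buf.putInt(utf8.length).put(utf8);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    /** Builds a string of n copies of c. */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String immense = repeat('x', 1 << 20); // 1 MiB of single-byte characters

        ReusingReader reusing = new ReusingReader();
        reusing.readString(encode(immense));
        reusing.readString(encode("hi")); // a small read does not shrink the buffer
        System.out.println("reusing reader still holds " + reusing.retainedBytes() + " bytes");

        FreshReader fresh = new FreshReader();
        fresh.readString(encode(immense));
        System.out.println("fresh reader read: " + fresh.readString(encode("hi")));
    }
}
```

The per-call allocation costs a little more on each read, which matches the trade-off stated above: it is acceptable as long as read/writeString stays out of the hot paths.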

      1. LUCENE-2422.patch
        2 kB
        Michael McCandless

        Activity

        shaie Shai Erera added a comment -

        Mike - this patch is against an old revision? I'm up to date with the latest code, and IndexInput/Output don't include any fields, just abstract methods. This seems to be relevant to 3.0.1 (and before?). If so, where does this need to be fixed post 3.0.1?

        mikemccand Michael McCandless added a comment -

        "Mike - this patch is against an old revision?"

        Yes, sorry, the patch applies to 2.9.x. I think we should fix it in 2.9.x (and all branches after – 3.0, trunk).

        In trunk these reused byte[] have been moved to DataInput/Output.


          People

          • Assignee: mikemccand Michael McCandless
          • Reporter: mikemccand Michael McCandless
          • Votes: 0
          • Watchers: 0
