Lucene - Core / LUCENE-2422

don't reuse byte[] in IndexInput/Output for read/writeString

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      IndexInput currently holds a private "byte[] bytes" field, which it reuses for reading strings. Likewise, IndexOutput holds a UTF8Result (which holds its own "byte[] bytes"), reused for writing strings.

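      Both are dangerous: each buffer grows to fit the largest string ever read or written, but it is never shrunk or freed, so a single immense string leaves that much memory pinned for the lifetime of the IndexInput or IndexOutput.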
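      As a rough illustration of the reuse pattern described above, here is a minimal Java sketch. It is not Lucene's IndexInput: the class ReusingStringReader, its methods, and the encoding (a VInt byte count followed by UTF-8 bytes, read from a java.nio.ByteBuffer) are assumptions chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical reader mimicking the reuse pattern; NOT Lucene's IndexInput.
// Assumed encoding: VInt byte length, then that many UTF-8 bytes.
class ReusingStringReader {
    private final ByteBuffer in;
    // Scratch buffer shared by every readString() call: it grows to fit the
    // largest string seen so far and is never shrunk or released.
    private byte[] bytes;

    ReusingStringReader(ByteBuffer in) { this.in = in; }

    // Variable-length int: 7 payload bits per byte, low-order group first,
    // high bit set on every byte except the last.
    private int readVInt() {
        int value = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = in.get();
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
        }
    }

    String readString() {
        int length = readVInt();
        if (bytes == null || length > bytes.length) {
            bytes = new byte[length]; // grows; the field keeps this array alive afterwards
        }
        in.get(bytes, 0, length);
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    // How much scratch memory this reader is still holding on to.
    int retainedBytes() { return bytes == null ? 0 : bytes.length; }

    // Test helper: encodes strings in the same assumed format.
    static ByteBuffer encode(String... strings) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : strings) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            int v = utf8.length;
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
            out.write(utf8, 0, utf8.length);
        }
        return ByteBuffer.wrap(out.toByteArray());
    }
}
```

      Reading a 1 MiB string followed by "tiny" leaves retainedBytes() at 1 << 20: the scratch array stays at its high-water mark until the reader itself becomes unreachable.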

      We don't use read/writeString in performance-sensitive parts of the code, so I think we should not reuse the byte[] at all.
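      A minimal sketch of the proposed change, also hypothetical (FreshBufferStringReader is not a Lucene class, and the VInt-length-plus-UTF-8 encoding read from a java.nio.ByteBuffer is an assumption): each call allocates a local byte[] sized to the string, so nothing outlives the call.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical reader illustrating the "don't reuse" approach; NOT Lucene code.
// Assumed encoding: VInt byte length, then that many UTF-8 bytes.
class FreshBufferStringReader {
    private final ByteBuffer in;

    FreshBufferStringReader(ByteBuffer in) { this.in = in; }

    // Variable-length int: 7 payload bits per byte, low-order group first,
    // high bit set on every byte except the last.
    private int readVInt() {
        int value = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = in.get();
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
        }
    }

    String readString() {
        int length = readVInt();
        // Local array: it becomes garbage as soon as this method returns,
        // so one huge string cannot pin memory in the reader.
        byte[] bytes = new byte[length];
        in.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

      The cost is one short-lived allocation per string, which the description argues is acceptable because read/writeString is not on performance-sensitive paths.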

      I think this is likely the cause of the recent "IndexWriter and memory usage" thread, started by Ross Woolf on java-user@.

      Attachments: LUCENE-2422.patch (2 kB, Michael McCandless)

        Activity

        Shai Erera added a comment:

        Mike, is this patch against an old revision? I'm up to date with the latest revision, and IndexInput/Output don't hold any fields, just abstract methods. This seems to be relevant to 3.0.1 (and earlier?). If so, where does this need to be fixed after 3.0.1?

        Michael McCandless added a comment:

        > Mike - this patch is against an old revision?

        Yes, sorry, the patch applies to 2.9.x. I think we should fix it in 2.9.x and in all later branches (3.0, trunk).

        In trunk these reused byte[] have been moved to DataInput/Output.


          People

          • Assignee: Michael McCandless
          • Reporter: Michael McCandless
          • Votes: 0
          • Watchers: 0
