Solr
  1. Solr
  2. SOLR-2679

Optimize memory usage in JavaBinCodec.readStr()

    Details

      Description

      In Solr 3.3, JavaBinCodec.readStr() reads complete string into byte[] and then converts it into String. FastInputStream is already buffered, so this it not necessary and may take a lot of memory if data string is large.

      This patch replaces usage of big byte[] with usage of dis.readUnsignedByte() for getting characters directly from stream.

      Patch reduces memory usage and may slightly improve performance of reading text data in javabin-codec.

      JavaBinCodec.readStr() function already has unit test in TestJavaBinCodec, no additional unit test is required.

      1. SOLR-2679.patch
        2 kB
        Maxim Valyanskiy

        Activity

        Hide
        Maxim Valyanskiy added a comment -

        patch

        Show
        Maxim Valyanskiy added a comment - patch
        Hide
        Robert Muir added a comment -

        3.4 -> 3.5

        Show
        Robert Muir added a comment - 3.4 -> 3.5
        Hide
        Erick Erickson added a comment -

        Does anyone have an opinion on this? It seems to me that the tradeoff is that the current form of this code reads the bytes from a local array, where the proposed patch makes a method call for every byte that's read, which seems like it'd be slower since the dis.readFully method does an arrayCopy under the covers.

        I'll volunteer to apply or close this, it seems like a time/space tradeoff. And the underlying byte[] grows to a max size and gets re-used.

        Since the class that's used to read data is called FastInputStream, my straw-man proposal is that we do NOT apply this patch and I'll close the JIRA.

        Show
        Erick Erickson added a comment - Does anyone have an opinion on this? It seems to me that the tradeoff is that the current form of this code reads the bytes from a local array, where the proposed patch makes a method call for every byte that's read, which seems like it'd be slower since the dis.readFully method does an arrayCopy under the covers. I'll volunteer to apply or close this, it seems like a time/space tradeoff. And the underlying byte[] grows to a max size and gets re-used. Since the class that's used to read data is called FastInputStream, my straw-man proposal is that we do NOT apply this patch and I'll close the JIRA.
        Hide
        Yonik Seeley added a comment -

        I think the right approach to increased efficiency here is to expose the byte[] of the FastInputStream.
        It's not a super-simple approach since it needs to operate chunk-at-a-time and deal with multi-byte chars crossing chunks.
        The goal of eliminating the double-buffering seems fine though - I think we can leave this issue open.

        Show
        Yonik Seeley added a comment - I think the right approach to increased efficiency here is to expose the byte[] of the FastInputStream. It's not a super-simple approach since it needs to operate chunk-at-a-time and deal with multi-byte chars crossing chunks. The goal of eliminating the double-buffering seems fine though - I think we can leave this issue open.
        Hide
        Maxim Valyanskiy added a comment -

        I think that there is no performance degradation after apply of my patch - I hope that JIT compiler is smart enough to optimize that method call when we read from stream buffer. I can make a benchmark if you think that proof is required.

        My patch brings other change - it builds character array while reading data from (buffered) stream so it may be even faster when very large text field is read from network stream. Current version reads complete text field into byte array and then builds character array.

        Show
        Maxim Valyanskiy added a comment - I think that there is no performance degradation after apply of my patch - I hope that JIT compiler is smart enough to optimize that method call when we read from stream buffer. I can make a benchmark if you think that proof is required. My patch brings other change - it builds character array while reading data from (buffered) stream so it may be even faster when very large text field is read from network stream. Current version reads complete text field into byte array and then builds character array.
        Hide
        Hoss Man added a comment -

        Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

        email notification suppressed to prevent mass-spam
        psuedo-unique token identifying these issues: hoss20120321nofix36

        Show
        Hoss Man added a comment - Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently. email notification suppressed to prevent mass-spam psuedo-unique token identifying these issues: hoss20120321nofix36
        Hide
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Show
        Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
        Hide
        Uwe Schindler added a comment -

        Move issue to Solr 4.9.

        Show
        Uwe Schindler added a comment - Move issue to Solr 4.9.

          People

          • Assignee:
            Unassigned
            Reporter:
            Maxim Valyanskiy
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:

              Development