HARMONY-6470

[classlib][luni] Scanner will exhaust all heap memory when parse a large file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0M15
    • Fix Version/s: 5.0M15
    • Component/s: Classlib
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Please see the attached file

      1. HARMONY-6470.diff
        9 kB
        deven you
      2. HARMONY-6470_v3.diff
        9 kB
        deven you
      3. HARMONY-6470_v2.diff
        9 kB
        deven you

        Activity

        deven you added a comment -

        This JIRA has been verified. Thanks!

        Hudson added a comment -

        Integrated in Harmony-select-1.5-head-linux-x86_64 #35 (See http://hudson.zones.apache.org/hudson/job/Harmony-select-1.5-head-linux-x86_64/35/)
        Apply patch for HARMONY-6470 ([classlib][luni] Scanner will exhaust all heap memory when parse a large file)

        Includes a (relatively) long running test case.

        Hudson added a comment -

        Integrated in Harmony-1.5-head-linux-x86_64 #855 (See http://hudson.zones.apache.org/hudson/job/Harmony-1.5-head-linux-x86_64/855/)
        Apply patch for HARMONY-6470 ([classlib][luni] Scanner will exhaust all heap memory when parse a large file)

        Includes a (relatively) long running test case.

        Tim Ellison added a comment -

        Thanks Deven,

        Patch applied to LUNI module at repo revision r954817.

        Please check it was applied as you expected.

        Mark Hindess added a comment -

        Deferring until next release.

        deven you added a comment -

        Since this patch is a little complex, let me give some explanation.

        The main modification is in Scanner.expandBuffer(). Originally it simply replaced the old char buffer with a new, larger one and copied everything across, so none of the properties related to the old buffer (the char buffer's length, capacity and limit, and the Scanner's findStartIndex, preStartIndex, cacheHasNextIndex, etc.) had to change.

        With this patch, however, Scanner tries to discard the contents of the current char buffer that are no longer needed (the buffer may still be replaced by a larger one) and keeps only what will be used by the next Scanner.hasNext() or Scanner.next(), so the properties related to the char buffer also need to be adjusted.

        The other point is that the patch turns tokenStart and tokenEnd, which used to be just two local variables in Scanner.setTokenRegion(), into instance variables. Since the char buffer's properties may now change at any time, the methods that track the start and end indices (such as findPreDelimiter() and findPostDelimiter()) need to be able to update these two variables at any moment.

        Would anyone like to try this patch? Thanks a lot!
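
        A minimal sketch of the compaction idea described in the comment above, assuming a CharBuffer-backed Scanner; the field names (findStartIndex, tokenStart, tokenEnd) follow the comment, but the code is an illustration, not the actual Harmony patch:

        import java.nio.CharBuffer;

        public class CompactingBufferSketch {
            private CharBuffer buffer = CharBuffer.allocate(1024);
            private int findStartIndex;  // next position the delimiter matcher will examine
            private int tokenStart = -1; // start of the token currently being built, or -1
            private int tokenEnd = -1;   // end of the token currently being built, or -1

            void expandBuffer() {
                // Drop everything before the earliest index that is still needed,
                // instead of unconditionally allocating a larger buffer.
                int keepFrom = (tokenStart >= 0) ? Math.min(tokenStart, findStartIndex)
                                                 : findStartIndex;
                if (keepFrom > 0) {
                    int live = buffer.position() - keepFrom;
                    System.arraycopy(buffer.array(), keepFrom, buffer.array(), 0, live);
                    buffer.position(live);
                    // Every index that referred to the old layout must be rebased.
                    findStartIndex -= keepFrom;
                    if (tokenStart >= 0) tokenStart -= keepFrom;
                    if (tokenEnd >= 0) tokenEnd -= keepFrom;
                } else {
                    // Nothing can be discarded: fall back to doubling and copying,
                    // as the original expandBuffer() did.
                    CharBuffer bigger = CharBuffer.allocate(buffer.capacity() * 2);
                    buffer.flip();
                    bigger.put(buffer);
                    buffer = bigger;
                }
            }
        }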

        deven you added a comment -

        Hi Tim,
        Following your comments, I have modified the test case so that it now uses only memory to simulate reading the file. Please check it. Thanks a lot!

        Tim Ellison added a comment -

        Sorry Deven, of course if you want the Reader to reach an end, you'll have to keep track of the "chars served" too, and return -1 when you reach the limit. I'll leave that as an exercise.
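
        As a hedged illustration of that exercise (not code from the issue), the in-memory Reader could count the characters it has served and report end-of-stream once a configurable limit is reached; the class name and limit value below are arbitrary examples:

        import java.io.IOException;
        import java.io.Reader;

        public class BoundedRepeatingReader extends Reader {

            static final char[] CONTENT = "Large file!\n".toCharArray();

            private final long limit; // total number of chars this Reader will serve
            private long served;      // chars handed out so far

            public BoundedRepeatingReader(long limit) {
                this.limit = limit;
            }

            @Override
            public int read(char[] buf, int offset, int count) throws IOException {
                if (served >= limit) {
                    return -1; // the simulated "file" has ended
                }
                int n = (int) Math.min(count, limit - served);
                for (int i = 0; i < n; i++) {
                    buf[offset + i] = CONTENT[(int) ((served + i) % CONTENT.length)];
                }
                served += n;
                return n;
            }

            @Override
            public void close() throws IOException {
                // nothing to release
            }
        }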

        Tim Ellison added a comment -

        Deven,

        Thanks for the patch.

        Rather than creating a temp file of 8 << 21 (16777216) * 11 bytes and then reading it back into the scanner, do you think it would be better to have a Reader that returns the bytes in memory? I think the test will be more robust and faster that way.

        Something like this:

        import java.io.IOException;
        import java.io.Reader;

        public class MyReader extends Reader {

            static final char[] CONTENT = "Large file!\n".toCharArray();

            @Override
            public void close() throws IOException {
            }

            @Override
            public int read(char[] buf, int offset, int count) throws IOException {
                // Fill the requested range by cycling through CONTENT; this Reader
                // never signals end-of-stream.
                for (int i = offset, j = 0; j < count; i++, j++) {
                    buf[i] = CONTENT[j % CONTENT.length];
                }
                return count;
            }
        }
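
        A hypothetical usage sketch (not part of the original comment) of how such a Reader would drive the Scanner in the test; the token count is an arbitrary example:

        import java.util.Scanner;

        public class ScannerMemoryCheck {
            public static void main(String[] args) {
                // Before the fix, walking many tokens from an effectively endless
                // Reader grew the Scanner's internal buffer without bound.
                Scanner scanner = new Scanner(new MyReader());
                for (int i = 0; i < 1000000; i++) {
                    scanner.next();
                }
                scanner.close();
            }
        }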


          People

          • Assignee:
            Tim Ellison
          • Reporter:
            deven you
          • Votes:
            0
          • Watchers:
            0

              Time Tracking

              Original Estimate: 96h
              Remaining Estimate: 96h
              Time Spent: Not Specified
