Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 0.7 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      Profiling shows that BufferedRandomAccessFile.length is eating up an unexpected amount of cpu. It believe it is the call through the "native barrier" that is causing it. We often call the length method via is end of file when reading data. We can add caching of the file length to avoid this call most times.

      1. CASSANDRA-950.patch
        4 kB
        Johan Oskarsson
      2. CASSANDRA-950.patch
        3 kB
        Johan Oskarsson

        Activity

        Johan Oskarsson created issue -
        Hide
        Johan Oskarsson added a comment -

        With this patch a microbenchmark shows an order of magnitude less time spent in length(). The method no longer shows up in the profiler "cpu burn" tree, down from 23%. Added unit test for the method.

        Show
        Johan Oskarsson added a comment - With this patch a microbenchmark shows an order of magnitude less time spent in length(). The method no longer shows up in the profiler "cpu burn" tree, down from 23%. Added unit test for the method.
        Johan Oskarsson made changes -
        Field Original Value New Value
        Attachment CASSANDRA-950.patch [ 12440688 ]
        Johan Oskarsson made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Johan Oskarsson added a comment -

        It's worth noting that with stress.py I don't actually see any increase in throughput on my machine.

        Show
        Johan Oskarsson added a comment - It's worth noting that with stress.py I don't actually see any increase in throughput on my machine.
        Hide
        Jonathan Ellis added a comment -

        I'd be more comfortable if we only cache the length on init for read-only BRAF modes (thus not having to add any code to the inner loop of our write path).

        Incidentally, I found this while googling, indicating that length() makes not one but three system calls per invocation. So this seems like a worthwhile optimization. http://blogs.oracle.com/charlesLamb/2008/09/javaiorandomaccessfilelength.html

        Show
        Jonathan Ellis added a comment - I'd be more comfortable if we only cache the length on init for read-only BRAF modes (thus not having to add any code to the inner loop of our write path). Incidentally, I found this while googling, indicating that length() makes not one but three system calls per invocation. So this seems like a worthwhile optimization. http://blogs.oracle.com/charlesLamb/2008/09/javaiorandomaccessfilelength.html
        Hide
        Jonathan Ellis added a comment -

        also: space between "if" and "(", please

        Show
        Jonathan Ellis added a comment - also: space between "if" and "(", please
        Hide
        Johan Oskarsson added a comment -

        Updated patch that caches the length on init for read only files.

        Updated if ( position. My brain doesn't even notice if there is a space between if and (. I think I'm used to and prefer not having a space before ( since that's the norm after method names.

        Show
        Johan Oskarsson added a comment - Updated patch that caches the length on init for read only files. Updated if ( position. My brain doesn't even notice if there is a space between if and (. I think I'm used to and prefer not having a space before ( since that's the norm after method names.
        Johan Oskarsson made changes -
        Attachment CASSANDRA-950.patch [ 12440878 ]
        Hide
        Jonathan Ellis added a comment -

        +1

        (if you could fix the whitespace here on commit that would be great – mixed tabs + spaces?)

        + else
        +

        { + // opened as read only, file length is cached + return fileLength; + }
        Show
        Jonathan Ellis added a comment - +1 (if you could fix the whitespace here on commit that would be great – mixed tabs + spaces?) + else + { + // opened as read only, file length is cached + return fileLength; + }
        Hide
        Johan Oskarsson added a comment -

        Committed to trunk.

        Show
        Johan Oskarsson added a comment - Committed to trunk.
        Johan Oskarsson made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Assignee Johan Oskarsson [ johanoskarsson ]
        Fix Version/s 0.7 [ 12314533 ]
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        Integrated in Cassandra #400 (See http://hudson.zones.apache.org/hudson/job/Cassandra/400/)
        Cache BufferedRandomAccessFile.length result to avoid native call. Patch by johan, review by jbellis.

        Show
        Hudson added a comment - Integrated in Cassandra #400 (See http://hudson.zones.apache.org/hudson/job/Cassandra/400/ ) Cache BufferedRandomAccessFile.length result to avoid native call. Patch by johan, review by jbellis.
        Gavin made changes -
        Workflow no-reopen-closed, patch-avail [ 12507425 ] patch-available, re-open possible [ 12750591 ]
        Gavin made changes -
        Workflow patch-available, re-open possible [ 12750591 ] reopen-resolved, no closed status, patch-avail, testing [ 12756603 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Patch Available Patch Available
        3m 19s 1 Johan Oskarsson 03/Apr/10 13:16
        Patch Available Patch Available Resolved Resolved
        3d 1h 20m 1 Johan Oskarsson 06/Apr/10 14:37

          People

          • Assignee:
            Johan Oskarsson
            Reporter:
            Johan Oskarsson
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development