Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 0.7 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      Profiling shows that BufferedRandomAccessFile.length is eating up an unexpected amount of cpu. It believe it is the call through the "native barrier" that is causing it. We often call the length method via is end of file when reading data. We can add caching of the file length to avoid this call most times.

      1. CASSANDRA-950.patch
        3 kB
        Johan Oskarsson
      2. CASSANDRA-950.patch
        4 kB
        Johan Oskarsson

        Activity

        Hide
        Johan Oskarsson added a comment -

        With this patch a microbenchmark shows an order of magnitude less time spent in length(). The method no longer shows up in the profiler "cpu burn" tree, down from 23%. Added unit test for the method.

        Show
        Johan Oskarsson added a comment - With this patch a microbenchmark shows an order of magnitude less time spent in length(). The method no longer shows up in the profiler "cpu burn" tree, down from 23%. Added unit test for the method.
        Hide
        Johan Oskarsson added a comment -

        It's worth noting that with stress.py I don't actually see any increase in throughput on my machine.

        Show
        Johan Oskarsson added a comment - It's worth noting that with stress.py I don't actually see any increase in throughput on my machine.
        Hide
        Jonathan Ellis added a comment -

        I'd be more comfortable if we only cache the length on init for read-only BRAF modes (thus not having to add any code to the inner loop of our write path).

        Incidentally, I found this while googling, indicating that length() makes not one but three system calls per invocation. So this seems like a worthwhile optimization. http://blogs.oracle.com/charlesLamb/2008/09/javaiorandomaccessfilelength.html

        Show
        Jonathan Ellis added a comment - I'd be more comfortable if we only cache the length on init for read-only BRAF modes (thus not having to add any code to the inner loop of our write path). Incidentally, I found this while googling, indicating that length() makes not one but three system calls per invocation. So this seems like a worthwhile optimization. http://blogs.oracle.com/charlesLamb/2008/09/javaiorandomaccessfilelength.html
        Hide
        Jonathan Ellis added a comment -

        also: space between "if" and "(", please

        Show
        Jonathan Ellis added a comment - also: space between "if" and "(", please
        Hide
        Johan Oskarsson added a comment -

        Updated patch that caches the length on init for read only files.

        Updated if ( position. My brain doesn't even notice if there is a space between if and (. I think I'm used to and prefer not having a space before ( since that's the norm after method names.

        Show
        Johan Oskarsson added a comment - Updated patch that caches the length on init for read only files. Updated if ( position. My brain doesn't even notice if there is a space between if and (. I think I'm used to and prefer not having a space before ( since that's the norm after method names.
        Hide
        Jonathan Ellis added a comment -

        +1

        (if you could fix the whitespace here on commit that would be great – mixed tabs + spaces?)

        + else
        +

        { + // opened as read only, file length is cached + return fileLength; + }
        Show
        Jonathan Ellis added a comment - +1 (if you could fix the whitespace here on commit that would be great – mixed tabs + spaces?) + else + { + // opened as read only, file length is cached + return fileLength; + }
        Hide
        Johan Oskarsson added a comment -

        Committed to trunk.

        Show
        Johan Oskarsson added a comment - Committed to trunk.
        Hide
        Hudson added a comment -

        Integrated in Cassandra #400 (See http://hudson.zones.apache.org/hudson/job/Cassandra/400/)
        Cache BufferedRandomAccessFile.length result to avoid native call. Patch by johan, review by jbellis.

        Show
        Hudson added a comment - Integrated in Cassandra #400 (See http://hudson.zones.apache.org/hudson/job/Cassandra/400/ ) Cache BufferedRandomAccessFile.length result to avoid native call. Patch by johan, review by jbellis.

          People

          • Assignee:
            Johan Oskarsson
            Reporter:
            Johan Oskarsson
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development