Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
    • Environment:

      Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)

    • Release Note:
      Speed up RCFile::sync() by searching with a larger buffer window

      Description

      RCFile::sync(long) takes approximately 1 second every time it is called because of the inner loops in the function.

      From what was observed with HDFS-4710, single-byte reads are an order of magnitude slower than larger 512-byte buffer reads.

      Even when disk I/O is buffered to this size, there is overhead due to the synchronized read() methods in BlockReaderLocal & RemoteBlockReader classes.

      Replacing the readByte() calls in RCFile.sync(long) with a readFully(512 byte) call will speed this function up by more than 10x.
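
The idea described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual RCFile code: the class name SyncScanner, the method findSync, and the chunkSize parameter are hypothetical, and a plain InputStream stands in for the HDFS stream. Each read fetches a whole chunk, and the last sync.length - 1 bytes are carried over so that a marker straddling two chunks is still found.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper illustrating a chunked sync-marker search. */
class SyncScanner {
    /**
     * Returns the stream offset of the first occurrence of sync,
     * or -1 if the stream ends before the marker is seen.
     */
    static long findSync(InputStream in, byte[] sync, int chunkSize) {
        if (sync.length == 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("sync must be non-empty and chunkSize positive");
        }
        int prefix = sync.length - 1;            // bytes carried over between chunks
        byte[] buffer = new byte[prefix + chunkSize];
        int filled = 0;                          // number of valid bytes in buffer
        long base = 0;                           // stream offset of buffer[0]
        try {
            while (true) {
                // One bulk read instead of one read() call per byte.
                int n = in.read(buffer, filled, buffer.length - filled);
                if (n < 0) {
                    return -1;
                }
                filled += n;
                for (int i = 0; i + sync.length <= filled; i++) {
                    if (matchesAt(buffer, i, sync)) {
                        return base + i;
                    }
                }
                // Keep the trailing bytes that could begin a marker
                // completed by the next chunk.
                int keep = Math.min(prefix, filled);
                System.arraycopy(buffer, filled - keep, buffer, 0, keep);
                base += filled - keep;
                filled = keep;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean matchesAt(byte[] buffer, int start, byte[] sync) {
        for (int j = 0; j < sync.length; j++) {
            if (buffer[start + j] != sync[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sync = {1, 2, 3, 4};
        byte[] data = new byte[20];
        System.arraycopy(sync, 0, data, 10, sync.length);  // marker at offset 10
        System.out.println(findSync(new ByteArrayInputStream(data), sync, 4));    // prints 10
        System.out.println(findSync(new ByteArrayInputStream(data), sync, 512));  // prints 10
    }
}
```

With a chunk size of 4 the marker straddles a chunk boundary, which exercises the carry-over path; with 512 it is found in a single read.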

          Activity

          Gopal V added a comment -

          buffer RCFile::sync reads into "io.bytes.per.checksum" chunks

          Gopal V added a comment -
          split location                           before  after
          store_sales/000000_0:67108864+67108864   748 ms  81 ms
          store_sales/000002_0:67108864+67108864   966 ms  54 ms
          store_sales/000004_0:67108864+67108864   948 ms  51 ms
          store_sales/000006_0:67108864+67108864   922 ms  42 ms
          store_sales/000008_0:67108864+67108864   842 ms  40 ms
          store_sales/000010_0:67108864+67108864  1302 ms  82 ms
          store_sales/000012_0:67108864+67108864   989 ms  50 ms
          store_sales/000014_0:67108864+67108864   970 ms  43 ms
          store_sales/000001_0:67108864+67108864   829 ms  47 ms
          store_sales/000003_0:67108864+67108864   811 ms  43 ms
          store_sales/000007_0:67108864+67108864   865 ms  51 ms
          store_sales/000005_0:67108864+67108864  1042 ms  59 ms
          store_sales/000009_0:67108864+67108864   902 ms  39 ms
          store_sales/000011_0:67108864+67108864  1046 ms  42 ms
          store_sales/000013_0:67108864+67108864  1048 ms  44 ms

          As expected, the function is faster by an order of magnitude, and fast enough that the inner sync.length for loop needs no further optimization.

          Overall, the 28-second query ran 2+ seconds faster. That is expected: with 8 slots and 15 mappers, the mappers run in two waves, and each wave saves roughly 1 second.

          Gopal V added a comment -

          Original bug report for the sync() slow-down, with a partial fix (already in svn)

          Ashutosh Chauhan added a comment -

          +1 will commit if tests pass

          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Gopal!

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #2082 (See https://builds.apache.org/job/Hive-trunk-h0.21/2082/)
          HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh Chauhan) (Revision 1476648)

          Result = FAILURE
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476648
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #179 (See https://builds.apache.org/job/Hive-trunk-hadoop2/179/)
          HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh Chauhan) (Revision 1476648)

          Result = FAILURE
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476648
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
          tagus wang added a comment -

          There is a bug in this line:
          System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
          It should be:
          System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
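
The difference between the two offsets can be seen in a small standalone sketch. It assumes a full buffer whose last prefix bytes must be moved to the front before the next read; the class CarryOverDemo and the helper carry are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

/** Hypothetical demo of the carry-over copy and its off-by-one variant. */
class CarryOverDemo {
    /**
     * Copies prefix bytes starting at srcPos to the front of a copy of
     * buffer and returns just those carried-over bytes.
     */
    static byte[] carry(byte[] buffer, int srcPos, int prefix) {
        byte[] copy = buffer.clone();            // leave the caller's array untouched
        System.arraycopy(copy, srcPos, copy, 0, prefix);
        return Arrays.copyOf(copy, prefix);
    }

    public static void main(String[] args) {
        byte[] buffer = {10, 11, 12, 13, 14, 15, 16, 17};
        int prefix = 3;
        // Off by one: starts one byte too early and drops the final byte (17).
        System.out.println(Arrays.toString(
            carry(buffer, buffer.length - prefix - 1, prefix)));  // prints [14, 15, 16]
        // Corrected: carries exactly the last prefix bytes.
        System.out.println(Arrays.toString(
            carry(buffer, buffer.length - prefix, prefix)));      // prints [15, 16, 17]
    }
}
```

With the extra - 1, the final byte of the buffer is never carried forward, so a marker whose first bytes end exactly at the buffer boundary would be compared against the wrong bytes.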

          Edward Capriolo added a comment -

          Maybe we can add a test when we fix.

          Gopal V added a comment -

          Good catch, tagus wang. It is in fact missing 1 byte at the end.

          Please log a new bug & assign it to me - I will fix this and add an extra test-case for this off-by-one error.

          tagus wang added a comment -

          Gopal V, I reported it in HIVE-5100, but I cannot assign it to you, so you will need to assign it to yourself.

          Ashutosh Chauhan added a comment -

          This issue has been fixed and released as part of the 0.12 release. If you find further issues, please create a new JIRA and link it to this one.


            People

            • Assignee:
              Gopal V
              Reporter:
              Gopal V
            • Votes:
              0
              Watchers:
              6
