HBase / HBASE-5640

bulk load runs more slowly than before

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I am loading data from an external system into HBase. There are many log prints of the form:

      ....on different filesystem than destination store - moving to this filesystem

      This is possibly a regression caused by a recent patch.

      Attachments

      1. bulkLoadFs2.txt (1 kB) - dhruba borthakur
      2. bulkLoadFs1.txt (2 kB) - dhruba borthakur

        Issue Links

          • duplicates HBASE-6529

          Activity

          Laxman added a comment -

          There are many prints of the form:

          ....on different filesystem than destination store - moving to this filesystem

          This is possibly a regression caused by a recent patch. @Dhruba, can you please provide more details?

          dhruba borthakur added a comment -

          This is the fix I have in mind, but I have not yet tested it in great detail.

          dhruba borthakur added a comment -

          It is better to compare the URIs than to use object equality. Object equality does not work because one object is of type FileSystem while the other is an HFileSystem.
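
          As a minimal sketch of what a URI-based check could look like (the class and method names here are hypothetical, only FileSystem.getUri() is the real Hadoop API, and the attached patch may differ):

            import java.net.URI;
            import org.apache.hadoop.fs.FileSystem;

            // Hypothetical helper, not the attached patch: decides whether two
            // FileSystem handles point at the same underlying storage by comparing
            // the scheme and authority of their URIs. Object identity fails here
            // because HFileSystem wraps the underlying FileSystem, so the two
            // handles are never the same instance even on a single cluster.
            public final class SameFsCheck {
              static boolean sameFilesystem(FileSystem srcFs, FileSystem destFs) {
                URI src = srcFs.getUri();
                URI dest = destFs.getUri();
                if (!src.getScheme().equalsIgnoreCase(dest.getScheme())) {
                  return false;  // e.g. hdfs:// vs file://
                }
                String srcAuth = src.getAuthority();
                String destAuth = dest.getAuthority();
                // Authority is the namenode host:port; null-safe, case-insensitive.
                return srcAuth == null ? destAuth == null
                                       : srcAuth.equalsIgnoreCase(destAuth);
              }
            }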

          Ted Yu added a comment -

          Patch v2 makes sense.

          stack added a comment -

          Patch looks fine. Does making this change make the bulk load faster?

          dhruba borthakur added a comment -

          The code is such that if the filesystem objects do not match, the files are copied to a tmp location before being loaded into the regionserver. With the checksum patch, the filesystem objects never match (one is a FileSystem while the other is an HFileSystem).
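
          As a sketch of that code path (hypothetical names; FileUtil.copy is the real Hadoop API, and the actual HBase bulk-load code may differ):

            import java.io.IOException;
            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.FileUtil;
            import org.apache.hadoop.fs.Path;

            // Sketch of the behavior described above, not the real HBase source.
            // When the source and destination FileSystem handles "differ", the
            // HFile is first copied to a tmp path on the destination filesystem.
            // Under object identity, an HFileSystem wrapper never equals the plain
            // FileSystem, so every bulk load pays for this copy even when both
            // sides are the same HDFS cluster.
            public final class BulkLoadCopySketch {
              static Path maybeCopyToDestFs(FileSystem srcFs, Path srcPath,
                  FileSystem destFs, Path tmpDir, Configuration conf)
                  throws IOException {
                if (srcFs == destFs) {   // identity check: the false-mismatch culprit
                  return srcPath;        // same object, no copy needed
                }
                Path tmpPath = new Path(tmpDir, srcPath.getName());
                // The expensive intermediate copy that a URI comparison avoids.
                FileUtil.copy(srcFs, srcPath, destFs, tmpPath, false, conf);
                return tmpPath;
              }
            }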

          Nate Putnam added a comment -

          We had an issue with this a while back. The impact was that things would time out because of the intermediate copy.

          Sat Jun 30 13:50:25 EDT 2012, org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3@2afd97ea, java.net.SocketTimeoutException: Call to proc22.prod.urbanairship.com/10.128.12.38:7040 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.128.12.40:45456 remote=proc22.prod.urbanairship.com/10.128.12.38:7040]

          at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:183)
          at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:491)

          Applying this patch fixed the issue and sped things up significantly.

          Anoop Sam John added a comment -

          This issue is the same as HBASE-6529, which is already committed.
          Mind closing it as a duplicate?

          stack added a comment -

          Resolving as a duplicate of HBASE-6529 (thanks Anoop Sam John for pointing out the dup).


            People

            • Assignee: dhruba borthakur
            • Reporter: dhruba borthakur
            • Votes: 0
            • Watchers: 5
