HIVE-5277

HBase handler skips rows with null valued first cells when only row key is selected

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.11.0, 0.11.1, 0.12.0, 0.13.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: HBase Handler
    • Labels:
      None

      Description

      HBaseStorageHandler skips rows with null valued first cells when only row key is selected.

      SELECT key, col1, col2 FROM hbase_table;
      key1	cell1	cell2 
      key2	NULL	cell3
      
      SELECT COUNT(key) FROM hbase_table;
      1
      

      HiveHBaseTableInputFormat.getRecordReader selects the first cell so that rows are not skipped when only the row key is requested. But when that first cell is null, HBase still skips the row.

      Section 12.9.6, "Optimal Loading of Row Keys", of the HBase book (http://hbase.apache.org/book/perf.reading.html) describes how to deal with this problem.

      I tried to find an existing issue, but I couldn't. If you find one, please mark this issue as a duplicate.
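
The behavior described above can be illustrated with a small in-memory model. This is only a sketch, not Hive or HBase code: the class NullFirstCellDemo and its methods are hypothetical names, and it assumes that a Hive NULL corresponds to a cell that is absent from the HBase row and that a scan restricted to one column returns only rows that have a cell in that column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory stand-in for an HBase table: row key -> (column -> value).
// Assumption: a NULL in Hive corresponds to a cell that is absent from the row.
final class NullFirstCellDemo {

    // Builds the two rows shown in the description (key2 has no col1 cell).
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        Map<String, String> row1 = new LinkedHashMap<>();
        row1.put("col1", "cell1");
        row1.put("col2", "cell2");
        table.put("key1", row1);
        Map<String, String> row2 = new LinkedHashMap<>();
        row2.put("col2", "cell3"); // col1 is missing, so Hive shows NULL
        table.put("key2", row2);
        return table;
    }

    // Models a scan restricted to a single column: a row with no cell
    // in that column is not returned at all.
    static List<String> scanColumn(Map<String, Map<String, String>> table, String column) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (row.getValue().containsKey(column)) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    // Models a scan that accepts whichever cell comes first in each row,
    // so every row with at least one cell is returned.
    static List<String> scanAnyCell(Map<String, Map<String, String>> table) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (!row.getValue().isEmpty()) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = sampleTable();
        System.out.println("restricted to col1: " + scanColumn(table, "col1"));
        System.out.println("any first cell:     " + scanAnyCell(table));
    }
}
```

In this model the column-restricted scan returns only key1, which matches the COUNT(key) result of 1 reported above, while the scan that accepts any first cell returns both keys.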

      1. HIVE-5277.1.patch.txt
        7 kB
        Teddy Choi
      2. HIVE-5277.2.patch.txt
        7 kB
        Teddy Choi
      3. HIVE-5277.3.patch.txt
        8 kB
        Swarnim Kulkarni

          Activity

          Chao Sun added a comment -

          Committed to branch-1 and master. Thanks Xuefu for the review.

          Swarnim Kulkarni added a comment -

          Xuefu Zhang, is there anything else you want me to look at on this one, or are we good to merge?

          Xuefu Zhang added a comment -

          +1

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12749196/HIVE-5277.3.patch.txt

          SUCCESS: +1 9345 tests passed

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4873/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4873/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4873/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12749196 - PreCommit-HIVE-TRUNK-Build

          Swarnim Kulkarni added a comment -

          RB: https://reviews.apache.org/r/37207/

          Xuefu Zhang, Ashutosh Chauhan: would you mind taking a quick look?

          Swarnim Kulkarni added a comment -

          Updated patch to fix counts for count(key) and count

          Swarnim Kulkarni added a comment -

          Just to update, this is an issue with count-type queries as well, not only count(key).

          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12608162/HIVE-5277.2.patch.txt

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4812/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4812/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4812/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests exited with: NonZeroExitCodeException
          Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
          + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
          + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
          + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
          + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
          + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4812/source-prep.txt
          + [[ false == \t\r\u\e ]]
          + mkdir -p maven ivy
          + [[ git = \s\v\n ]]
          + [[ git = \g\i\t ]]
          + [[ -z master ]]
          + [[ -d apache-github-source-source ]]
          + [[ ! -d apache-github-source-source/.git ]]
          + [[ ! -d apache-github-source-source ]]
          + cd apache-github-source-source
          + git fetch origin
          + git reset --hard HEAD
          HEAD is now at 5bb2506 HIVE-11434: Followup for HIVE-10166: reuse existing configurations for prewarming Spark executors (reviewed by Chao)
          + git clean -f -d
          Removing pom.xml.orig
          + git checkout master
          Already on 'master'
          + git reset --hard origin/master
          HEAD is now at 5bb2506 HIVE-11434: Followup for HIVE-10166: reuse existing configurations for prewarming Spark executors (reviewed by Chao)
          + git merge --ff-only origin/master
          Already up-to-date.
          + git gc
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0, p1, or p2
          + exit 1
          '
          

          This message is automatically generated.

          ATTACHMENT ID: 12608162 - PreCommit-HIVE-TRUNK-Build

          Swarnim Kulkarni added a comment -

          It seems this patch needs more work, given all the updates on master since this was logged. I can take on the task of updating it.

          Swarnim Kulkarni added a comment -

          Bumping the priority on this to Critical, as it can cause Hive to show completely wrong counts with null columns.

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12608162/HIVE-5277.2.patch.txt

          SUCCESS: +1 4397 tests passed

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1114/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1114/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          Teddy Choi added a comment -

          The first patch returned only the row key column when all columns were selected. The second patch fixes this error. It passes all the previously failed tests and the hbase_null_cell.q test.

          Teddy Choi added a comment -

          I reproduced those errors, and I'm fixing them.

          Teddy Choi added a comment -

          The failed tests on the test server passed on my computer. I'll try again, but it seems like a false alarm.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12607931/HIVE-5277.1.patch.txt

          ERROR: -1 due to 3 failed/errored test(s), 4393 tests executed
          Failed tests:

          org.apache.hive.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableProjectionReadMR
          org.apache.hive.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableReadMR
          org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.testPigFilterProjection
          

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1103/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1103/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 3 tests failed
          

          This message is automatically generated.

          Teddy Choi added a comment -

          Review request at https://reviews.apache.org/r/14587/

          Teddy Choi added a comment -

          Modified HiveHBaseTableInputFormat#getRecordReader to follow http://hbase.apache.org/book/perf.reading.html 12.9.6. Optimal Loading of Row Keys. Added a related test file.

          Now it returns correct results in these cases.

          It will also make SELECT COUNT(rowkey) FROM HBASE_TABLE faster than other COUNT usages, so it is worth considering for COUNT(1) cases, too.
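
The HBase book section cited above recommends combining a FirstKeyOnlyFilter with a KeyOnlyFilter when only row keys are needed. The following is a rough stand-alone model of that idea, not the patch itself and not the HBase API: RowKeyOnlyScanSketch, Cell, firstCellPerRow and stripValues are hypothetical names, and the sketch assumes cells arrive sorted by row key, as they would in a scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone model of a "row keys only" scan; none of this is HBase code.
final class RowKeyOnlyScanSketch {

    // One stored cell: row key, column name, value.
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // The sample data from the description; key2 has no col1 cell.
    static List<Cell> sampleCells() {
        return Arrays.asList(
                new Cell("key1", "col1", "cell1"),
                new Cell("key1", "col2", "cell2"),
                new Cell("key2", "col2", "cell3"));
    }

    // Stand-in for a first-key-only filter: keep the first cell of each row,
    // whichever column it belongs to. Assumes cells are sorted by row key.
    static List<Cell> firstCellPerRow(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        String lastRow = null;
        for (Cell c : cells) {
            if (!c.row.equals(lastRow)) {
                out.add(c);
                lastRow = c.row;
            }
        }
        return out;
    }

    // Stand-in for a key-only filter: keep the key, drop the value payload.
    static List<Cell> stripValues(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            out.add(new Cell(c.row, c.column, ""));
        }
        return out;
    }

    // COUNT(key) in this model: one surviving cell per row.
    static int countRowKeys(List<Cell> cells) {
        return stripValues(firstCellPerRow(cells)).size();
    }

    public static void main(String[] args) {
        System.out.println(countRowKeys(sampleCells()));
    }
}
```

Because the first cell of each row is accepted regardless of its column, key2 is counted even though its col1 cell is missing, and stripping the values reflects why such a scan would move less data than one that reads full cells.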


            People

            • Assignee: Swarnim Kulkarni
            • Reporter: Teddy Choi
            • Votes: 0
            • Watchers: 5
