HIVE-5277

HBase handler skips rows with null valued first cells when only row key is selected

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.11.0, 0.11.1, 0.12.0, 0.13.0
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels:
      None

      Description

      HBaseStorageHandler skips rows with null valued first cells when only row key is selected.

      SELECT key, col1, col2 FROM hbase_table;
      key1	cell1	cell2 
      key2	NULL	cell3
      
      SELECT COUNT(key) FROM hbase_table;
      1
      

      HiveHBaseTableInputFormat.getRecordReader adds the first column to the scan so that rows are not skipped when only the row key is selected. However, when a row has no value in that first column, HBase still skips the row.

      Section 12.9.6, "Optimal Loading of Row Keys", of the HBase book (http://hbase.apache.org/book/perf.reading.html) describes how to deal with this problem.
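
      The following is a toy, in-memory sketch of the behavior described above, not HBase or Hive code. Every name in it (RowKeyScanModel, Row, keysWhenColumnSelected, keysWithFirstKeyOnly, sampleTable) is hypothetical and exists only for illustration. It assumes that HBase stores no cell for a null value, so a scan restricted to col1 never sees key2, whereas an unrestricted scan that keeps just the first existing cell of each row sees both keys. The second method only models the idea behind the book's recommendation, which, as far as I understand that section, combines HBase's FirstKeyOnlyFilter and KeyOnlyFilter in a filter list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of a scan; none of these names are HBase or Hive APIs. */
public class RowKeyScanModel {

    /** A row is a key plus only the cells that actually exist (no cell is stored for a null). */
    public static final class Row {
        final String key;
        final Map<String, String> cells = new LinkedHashMap<>();

        Row(String key) {
            this.key = key;
        }

        Row put(String column, String value) {
            cells.put(column, value);
            return this;
        }
    }

    /** Scan restricted to one column: rows without that cell are never returned. */
    public static List<String> keysWhenColumnSelected(List<Row> table, String column) {
        List<String> keys = new ArrayList<>();
        for (Row row : table) {
            if (row.cells.containsKey(column)) {
                keys.add(row.key);
            }
        }
        return keys;
    }

    /** Unrestricted scan that keeps only the first existing cell of each row. */
    public static List<String> keysWithFirstKeyOnly(List<Row> table) {
        List<String> keys = new ArrayList<>();
        for (Row row : table) {
            if (!row.cells.isEmpty()) {
                keys.add(row.key);
            }
        }
        return keys;
    }

    /** The two rows from the example in this description: key2 has no col1 cell. */
    public static List<Row> sampleTable() {
        List<Row> table = new ArrayList<>();
        table.add(new Row("key1").put("col1", "cell1").put("col2", "cell2"));
        table.add(new Row("key2").put("col2", "cell3"));
        return table;
    }

    public static void main(String[] args) {
        List<Row> table = sampleTable();
        System.out.println(keysWhenColumnSelected(table, "col1").size()); // prints 1
        System.out.println(keysWithFirstKeyOnly(table).size());           // prints 2
    }
}
```

      The first count matches the wrong result of SELECT COUNT(key) shown above; the second is the count one would expect for the two rows.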

      I looked for an existing issue but could not find one. If you find one that covers the same problem, please mark this issue as a duplicate.

      1. HIVE-5277.2.patch.txt
        7 kB
        Teddy Choi
      2. HIVE-5277.1.patch.txt
        7 kB
        Teddy Choi

        Activity

        Hive QA added a comment -

        Overall: -1 no tests executed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12608162/HIVE-5277.2.patch.txt

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4812/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4812/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4812/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Tests exited with: NonZeroExitCodeException
        Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
        + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
        + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
        + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
        + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
        + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
        + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
        + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
        + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
        + cd /data/hive-ptest/working/
        + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4812/source-prep.txt
        + [[ false == \t\r\u\e ]]
        + mkdir -p maven ivy
        + [[ git = \s\v\n ]]
        + [[ git = \g\i\t ]]
        + [[ -z master ]]
        + [[ -d apache-github-source-source ]]
        + [[ ! -d apache-github-source-source/.git ]]
        + [[ ! -d apache-github-source-source ]]
        + cd apache-github-source-source
        + git fetch origin
        + git reset --hard HEAD
        HEAD is now at 5bb2506 HIVE-11434: Followup for HIVE-10166: reuse existing configurations for prewarming Spark executors (reviewed by Chao)
        + git clean -f -d
        Removing pom.xml.orig
        + git checkout master
        Already on 'master'
        + git reset --hard origin/master
        HEAD is now at 5bb2506 HIVE-11434: Followup for HIVE-10166: reuse existing configurations for prewarming Spark executors (reviewed by Chao)
        + git merge --ff-only origin/master
        Already up-to-date.
        + git gc
        + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
        + patchFilePath=/data/hive-ptest/working/scratch/build.patch
        + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
        + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
        + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
        The patch does not appear to apply with p0, p1, or p2
        + exit 1
        '
        

        This message is automatically generated.

        ATTACHMENT ID: 12608162 - PreCommit-HIVE-TRUNK-Build

        Swarnim Kulkarni added a comment -

        It seems this patch needs more work, given all the updates on master since this was logged. I can take on the task of updating it.

        Swarnim Kulkarni added a comment -

        Bumping the priority on this to Critical, as it can cause Hive to return completely wrong counts when columns are null.

        Hive QA added a comment -

        Overall: +1 all checks pass

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12608162/HIVE-5277.2.patch.txt

        SUCCESS: +1 4397 tests passed

        Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1114/testReport
        Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1114/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        

        This message is automatically generated.

        Teddy Choi added a comment -

        The first patch returned only the row key column when all columns were selected. The second patch fixes this error. It passes all of the previously failing tests as well as the hbase_null_cell.q test.

        Teddy Choi added a comment -

        I reproduced those errors, and I'm fixing them.

        Teddy Choi added a comment -

        The failed tests on the test server passed on my computer. I'll try again, but it seems like a false alarm.

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12607931/HIVE-5277.1.patch.txt

        ERROR: -1 due to 3 failed/errored test(s), 4393 tests executed
        Failed tests:

        org.apache.hive.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableProjectionReadMR
        org.apache.hive.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableReadMR
        org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.testPigFilterProjection
        

        Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1103/testReport
        Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1103/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests failed with: TestsFailedException: 3 tests failed
        

        This message is automatically generated.

        Teddy Choi added a comment -

        Review request at https://reviews.apache.org/r/14587/

        Teddy Choi added a comment -

        Modified HiveHBaseTableInputFormat#getRecordReader to follow section 12.9.6, "Optimal Loading of Row Keys", of the HBase book (http://hbase.apache.org/book/perf.reading.html). Added a related test file.

        It now returns correct results in these cases.

        It also makes SELECT COUNT(rowkey) FROM HBASE_TABLE faster than other COUNT usages, so it is worth considering for COUNT(1) cases, too.


          People

          • Assignee: Swarnim Kulkarni
          • Reporter: Teddy Choi
          • Votes: 0
          • Watchers: 2
