  1. Apache Drill
  2. DRILL-5480

Empty batch returning from HBase may cause SchemaChangeException even when data does not have different schema

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: None
    • Labels:
      None

      Description

      The following repro was provided by Hao Zhu.

      1. Create an HBase table with 4 regions

      create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
      put 'myhbase','a','cf1:col1','somedata'
      put 'myhbase','b','cf1:col2','somedata'
      put 'myhbase','c','cf2:col1','somedata'
      

      One region has cf1.col1. One region has column family 'cf1' but no 'col1' under it. One region has only column family 'cf2'. The last region is completely empty.

      2. Prepare a CSV file.
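      The region layout described above can be reproduced with a small model. This is only a sketch, not HBase code: it assumes the usual HBase convention that a split key is the first row key of the region to its right, and the names `region_index` and `rows_by_region` are made up for illustration.

```python
from bisect import bisect_right

SPLITS = ["a", "b", "c"]  # split keys from the create statement

# Cells written by the three put commands: row key -> {"family:qualifier": value}
ROWS = {
    "a": {"cf1:col1": "somedata"},
    "b": {"cf1:col2": "somedata"},
    "c": {"cf2:col1": "somedata"},
}


def region_index(row_key, splits=SPLITS):
    """Return the 0-based region holding row_key (a split key starts a new region)."""
    return bisect_right(splits, row_key)


def rows_by_region(rows=ROWS, splits=SPLITS):
    """Group the rows into len(splits) + 1 regions, in key order."""
    regions = [{} for _ in range(len(splits) + 1)]
    for key, cells in rows.items():
        regions[region_index(key, splits)][key] = cells
    return regions


if __name__ == "__main__":
    for i, region in enumerate(rows_by_region()):
        print(i, region)
```

      In this model the first region (keys below 'a') receives no rows, which corresponds to the completely empty region in the repro.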

      select * from dfs.tmp.`joinhbase.csv`;
      +-------------------+
      |      columns      |
      +-------------------+
      | ["1","somedata"]  |
      | ["2","somedata"]  |
      | ["3","somedata"]  |
      +-------------------+
      

      Now run the following query on Drill 1.11.0-SNAPSHOT:

      select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 'UTF8') as col1
      from 
      hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
      ON CONVERT_FROM(H.cf1.col1, 'UTF8')= C.columns[1]
      ;
      

      The correct query result should be:

      +---------+-----------+
      | keyCol  |   col1    |
      +---------+-----------+
      | a       | somedata  |
      | a       | somedata  |
      | a       | somedata  |
      +---------+-----------+
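      The expected result can be checked by hand with a plain nested-loop inner join over the repro data. This is an illustrative sketch only, not how Drill executes the query; the data structures and the `inner_join` helper are assumptions made for the example.

```python
# HBase rows from the repro, as row key plus nested column families.
HBASE_ROWS = [
    {"row_key": "a", "cf1": {"col1": "somedata"}},
    {"row_key": "b", "cf1": {"col2": "somedata"}},
    {"row_key": "c", "cf2": {"col1": "somedata"}},
]

# Rows of joinhbase.csv, each exposed as a `columns` array.
CSV_ROWS = [["1", "somedata"], ["2", "somedata"], ["3", "somedata"]]


def inner_join(hbase_rows, csv_rows):
    """Join on H.cf1.col1 = C.columns[1]; rows without cf1.col1 never match."""
    result = []
    for h in hbase_rows:
        col1 = h.get("cf1", {}).get("col1")
        if col1 is None:
            continue  # a missing (NULL) join key cannot satisfy the equality
        for c in csv_rows:
            if col1 == c[1]:
                result.append((h["row_key"], col1))
    return result


if __name__ == "__main__":
    for key_col, col1 in inner_join(HBASE_ROWS, CSV_ROWS):
        print(key_col, col1)
```

      Only row 'a' carries cf1.col1, and it matches all three CSV rows, which is why the correct result has three identical rows.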
      

      With broadcast join turned off, the query either fails with a SchemaChangeException or returns an incorrect result, at random. 'At random' means that within the same session, the same query can hit the SchemaChangeException in one run and return an incorrect result in the next.

      alter session set `planner.enable_broadcast_join`=false;
      
      select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 'UTF8') as col1
      from
      hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
      ON CONVERT_FROM(H.cf1.col1, 'UTF8')= C.columns[1]
      ;
      Error: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes
      
      +---------+-------+
      | keyCol  | col1  |
      +---------+-------+
      +---------+-------+
      No rows selected (0.302 seconds)
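      As a toy illustration of the failure mode named in the title, the sketch below models an operator that rejects any batch whose schema differs from the first one it saw. Everything here is an assumption for illustration: the schemas assigned to the batches (in particular the "UNKNOWN" placeholder type for the empty batch), the `strict_consume` helper, and the `SchemaChangeError` class are hypothetical and do not reflect Drill's actual reader or hash join code.

```python
class SchemaChangeError(Exception):
    """Stand-in for Drill's SchemaChangeException (toy model only)."""


def strict_consume(batches):
    """Consume (schema, rows) batches; raise if a later batch has a different schema."""
    first_schema = None
    rows_out = []
    for schema, rows in batches:
        if first_schema is None:
            first_schema = schema
        elif schema != first_schema:
            # The check fires even when the offending batch carries zero rows.
            raise SchemaChangeError(
                f"schema changed from {first_schema} to {schema}"
            )
        rows_out.extend(rows)
    return rows_out


# Assumed schemas: the region with data reports col1 as VARBINARY, while the
# empty region reports a placeholder type because it has no cells to inspect.
DATA_BATCH = ({"cf1.col1": "VARBINARY"}, [("a", b"somedata")])
EMPTY_BATCH = ({"cf1.col1": "UNKNOWN"}, [])

if __name__ == "__main__":
    print(strict_consume([DATA_BATCH]))
    try:
        strict_consume([DATA_BATCH, EMPTY_BATCH])
    except SchemaChangeError as exc:
        print("error:", exc)
```

      In this model the data itself never changes schema; the exception is triggered purely by the zero-row batch, which is the situation the issue title describes.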
      

              People

              • Assignee: Jinfeng Ni (jni)
              • Reporter: Jinfeng Ni (jni)