Apache Hudi / HUDI-258

Hive query engine does not support join queries between RT and RO tables


Details

    Description

      Original report: https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619

      Root cause: Hive tracks getSplits() calls by dataset base path and does not take the InputFormat class into account, so getSplits() is called only once. The RO and RT tables share the same dataset base path but use different InputFormat classes, so in a join between them both tables end up being read with the splits computed for only one of them. As a result, Hive join queries between the RT and RO tables return incorrect results.
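
The Java sketch below illustrates the root cause with a toy split cache. It is hypothetical and is not Hive's actual code: `SplitCacheSketch`, `FakeInputFormat`, `RoInputFormat`, `RtInputFormat`, the base path, and the `ro:`/`rt:` split labels are all invented for illustration, and each "split" is just a string. A cache keyed only by base path hands the RO splits to the RT table as well; keying it by (base path, InputFormat class) keeps the two views apart.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Hive's implementation: models how a split cache
// keyed only by base path confuses Hudi's RO and RT views of the same table.
public class SplitCacheSketch {

    // Stand-in for a Hadoop InputFormat; real code would call InputFormat.getSplits().
    interface FakeInputFormat {
        List<String> getSplits(String basePath);
    }

    // Read-optimized view: reads only the compacted base files.
    static final class RoInputFormat implements FakeInputFormat {
        public List<String> getSplits(String basePath) {
            return List.of("ro:" + basePath);
        }
    }

    // Real-time view: merges base files with delta logs.
    static final class RtInputFormat implements FakeInputFormat {
        public List<String> getSplits(String basePath) {
            return List.of("rt:" + basePath);
        }
    }

    // Buggy variant: the cache key ignores the InputFormat class.
    static final class PathOnlyCache {
        private final Map<String, List<String>> cache = new HashMap<>();

        List<String> getSplits(String basePath, FakeInputFormat format) {
            return cache.computeIfAbsent(basePath, format::getSplits);
        }
    }

    // Fixed variant: the cache key includes the InputFormat class.
    static final class PathAndFormatCache {
        private record Key(String basePath, Class<?> formatClass) {}

        private final Map<Key, List<String>> cache = new HashMap<>();

        List<String> getSplits(String basePath, FakeInputFormat format) {
            return cache.computeIfAbsent(
                    new Key(basePath, format.getClass()),
                    key -> format.getSplits(key.basePath()));
        }
    }

    public static void main(String[] args) {
        String base = "/tmp/stock_ticks_mor"; // hypothetical base path shared by RO and RT
        FakeInputFormat ro = new RoInputFormat();
        FakeInputFormat rt = new RtInputFormat();

        PathOnlyCache buggy = new PathOnlyCache();
        System.out.println(buggy.getSplits(base, ro)); // [ro:/tmp/stock_ticks_mor]
        System.out.println(buggy.getSplits(base, rt)); // [ro:/tmp/stock_ticks_mor] (RT gets RO splits)

        PathAndFormatCache fixed = new PathAndFormatCache();
        System.out.println(fixed.getSplits(base, ro)); // [ro:/tmp/stock_ticks_mor]
        System.out.println(fixed.getSplits(base, rt)); // [rt:/tmp/stock_ticks_mor]
    }
}
```

Under these assumptions, keying on the pair rather than on the path alone still lets repeated lookups for the same table hit the cache, while RO and RT lookups stay distinct. The `record` requires Java 16 or later.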

       

      =============

      The demo (Step 6(a)) produces inconsistent results.

      {{select `_hoodie_commit_time`, symbol, ts, volume, open, close from stock_ticks_mor_rt where symbol = 'GOOG';
      select `_hoodie_commit_time`, symbol, ts, volume, open, close from stock_ticks_mor where symbol = 'GOOG';}}

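      Both queries return the results shown in the demo.

      However, joining the RO table (stock_ticks_mor) with the RT table (stock_ticks_mor_rt) on key gives inconsistent results. The first query below looks for keys whose ts differs between the two tables and returns no rows. The two single-key joins that follow report the same ts on both sides of the join, and that value changes with the join order (10:29:00 when stock_ticks_mor_rt is a, 10:59:00 when stock_ticks_mor is a).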

      {{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b on a.key=b.key where a.ts != b.ts
      ...
      +--------+-------+-------+
      | a.key  | a.ts  | b.ts  |
      +--------+-------+-------+
      +--------+-------+-------+}}

      {{0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
      WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
      Execution log at: /tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
      2019-07-18 09:13:20 Starting to launch local task to process map join; maximum memory = 477626368
      SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
      SLF4J: Defaulting to no-operation (NOP) logger implementation
      SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
      2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
      2019-07-18 09:13:21 Uploaded 1 File to: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable (317 bytes)
      2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
      +--------------------+---------------------+---------------------+
      | a.key              | a.ts                | b.ts                |
      +--------------------+---------------------+---------------------+
      | GOOG_2018-08-31 10 | 2018-08-31 10:29:00 | 2018-08-31 10:29:00 |
      +--------------------+---------------------+---------------------+
      1 row selected (7.207 seconds)
      0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor a join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
      Execution log at: /tmp/root/root_20190718091348_72a5fc30-fc04-41c1-b2e3-5f943e4d5c08.log
      2019-07-18 09:13:51 Starting to launch local task to process map join; maximum memory = 477626368
      2019-07-18 09:13:53 Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable
      2019-07-18 09:13:53 Uploaded 1 File to: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable (317 bytes)
      2019-07-18 09:13:53 End of local task; Time Taken: 2.36 sec.
      +--------------------+---------------------+---------------------+
      | a.key              | a.ts                | b.ts                |
      +--------------------+---------------------+---------------------+
      | GOOG_2018-08-31 10 | 2018-08-31 10:59:00 | 2018-08-31 10:59:00 |
      +--------------------+---------------------+---------------------+}}


            People

              nishith29 Nishith Agarwal
              vbalaji Balaji Varadarajan
              Votes: 0
              Watchers: 3
