Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Description: https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619
Root Cause: Hive tracks getSplits() calls by dataset base path and does not take the InputFormatClass into account, so getSplits() is called only once per base path. The RO and RT tables share the same dataset base path and differ only in their InputFormatClass. As a result, a Hive query that joins the two tables effectively reads both sides through the same input format and returns incorrect results.
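The root cause above can be illustrated with a small toy model. This is only a sketch of the tracking behavior described, not Hive's actual code: `plan_get_splits`, the base path, and the two format labels are hypothetical names chosen for illustration.

```python
from typing import Callable, Hashable, List, Tuple

# (dataset base path, input format label). Both values are illustrative
# placeholders, not real Hive or Hudi identifiers.
Table = Tuple[str, str]


def plan_get_splits(tables: List[Table],
                    key: Callable[[Table], Hashable]) -> List[Table]:
    """Return the tables for which getSplits() would be invoked.

    A table is skipped when its tracking key has already been seen,
    mimicking the "call getSplits() once per key" behavior in the report.
    """
    seen = set()
    calls = []
    for table in tables:
        k = key(table)
        if k in seen:
            continue  # already tracked: no second getSplits() call
        seen.add(k)
        calls.append(table)
    return calls


BASE_PATH = "/example/stock_ticks_mor"  # assumed path, for illustration only
tables = [
    (BASE_PATH, "ReadOptimizedInputFormat"),  # stands in for the RO table
    (BASE_PATH, "RealtimeInputFormat"),       # stands in for the RT table
]

# Tracking by base path alone: the second table is treated as a duplicate.
by_path = plan_get_splits(tables, key=lambda t: t[0])

# Tracking by (base path, input format): each table gets its own call.
by_path_and_format = plan_get_splits(tables, key=lambda t: t)

print(len(by_path), len(by_path_and_format))
```

With the path-only key, one of the two tables never gets its own split computation, which is consistent with both sides of the join showing the same ts in the output below. Keying on the pair (base path, input format) keeps the two tables distinct in this model.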
=============
The results of the demo are inconsistent.
(Step 6(a))
{{ select `_hoodie_commit_time`, symbol, ts, volume, open, close from stock_ticks_mor_rt where symbol = 'GOOG';
select `_hoodie_commit_time`, symbol, ts, volume, open, close from stock_ticks_mor where symbol = 'GOOG';}}
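Both queries return the results shown in the demo.
However, a join that selects keys whose ts differs between the two tables returns no rows: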
{{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b on a.key=b.key where a.ts != b.ts
...
+--------+-------+-------+
| a.key  | a.ts  | b.ts  |
+--------+-------+-------+
+--------+-------+-------+}}
Swapping the order of the two tables in the join changes the ts value that is reported for both sides:
{{0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
2019-07-18 09:13:20 Starting to launch local task to process map join; maximum memory = 477626368
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
2019-07-18 09:13:21 Uploaded 1 File to: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable (317 bytes)
2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
+--------------------+---------------------+---------------------+
| a.key              | a.ts                | b.ts                |
+--------------------+---------------------+---------------------+
| GOOG_2018-08-31 10 | 2018-08-31 10:29:00 | 2018-08-31 10:29:00 |
+--------------------+---------------------+---------------------+
1 row selected (7.207 seconds)
0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor a join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/root/root_20190718091348_72a5fc30-fc04-41c1-b2e3-5f943e4d5c08.log
2019-07-18 09:13:51 Starting to launch local task to process map join; maximum memory = 477626368
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2019-07-18 09:13:53 Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable
2019-07-18 09:13:53 Uploaded 1 File to: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable (317 bytes)
2019-07-18 09:13:53 End of local task; Time Taken: 2.36 sec.
+--------------------+---------------------+---------------------+
| a.key              | a.ts                | b.ts                |
+--------------------+---------------------+---------------------+
| GOOG_2018-08-31 10 | 2018-08-31 10:59:00 | 2018-08-31 10:59:00 |
+--------------------+---------------------+---------------------+}}
Issue Links
- is depended upon by HUDI-901 Bug Bash 0.6.0 Tracking Ticket (Resolved)