[SPARK-16996] Hive ACID delta files not seen - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.5.2, 1.6.3, 2.1.2, 2.2.0
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed
Environment:

Hive 1.2.1, Spark 1.5.2

Description

spark-sql seems not to see data stored as delta files in an ACID Hive table.

I encountered the same problem as described here: http://stackoverflow.com/questions/35955666/spark-sql-is-not-returning-records-for-hive-transactional-tables-on-hdp

For example, create an ACID table with the Hive CLI and insert a row:

set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
 CREATE TABLE deltas(cle string,valeur string) CLUSTERED BY (cle) INTO 1 BUCKETS
    ROW FORMAT SERDE  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS 
      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    TBLPROPERTIES ('transactional'='true');

INSERT INTO deltas VALUES("a","a");

Then run a query from the spark-sql CLI:

SELECT * FROM deltas;

That query returns no rows, and there are no errors in the logs.
If you inspect the table's files on HDFS, you find only a delta directory:

~>hdfs dfs -ls /apps/hive/warehouse/deltas
Found 1 items
drwxr-x---   - me hdfs          0 2016-08-10 14:03 /apps/hive/warehouse/deltas/delta_0020943_0020943

Then run a major compaction on that table (in the Hive CLI):

ALTER TABLE deltas COMPACT 'MAJOR';

As a result, the delta is compacted into a base directory:

~>hdfs dfs -ls /apps/hive/warehouse/deltas
Found 1 items
drwxrwxrwx   - me hdfs          0 2016-08-10 15:25 /apps/hive/warehouse/deltas/base_0020943

Go back to spark-sql, and the same query now returns the row:

SELECT * FROM deltas;
a       a
Time taken: 0.477 seconds, Fetched 1 row(s)

But the next time you insert into the Hive table:

INSERT INTO deltas VALUES("b","b");

spark-sql sees the change immediately:

SELECT * FROM deltas;
a       a
b       b
Time taken: 0.122 seconds, Fetched 2 row(s)

No further compaction was run, yet spark-sql "sees" both the base AND the new delta directory:

~> hdfs dfs -ls /apps/hive/warehouse/deltas
Found 2 items
drwxrwxrwx   - valdata hdfs          0 2016-08-10 15:25 /apps/hive/warehouse/deltas/base_0020943
drwxr-x---   - valdata hdfs          0 2016-08-10 15:31 /apps/hive/warehouse/deltas/delta_0020956_0020956
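
To make the pattern in the listings above easier to check, here is a minimal Python sketch that classifies the sub-directories of a table by name. The function names are hypothetical, the regular expressions are only inferred from the directory names shown in this report, and the "rows are visible only when a base directory exists" rule is just the behaviour observed here, not a description of how Spark actually resolves ACID tables.

```python
import re

# Patterns inferred from the directory names in this report (assumption):
#   base_<txn>            e.g. base_0020943
#   delta_<min>_<max>     e.g. delta_0020956_0020956
BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def classify_acid_dirs(paths):
    """Split table sub-directory paths into (base names, delta names)."""
    bases, deltas = [], []
    for path in paths:
        # Keep only the last path component, ignoring a trailing slash.
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if BASE_RE.match(name):
            bases.append(name)
        elif DELTA_RE.match(name):
            deltas.append(name)
    return bases, deltas


def rows_expected_in_spark_sql(paths):
    """Hypothetical heuristic matching this report only:
    spark-sql returned rows only once a base directory existed."""
    bases, _ = classify_acid_dirs(paths)
    return len(bases) > 0


if __name__ == "__main__":
    before_compaction = ["/apps/hive/warehouse/deltas/delta_0020943_0020943"]
    after_second_insert = [
        "/apps/hive/warehouse/deltas/base_0020943",
        "/apps/hive/warehouse/deltas/delta_0020956_0020956",
    ]
    print(rows_expected_in_spark_sql(before_compaction))
    print(rows_expected_in_spark_sql(after_second_insert))
```

Feeding it the paths from the `hdfs dfs -ls` output at each step reproduces the three states described above: delta only (no rows), base only (rows), base plus delta (rows from both).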

Issue Links

is blocked by

HIVE-15189 No base file for ACID table

Resolved

People

Assignee: Unassigned

Reporter: Benjamin BONNET

Votes: 2

Watchers: 19

Dates

Created: 10/Aug/16 13:37

Updated: 25/May/21 01:55

Resolved: 25/May/21 01:39