Description
When we use HCatalog as source and destination of data for Pig on Tez we get ERROR 1115: Schema for data cannot be determined.
Pig works fine when we use map reduce or use HCatalog only as one of endpoints i.e. load data directly from file and store using HCatalog.
The error appears after upgrading from Pig 0.14 on Tez 0.5.2 to Pig 0.15 on Tez 0.7.0 ( HDP 2.2.6 to HDP 2.3.2).
To reproduce:
- create hive tables from hive_tables.hql
- load data to table_input from sample.csv
- run following Pig script on Tez
data = LOAD 'table_input' USING org.apache.hive.hcatalog.pig.HCatLoader(); items_unique = DISTINCT data; counted = FOREACH (GROUP items_unique BY col2) GENERATE group AS name, COUNT(items_unique) AS value; STORE counted INTO 'table_output' USING org.apache.hive.hcatalog.pig.HCatStorer();