Details
Improvement
Status: Closed
Major
Resolution: Fixed
0.14.0, 1.2.0, 1.2.1
None
None
Description
1. SymlinkTextInputFormat does in fact support paths that contain a regex. I have added some test SQL for this.
2. However, when CombineHiveInputFormat is used to combine input files, it cannot resolve a path that contains a regex, so the query returns a wrong result. I give an example below and fix the problem.
Table description:
CREATE EXTERNAL TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'
The directory '/user/hive/warehouse/symlink_text_input_format' contains a link file whose content is:
viewfs://nsx/tmp/symlink*
It contains a single path, and that path contains a regex.
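To make the expected behaviour concrete, here is a minimal sketch of how a link-file entry with a wildcard should expand into the set of matching files. It is not Hive's code: it uses java.nio on the local filesystem as a stand-in, and the class name SymlinkGlobSketch, the method expandLinkFile, and the sample file names are all hypothetical. In Hadoop the analogous expansion would presumably go through FileSystem.globStatus, which is an assumption about the fix, not a description of it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; local filesystem stands in for viewfs/HDFS.
public class SymlinkGlobSketch {

    /**
     * Reads a link file and expands every entry into the files it matches.
     * Assumes each entry has the form "<parent-dir>/<name-or-glob>".
     */
    public static List<Path> expandLinkFile(Path linkFile) throws IOException {
        List<Path> targets = new ArrayList<>();
        for (String line : Files.readAllLines(linkFile, StandardCharsets.UTF_8)) {
            String entry = line.trim();
            if (entry.isEmpty()) {
                continue; // skip blank lines
            }
            // Split into the parent directory and the last component,
            // which may be a plain file name or a glob such as "symlink*".
            int cut = entry.lastIndexOf('/');
            Path parent = Paths.get(entry.substring(0, cut));
            String namePattern = entry.substring(cut + 1);
            try (DirectoryStream<Path> matches =
                     Files.newDirectoryStream(parent, namePattern)) {
                for (Path match : matches) {
                    targets.add(match);
                }
            }
        }
        Collections.sort(targets); // directory order is unspecified, so sort
        return targets;
    }

    public static void main(String[] args) throws IOException {
        // Build a small directory with two matching files and one non-matching file.
        Path dataDir = Files.createTempDirectory("symlink_demo");
        Files.write(dataDir.resolve("symlink1.txt"), "k1\tv1\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dataDir.resolve("symlink2.txt"), "k2\tv2\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dataDir.resolve("other.txt"), "k3\tv3\n".getBytes(StandardCharsets.UTF_8));

        // The link file holds a single entry that ends in a wildcard.
        Path linkFile = Files.createTempFile("link", ".txt");
        String entry = dataDir.toString().replace('\\', '/') + "/symlink*";
        Files.write(linkFile, (entry + "\n").getBytes(StandardCharsets.UTF_8));

        List<Path> inputs = expandLinkFile(linkFile);
        System.out.println(inputs.size() + " input files: " + inputs);
    }
}
```

If a split-combining step treats the wildcard entry as a literal file name instead of expanding it like this, it finds no matching input, which is consistent with the count of 0 reported below.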
Execute the SQL:
set hive.rework.mapredwork=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack=0;
set mapred.min.split.size.per.node=0;
set mapred.max.split.size=0;
select count(*) from symlink_text_input_format;
It returns a wrong result: 0