ashutoshc has requested changes to the revision "HIVE-2780 [jira] Implement more restrictive table sampler".
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:489 This config needs to be added to HiveConf.java and to hive-site.xml.template, with a description. Also, mention in the description that an alternate sampler is available for anyone who wants to use it.
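To illustrate the kind of entry meant here, below is a minimal sketch of a setting registered with a name, a default, and a description that points readers at the alternate sampler. It is not HiveConf's actual API: the enum, the property name `hive.sample.sampler.example`, and the values `default`/`head` are all hypothetical placeholders.

```java
import java.util.Map;

/**
 * Hypothetical sketch of a sampler setting with a default and a description.
 * The enum, property name and values are illustrative only; they are not
 * taken from HiveConf.java.
 */
enum SamplerConfSketch {
  SPLIT_SAMPLER("hive.sample.sampler.example", "default",
      "Split sampler used for percent/bytes table sampling. 'default' keeps the "
      + "existing behaviour; 'head' selects an alternate, more restrictive sampler.");

  final String varname;
  final String defaultValue;
  final String description;

  SamplerConfSketch(String varname, String defaultValue, String description) {
    this.varname = varname;
    this.defaultValue = defaultValue;
    this.description = description;
  }

  /** Returns the configured value, falling back to the default when unset. */
  String resolve(Map<String, String> conf) {
    return conf.getOrDefault(varname, defaultValue);
  }
}
```

The same name, default, and description text would then be mirrored in hive-site.xml.template so that users can discover the alternate sampler without reading the code.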
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:583 Instead of growing this file further, I think it would make sense to put this class in its own Java file. Also, please add comments describing the algorithm this sampler follows.
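As an example of the level of detail that would help, here is a sketch of a head-style sampler with its algorithm stated in the class comment. This is an assumption about what "head" sampling means, not the patch's code; the class name and the use of plain split lengths in place of real split objects are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical head sampler, sketched only to show the kind of algorithm
 * comment being requested; names and types are illustrative.
 *
 * Algorithm: walk the candidate splits in input order and keep each one
 * until the accumulated length reaches targetBytes. Splits are kept whole,
 * so the result may exceed targetBytes by less than one split.
 */
final class HeadSamplerSketch {
  static List<Long> sample(List<Long> splitLengths, long targetBytes) {
    List<Long> selected = new ArrayList<>();
    long total = 0;
    for (long length : splitLengths) {
      if (total >= targetBytes) {
        break; // target reached; the remaining splits are skipped
      }
      selected.add(length);
      total += length;
    }
    return selected;
  }
}
```

With split lengths 100, 200, 300 and a 150-byte target, this keeps the first two splits (300 bytes), which shows why a comment should state whether the target is a floor, a cap, or approximate.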
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:620 Instead of growing this file further, I think it would make sense to put this class in its own Java file. Also, please add comments describing the algorithm this sampler follows.
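For contrast, a second sketch of an algorithm comment, this time for a sampler that starts at a seed-derived offset, wraps around, and trims the last split so the total hits the target. Whether this sampler in the patch behaves this way is an assumption to be confirmed; the class name, the seed handling, and the trimming step are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical seeded sampler, shown as a second example of an algorithm
 * comment; names and behaviour are illustrative.
 *
 * Algorithm: start at index (seed mod n), visit the splits in order with
 * wrap-around, and keep them until the accumulated length reaches
 * targetBytes. The last split kept is trimmed so the total equals
 * targetBytes exactly, unless the whole input is smaller than the target.
 */
final class SeededSamplerSketch {
  static List<Long> sample(List<Long> splitLengths, long targetBytes, int seed) {
    List<Long> selected = new ArrayList<>();
    int n = splitLengths.size();
    if (n == 0 || targetBytes <= 0) {
      return selected;
    }
    int start = Math.floorMod(seed, n); // non-negative even for negative seeds
    long total = 0;
    for (int i = 0; i < n; i++) {
      long length = splitLengths.get((start + i) % n);
      if (total + length >= targetBytes) {
        selected.add(targetBytes - total); // trim the final split to the target
        break;
      }
      selected.add(length);
      total += length;
    }
    return selected;
  }
}
```

Documenting the start position and whether the last split is trimmed would also make differences in sampled row counts between samplers easier to explain.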
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:657 I assume a split is splittable only if its input format is either a FileInputFormat or an uncompressed TextInputFormat. Is that correct? If so, I think this logic would be easier to read if written as follows:
if (inputFormat instanceof FileInputFormat || inputFormat instanceof mapreduce.lib.input.FileInputFormat || (inputFormat instanceof TextInputFormat && !compressed))
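One caveat worth checking: in Hadoop's mapred API, TextInputFormat extends FileInputFormat, so an `instanceof FileInputFormat` test on its own already matches compressed text input. The sketch below keeps the check readable while testing the text case first. The nested classes are hypothetical stand-ins for the Hadoop types, and `compressed` stands in for whatever codec lookup the patch actually performs.

```java
/**
 * Hypothetical sketch of the splittability test. The nested classes stand in
 * for mapred FileInputFormat / TextInputFormat and
 * mapreduce.lib.input.FileInputFormat; they are not the real Hadoop classes.
 */
final class SplittableCheckSketch {
  static class MapredFileInputFormat {}
  // Mirrors Hadoop: mapred TextInputFormat is a subclass of FileInputFormat.
  static class MapredTextInputFormat extends MapredFileInputFormat {}
  static class MapreduceFileInputFormat {}

  static boolean isSplittable(Object inputFormat, boolean compressed) {
    // Text input is a FileInputFormat too, so decide it first; otherwise
    // compressed text would be accepted by the FileInputFormat branch.
    if (inputFormat instanceof MapredTextInputFormat) {
      return !compressed;
    }
    return inputFormat instanceof MapredFileInputFormat
        || inputFormat instanceof MapreduceFileInputFormat;
  }
}
```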
ql/src/java/org/apache/hadoop/hive/ql/io/SplitSampler.java:34 Please document the contract of this interface.
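As an illustration of what a documented contract could look like, here is a sketch of an interface whose Javadoc states inputs, outputs, and size guarantees, plus a static helper that checks those guarantees on a result. The actual SplitSampler signature is not shown in this review, so the interface name, the method signature, and the guarantees themselves are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical sketch of a documented sampler contract. The real SplitSampler
 * signature may differ; this version works on split lengths only.
 */
interface SplitSamplerContractSketch {
  /**
   * Selects the splits to read for one sampled alias.
   *
   * @param splitLengths lengths in bytes of the candidate splits, in input
   *                     order; implementations must not modify this list
   * @param targetBytes  requested sample size in bytes
   * @return lengths of the selected splits, never null. Their total is at
   *         least min(targetBytes, total input), and the list is empty when
   *         targetBytes is not positive.
   */
  List<Long> sample(List<Long> splitLengths, long targetBytes);

  /** Checks one result against the size guarantees documented above. */
  static boolean honorsContract(List<Long> input, List<Long> output, long targetBytes) {
    if (targetBytes <= 0) {
      return output.isEmpty();
    }
    long inTotal = input.stream().mapToLong(Long::longValue).sum();
    long outTotal = output.stream().mapToLong(Long::longValue).sum();
    return outTotal >= Math.min(targetBytes, inTotal) && output.size() <= input.size();
  }
}
```

A helper like this could be reused in unit tests for every sampler implementation, so that each one is held to the same written contract.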
ql/src/test/results/clientpositive/split_sample_sampler.q.out:27 The value of count(*) shouldn't change just because the sampler is different, should it? Yet you got 8 with HeadSampler but 118 with the default sampler.
ql/src/test/results/clientpositive/split_sample_sampler.q.out:36 Again, the value of count(*) shouldn't change just because the sampler is different, should it? Yet you got 8 with HeadSampler but 20 with the default sampler. Also, the default sampler produced the same number (118) for both percent and bytes sampling, but HeadSampler produced different values. What's the reason for that?
To: JIRA, ashutoshc, navis