Description
The following sbt/sbt hive/console session reproduces this issue reliably:
sql("SELECT * FROM src WHERE key % 2 = 0").
  sample(withReplacement = false, fraction = 0.05).
  registerTempTable("sampled")

println(table("sampled").queryExecution)

val query = sql("SELECT * FROM sampled WHERE key % 2 = 1")
println(query.queryExecution)

// Should print `true'
println((1 to 10).map(_ => query.collect().isEmpty).reduce(_ && _))
Note that when fraction is less than 0.4, GapSamplingIterator is used to do the sampling. My guess is that this has something to do with the underlying mutable row objects used in HiveTableScan, but I haven't figured out the root cause yet.
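To make the guess above concrete, here is a minimal, Spark-free sketch of the general hazard with reused mutable rows. It is not a confirmed root cause and not Spark's actual code: MutableRow, scan, and MutableRowReuse are hypothetical names invented for illustration. It only shows that a consumer that keeps references to a row object which the producer overwrites in place ends up seeing the last value written, unless it copies each row first.

```scala
// Hypothetical stand-in for a mutable row; NOT Spark's actual row class.
final class MutableRow(var key: Int) {
  def copy(): MutableRow = new MutableRow(key)
}

object MutableRowReuse {
  // Emits each key by overwriting ONE shared row object,
  // the way a scan that reuses its output row would (assumption).
  def scan(keys: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    keys.iterator.map { k => shared.key = k; shared }
  }

  val keys: Seq[Int] = Seq(0, 2, 4, 6)

  // References are held without copying: every element is the same object,
  // so all of them report the last key that was written.
  val aliased: Seq[Int] = scan(keys).toList.map(_.key)

  // Copying each row as it is produced preserves the original values.
  val copied: Seq[Int] = scan(keys).map(_.copy()).toList.map(_.key)

  def main(args: Array[String]): Unit = {
    println(s"aliased = $aliased")
    println(s"copied  = $copied")
  }
}
```

If something along these lines is happening here, an operator that retains or revisits a row across calls to next() could observe a key belonging to a different row, which would be one way for rows with odd keys to appear in a sample drawn only from even keys. Whether GapSamplingIterator or HiveTableScan actually behaves this way still needs to be verified.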