Hive / HIVE-160

sampling in a subquery is broken

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Attachment: hive-160.1.patch (256 kB, Raghotham Murthy)

      Activity

      Namit Jain added a comment -

      Committed. Thanks, Raghu.

      Raghotham Murthy added a comment -

      Filed HIVE-638 to fix sampling in subqueries properly.

      Namit Jain added a comment -

      +1

      The code changes look good. I will commit once the tests look good and pass.

      Raghotham Murthy added a comment -

      No, the problem is that input pruning does not work well when done over the parse structures (QB). We should do it over the operator tree. The current patch is a temporary fix for this bug: it always adds a sampling predicate to the WHERE clause, whether or not the input was pruned. The final fix will be modeled after the partition pruning code that Ashish is fixing.

      I also modified the tests so that srcbucket has an integer key. This makes it easier to test the case where a predicate is added to the WHERE clause: 'bucket 1 out of 2' returns the even keys and 'bucket 2 out of 2' returns the odd keys.
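
The even/odd split described above can be illustrated with a small sketch. This is not the patch's code: `in_bucket` is a hypothetical stand-in for the sampling predicate, and it assumes that an integer key hashes to its own value and that buckets are numbered from 1.

```python
def in_bucket(key: int, bucket: int, num_buckets: int) -> bool:
    """Hypothetical stand-in for the sampling predicate added to the WHERE clause.

    Assumption: an integer key hashes to itself; buckets are numbered from 1.
    """
    # Mask to a non-negative value, then compare the remainder with the
    # zero-based bucket index.
    return (key & 0x7FFFFFFF) % num_buckets == bucket - 1


keys = list(range(10))  # made-up integer keys, not real srcbucket data
bucket_1 = [k for k in keys if in_bucket(k, 1, 2)]
bucket_2 = [k for k in keys if in_bucket(k, 2, 2)]

print(bucket_1)  # [0, 2, 4, 6, 8]
print(bucket_2)  # [1, 3, 5, 7, 9]
```

With an integer key the expected sample is easy to check by eye, which is presumably why the tests were changed this way.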

      Zheng Shao added a comment -

      So it's resolved, right? Will you close this issue?

      Raghotham Murthy added a comment -

      Sampling within a sub-query does not seem to prune the input. A filter is added and the result seems correct.
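
To make the distinction concrete, here is a rough sketch of why a filter alone still gives the correct result while reading more data than necessary. Everything in it is illustrative: the two-bucket layout, the helper names, and the assumption that an integer key hashes to itself are not taken from the Hive source.

```python
from collections import defaultdict

NUM_BUCKETS = 2  # assumed bucket count for the illustration


def bucket_of(key: int) -> int:
    """1-based bucket number, assuming an integer key hashes to itself."""
    return (key & 0x7FFFFFFF) % NUM_BUCKETS + 1


def write_bucket_files(rows):
    """Group rows into per-bucket 'files', as a bucketed table might be laid out."""
    files = defaultdict(list)
    for row in rows:
        files[bucket_of(row["key"])].append(row)
    return files


def sample_with_pruning(files, bucket):
    """Read only the file for the requested bucket (input pruning)."""
    return list(files[bucket])


def sample_with_filter(files, bucket):
    """Read every file and keep matching rows (filter only, no pruning)."""
    scanned = [row for b in sorted(files) for row in files[b]]
    kept = [row for row in scanned if bucket_of(row["key"]) == bucket]
    return kept, len(scanned)


rows = [{"key": k, "value": f"val_{k}"} for k in range(6)]  # made-up rows
files = write_bucket_files(rows)

pruned = sample_with_pruning(files, 1)
filtered, rows_scanned = sample_with_filter(files, 1)

print(pruned == filtered)          # same rows either way
print(len(pruned), rows_scanned)   # rows returned vs. rows read by the filter path
```

Both paths return the same rows, but the filter-only path reads every bucket, which matches the observation that the result is correct even though the input is not pruned.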

      Ashish Thusoo added a comment -

      I think this one has been resolved by Raghu or Namit?

      Jeff Hammerbacher added a comment -

      Adding to "Query Processor" component.


        People

        • Assignee: Raghotham Murthy
        • Reporter: Venky Iyer
        • Votes: 0
        • Watchers: 2
