Details

    1. HIVE-1402.1.patch.txt
      40 kB
      Navis
    2. HIVE-1402.D8895.1.patch
      37 kB
      Phabricator
    3. HIVE-1402.D8895.2.patch
      39 kB
      Phabricator
    4. HIVE-1402.D8895.3.patch
      39 kB
      Phabricator
    5. HIVE-1402.D8895.4.patch
      37 kB
      Phabricator
    6. hive-1402.patch.6.txt
      38 kB
      Edward Capriolo

      Issue Links

        Activity

        Jeff Hammerbacher added a comment -

        From Ning Zhang:

        ORDER BY is supported in trunk, with certain limitations in strict mode (the query has to have a LIMIT).

        From John Sichi:

        > If someone is interested in adding parallel ORDER BY to Hive (using TotalOrderPartitioner), here's a good starting point: http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
        >
        > The goal would be to take that manual two-step sample-then-sort process and turn it into an automatic plan within Hive. I have a better example for the sampling query which I haven't published yet.
        >
        > We would also need to name the final output files in such a way that the total order could be iterated via the filenames.
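
As an illustration of the sample-then-sort idea quoted above (not code from any attached patch), here is a minimal sketch of the step that turns a sample of keys into partition boundaries. The class and method names are hypothetical, keys are assumed to be plain strings, and the zero-padded `part-NNNNN` naming is only an assumed way to make file-name order follow key order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Hive or Hadoop. */
final class SplitPointSketch {

    /**
     * Picks numPartitions - 1 boundary keys from a sample so that each
     * key range holds roughly the same number of sampled keys.
     */
    static List<String> pickSplitPoints(List<String> sample, int numPartitions) {
        List<String> splits = new ArrayList<>();
        if (sample.isEmpty() || numPartitions <= 1) {
            return splits; // nothing to split: a single partition
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        for (int i = 1; i < numPartitions; i++) {
            // Evenly spaced positions in the sorted sample.
            int idx = (int) ((long) i * sorted.size() / numPartitions);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    /** Zero-padded name, so sorting file names also sorts partitions. */
    static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }
}
```

With the sample `a` through `h` and four partitions, this picks `c`, `e`, and `g` as boundaries. A sample smaller than the partition count yields repeated boundaries, which a real implementation would need to handle.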

        Jeff Zhang added a comment -

        Hi, I made a draft implementation for one special case. It works, but since it only covers one special case, it contains some hard-coding. I hope someone can give some help or guidance on the next step.
        One big problem with parallel ORDER BY is that the output key type of ExecMapper is HiveKey, which has been serialized by LazyBinarySerDe, so the original column type is lost. But for sampling and partitioning, I need the original column type.

        The following is my initial design.

        1. During the parse stage, extract one SampleOperator which has two children: TableScanOperator and SelectOperator. (I am not familiar with the Hive parse stage, and the code is not clear to me; could anyone give some help or recommend some documentation about the Hive parser?)

        2. Modify the TotalOrderPartitioner: add a Deserializer to convert the HiveKey back to its original column type, and deserialize the HiveKey in the getPartition() method.
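
To make step 2 concrete, here is a minimal sketch of a range partitioner that deserializes the key before comparing it with the split points. It is not the patch's code: the class name is hypothetical, and a 4-byte big-endian int stands in for whatever the real SerDe produces. The example shows why typed comparison matters, since the raw bytes of a negative int would sort after those of a positive one.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch; the int encoding is an assumption, not Hive's format. */
final class RangePartitionSketch {
    // Sorted ascending; length is numPartitions - 1.
    private final int[] splitPoints;

    RangePartitionSketch(int[] sortedSplitPoints) {
        this.splitPoints = sortedSplitPoints.clone();
    }

    /** Stand-in serializer: 4-byte big-endian int. */
    static byte[] serialize(int key) {
        return ByteBuffer.allocate(4).putInt(key).array();
    }

    /** Stand-in for the Deserializer that restores the original type. */
    static int deserialize(byte[] serializedKey) {
        return ByteBuffer.wrap(serializedKey).getInt();
    }

    /** Keys below the first split go to partition 0; a key equal to split i goes to i + 1. */
    int getPartition(byte[] serializedKey) {
        int key = deserialize(serializedKey);
        int pos = Arrays.binarySearch(splitPoints, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

With split points 10, 20, and 30, the keys -7 and 5 land in partition 0, 10 and 15 in partition 1, and 30 and 99 in partition 3.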

        Any comments and help are welcome.

        Phabricator added a comment -

        navis requested code review of "HIVE-1402 [jira] Add parallel ORDER BY to Hive".

        Reviewers: JIRA

        HIVE-1402 Add parallel ORDER BY to Hive

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D8895

        AFFECTED FILES
        common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalContext.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SamplingOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
        ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java
        ql/src/test/queries/clientpositive/parallel_orderby.q
        ql/src/test/results/clientpositive/parallel_orderby.q.out

        To: JIRA, navis

        Edward Capriolo added a comment -

        Navis, I clicked "Submit Patch" for you. Is this clean and ready to go? This is a feature Hive has gone way too long without.

        Navis added a comment -

        Edward Capriolo, sadly it's not. I still have some work to do on this.

        Phabricator added a comment -

        navis updated the revision "HIVE-1402 [jira] Add parallel ORDER BY to Hive".

        Rebased to trunk & changed to mini-mr test

        Reviewers: JIRA

        REVISION DETAIL
        https://reviews.facebook.net/D8895

        CHANGE SINCE LAST DIFF
        https://reviews.facebook.net/D8895?vs=28641&id=33999#toc

        AFFECTED FILES
        build-common.xml
        common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalContext.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SamplingOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
        ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java
        ql/src/test/queries/clientpositive/parallel_orderby.q
        ql/src/test/results/clientpositive/parallel_orderby.q.out

        To: JIRA, navis

        Phabricator added a comment -

        brock has commented on the revision "HIVE-1402 [jira] Add parallel ORDER BY to Hive".

        Hi

        It doesn't look like this patch applies to trunk. For example, HiveTotalOrderPartitioner should be a totally new file, correct? However, in the diff it's not:

        ===================================================================
        --- ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java
        +++ ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java
        @@ -1,4 +1,6 @@
         /**
        + * Copyright 2010 The Apache Software Foundation
        + *
          * Licensed to the Apache Software Foundation (ASF) under one
          * or more contributor license agreements. See the NOTICE file
          * distributed with this work for additional information
        @@ -18,34 +20,24 @@

         package org.apache.hadoop.hive.ql.exec;

        -import java.util.Collection;
        -import java.util.HashSet;
        -import java.util.Set;
        +import org.apache.hadoop.hive.ql.io.HiveKey;
        +import org.apache.hadoop.io.BytesWritable;
        +import org.apache.hadoop.mapred.JobConf;
        +import org.apache.hadoop.mapred.Partitioner;
        +import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

        -public class OperatorUtils {
        +public class HiveTotalOrderPartitioner implements Partitioner<HiveKey, Object> {

        REVISION DETAIL
        https://reviews.facebook.net/D8895

        To: JIRA, navis
        Cc: brock

        Navis added a comment -

        Arcanist sometimes generates a strange patch file, and I don't know how to fix that. Attaching the patch file.

        Phabricator added a comment -

        navis updated the revision "HIVE-1402 [jira] Add parallel ORDER BY to Hive".

        Rebased to trunk

        Reviewers: JIRA

        REVISION DETAIL
        https://reviews.facebook.net/D8895

        CHANGE SINCE LAST DIFF
        https://reviews.facebook.net/D8895?vs=33999&id=34485#toc

        AFFECTED FILES
        build-common.xml
        common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalContext.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SamplingOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
        ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java
        ql/src/test/queries/clientpositive/parallel_orderby.q
        ql/src/test/results/clientpositive/parallel_orderby.q.out

        To: JIRA, navis
        Cc: brock

        Phabricator added a comment -

        navis updated the revision "HIVE-1402 [jira] Add parallel ORDER BY to Hive".

        Support UDTFs

        Reviewers: JIRA

        REVISION DETAIL
        https://reviews.facebook.net/D8895

        CHANGE SINCE LAST DIFF
        https://reviews.facebook.net/D8895?vs=34485&id=34965#toc

        AFFECTED FILES
        build-common.xml
        common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalContext.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SamplingOptimizer.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinResolver.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
        ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java
        ql/src/test/queries/clientpositive/parallel_orderby.q
        ql/src/test/results/clientpositive/parallel_orderby.q.out

        To: JIRA, navis
        Cc: brock

        Edward Capriolo added a comment -

        I'm going to start reviewing this one and, hopefully, run these tests tonight.

        Edward Capriolo added a comment -

        Tests are running. Patch .4 is still not clean: TotalOrderPartitioner is not right, so I copied it by hand from Phabricator for now. I like the idea of leaving the feature off by default for now, but typically we set such options to true for full test coverage. Maybe we do this as a follow-on issue.

        Edward Capriolo added a comment -

        Navis, nice work; all tests pass. One thing I noticed:

        for (FileStatus status : fs.globStatus(new Path(inputPath, ".sampling*"))) {
          sampler.addSampleFile(status.getPath(), job);
        }

        In Hadoop, hidden files by convention start with '_', not '.'. Would it make sense to use "_sampling*" here? Possibly put this in a constant at the top of the file.

        Please fix the above if you think it is a good idea, and regenerate the patch so that it applies cleanly.

        Then I will commit.

        Navis added a comment -

        I've seen code in FileInputFormat in hadoop which ignores files starting with '_' and '.'

        private static final PathFilter hiddenFileFilter = new PathFilter(){
          public boolean accept(Path p){
            String name = p.getName(); 
            return !name.startsWith("_") && !name.startsWith("."); 
          }
        }; 
        

        If it's the convention, as you said, I don't mind changing it to '_' (I'll update it tomorrow morning). Thanks for the review!
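
For readers without a Hadoop checkout, the effect of that filter can be tried with a small stand-alone sketch. It mirrors only the name check quoted above and does not use Hadoop's `PathFilter`. The class name and the sample file names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-alone version of the hidden-file name check. */
final class HiddenNameSketch {

    /** Same condition as the quoted filter, applied to a bare file name. */
    static boolean isVisible(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    /** Keeps only the names that the check would accept as input files. */
    static List<String> visible(List<String> names) {
        return names.stream()
                .filter(HiddenNameSketch::isVisible)
                .collect(Collectors.toList());
    }
}
```

Under this check, both `.sampling0` and `_sampling0` are skipped, so either prefix keeps the sample files out of the regular input.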

        Edward Capriolo added a comment -

        You do not have to change '.' to '_'. I was under the impression that it was only _ but apparently it is both.

        Just regenerate the patch so that it applies cleanly. If you do not have time, I will do it later this afternoon.

        Edward Capriolo added a comment -

        Committed. Thanks, Navis and Jeff.

        Edward Capriolo added a comment -

        @Navis I had to merge whatever Phabricator did to the patch. I committed what I tested. If you want to modify anything, we should open a follow-on issue.

        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #255 (See https://builds.apache.org/job/Hive-trunk-hadoop2/255/)
        HIVE-1402 Add parallel order by to hive (Navis Ryu and Jeff Zhang via egc)

        Submitted by: Navis Ryu
        Reviewed by: Edward Capriolo (Revision 1495847)

        Result = ABORTED
        ecapriolo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1495847
        Files :

        • /hive/trunk/build-common.xml
        • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalContext.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SamplingOptimizer.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinResolver.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java
        • /hive/trunk/ql/src/test/queries/clientpositive/parallel_orderby.q
        • /hive/trunk/ql/src/test/results/clientpositive/parallel_orderby.q.out
        Ashutosh Chauhan added a comment -

        Newly added test parallel_orderby.q is consistently failing on trunk. https://builds.apache.org/job/Hive-trunk-h0.21/2166/ https://builds.apache.org/job/Hive-trunk-h0.21/2168/ etc. Edward Capriolo Would you like to take a look?

        Edward Capriolo added a comment -

        Navis, I am guessing that we might need something in the shim layer, because TotalOrderPartitioner does not work consistently across Hadoop versions?

        Navis added a comment -

        Yes, strange. I'll look into this.

        Ashutosh Chauhan added a comment -

        This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.

        Lefty Leverenz added a comment -

        Doc note: This added three configuration parameters to HiveConf.java and HIVE-7669 gives them descriptions: hive.optimize.sampling.orderby, hive.optimize.sampling.orderby.number, and hive.optimize.sampling.orderby.percent.

        They need to be documented in the wiki here: Configuration Properties – Query and DDL Execution

          People

          • Assignee:
            Navis
            Reporter:
            Jeff Hammerbacher
          • Votes:
            2
            Watchers:
            24
