Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      If the table being inserted into is bucketed, Hive currently does not try to enforce the bucketing.
      An option should be added to check for that.

      Moreover, the number of buckets can be higher than the maximum number of reducers, in which
      case a single reducer can write to multiple files.
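
      As a rough illustration of that arithmetic (a hypothetical sketch, not the patch itself; the
      method and parameter names are made up), each reducer would then need to write a ceiling
      share of the bucket files:

        // Hypothetical sketch: how many bucket files each reducer writes
        // when the bucket count exceeds the reducer cap.
        static int filesPerReducer(int numBuckets, int maxReducers) {
          if (numBuckets <= maxReducers) {
            return 1; // one bucket file per reducer
          }
          // ceiling division, e.g. 30 buckets over at most 9 reducers -> 4 files each
          return (numBuckets + maxReducers - 1) / maxReducers;
        }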

      Attachments

      1. hive.1178.3.patch (1.10 MB, Namit Jain)
      2. hive.1178.2.patch (1.10 MB, Namit Jain)
      3. hive.1178.1.patch (1.10 MB, Namit Jain)

        Activity

        Namit Jain added a comment -

        Incorporated Yongqiang's comments

        Zheng Shao added a comment -

        Can you explain what the comments and changes are?

        He Yongqiang added a comment -

        Sorry. I reviewed it with Namit this morning offline.
        My comments on the previous patch were:

        int totalFiles = 1;
        int numFiles = 1;

        if (numBuckets > maxReducers) {
          ...
          if (totalFiles % maxReducers == 0) {
            ...
          } else {
            numFiles = (totalFiles / maxReducers) + 1;
            maxReducers = totalFiles / numFiles;
          }
        }

        If numBuckets > maxReducers and numBuckets is not a multiple of maxReducers, the code tries to
        find how many files each reducer needs to write, and then uses that file count to derive a
        reducer count. Do we need to guarantee that the calculated reducer count multiplied by the
        number of files per reducer equals the bucket count? If so, it seems the above code cannot
        guarantee that. For example, with a bucket count of 30 and a maximum of 9 reducers, numFiles
        will be 4 and maxReducers will be 7, and 4*7 = 28 != 30.

        The new patch uses a loop to find a suitable reducer count.
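
        For illustration, such a loop might search downward from the reducer cap for a count that
        divides the bucket number evenly (a hypothetical sketch under that assumption, not the
        committed code):

          // Hypothetical sketch: pick the largest reducer count <= maxReducers
          // that divides numBuckets, so reducers * filesPerReducer == numBuckets.
          static int chooseReducers(int numBuckets, int maxReducers) {
            for (int r = maxReducers; r >= 1; r--) {
              if (numBuckets % r == 0) {
                return r;
              }
            }
            return 1; // never reached: r == 1 always divides numBuckets
          }

        With 30 buckets and a cap of 9 reducers this yields 6 reducers writing 5 files each, so
        6*5 = 30 matches the bucket count exactly.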

        He Yongqiang added a comment -

        bucket1, bucket2, bucket3, and input2 tests failed. Can you take a look?

        Namit Jain added a comment -

        There was a problem with running the unit tests with Hadoop 0.17 - fixed that.
        Thanks, Yongqiang

        He Yongqiang added a comment -

        Committed. Thanks Namit!


          People

          • Assignee:
            Namit Jain
          • Reporter:
            Namit Jain
          • Votes:
            0
          • Watchers:
            3
