Hive / HIVE-1242

CombineHiveInputFormat does not work for compressed text files

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: Query Processor
    • Labels:
      None
    1. hive.1242.branch5.patch
      14 kB
      Namit Jain
    2. hive.1242.branch5.3.patch
      14 kB
      Namit Jain
    3. hive.1242.branch5.2.patch
      14 kB
      Namit Jain
    4. hive.1242.4.patch
      13 kB
      Namit Jain
    5. hive.1242.3.patch
      13 kB
      Namit Jain
    6. hive.1242.2.patch
      12 kB
      Namit Jain
    7. hive.1242.1.patch
      15 kB
      Namit Jain

      Issue Links

        Activity

        Namit Jain created issue -
        Namit Jain added a comment -

        Attached a patch - running unit tests right now.

        This removes the change for http://issues.apache.org/jira/browse/HIVE-1200,
        since this is needed as a hot fix.

        Namit Jain made changes -
        Field Original Value New Value
        Attachment hive.1242.1.patch [ 12438588 ]
        He Yongqiang added a comment -

        Can we replace

        part = getPartitionDescFromPath(pathToPartitionInfo, ipaths[i].getParent());

        with

        part = getPartitionDescFromPath(pathToPartitionInfo, ipaths[i]);

        I think this will also work for HIVE-1200.
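
        For illustration only, the suggestion above can be sketched under one assumption: that the lookup walks up from the given path until it reaches a registered directory. Under that assumption, passing the file path or its parent resolves to the same partition. PartitionLookupSketch and lookup are hypothetical names, and plain strings stand in for paths and partition descriptors; this is not Hive's getPartitionDescFromPath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a path-to-partition lookup; NOT Hive's implementation.
class PartitionLookupSketch {

    // Returns the descriptor registered for the path itself or for its nearest
    // ancestor directory, or null if no ancestor is registered.
    // Assumption: keys are absolute, '/'-separated paths without a trailing slash.
    static String lookup(Map<String, String> pathToPartitionInfo, String path) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            String desc = pathToPartitionInfo.get(current);
            if (desc != null) {
                return desc;
            }
            int slash = current.lastIndexOf('/');
            // Stop once only the root would be left.
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> info = new HashMap<>();
        info.put("/warehouse/t/ds=1", "partition ds=1");

        // File path and parent directory resolve to the same descriptor.
        System.out.println(lookup(info, "/warehouse/t/ds=1/000000_0.gz"));
        System.out.println(lookup(info, "/warehouse/t/ds=1"));
    }
}
```

        With a lookup of this shape, the caller does not need to know whether an input path names a file or a directory.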

        Zheng Shao added a comment -

        Talked with Namit offline. HIVE-1200 needs a small fix that will be included together by Namit.

        Namit Jain made changes -
        Attachment hive.1242.2.patch [ 12438647 ]
        Namit Jain made changes -
        Attachment hive.1242.branch5.patch [ 12438648 ]
        Namit Jain made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Namit Jain made changes -
        Attachment hive.1242.3.patch [ 12438650 ]
        Namit Jain made changes -
        Attachment hive.1242.branch5.2.patch [ 12438651 ]
        Namit Jain added a comment -

        Added more comments based on an offline discussion with Ning.

        Namit Jain added a comment -

        A follow-up jira has been filed.

        https://issues.apache.org/jira/browse/MAPREDUCE-1597

        Once that is fixed, the current patch is not needed

        Ning Zhang added a comment -

        +1 will commit if tests pass.

        He Yongqiang added a comment -

        Is there any change in hive.1242.3.patch compared to hive.1242.2.patch?

        Namit Jain added a comment -

        Added some comments.

        Ning Zhang added a comment -

        Namit, branch 0.5 failed on combine1.q and input2.q. Can you take a look? I'll run the tests on trunk as well.

        Namit Jain added a comment -

        It is working for me. Can you mail or upload the diff/log?

        Namit Jain made changes -
        Attachment hive.1242.4.patch [ 12438761 ]
        Namit Jain made changes -
        Attachment hive.1242.branch5.3.patch [ 12438762 ]
        Namit Jain added a comment -

        Ning, found the problem: it was dumping when used with an older Hadoop version.
        Please take a look at the new patches (both for trunk and 0.5).

        Ning Zhang added a comment -

        Committed to 0.5 and trunk. Thanks Namit!

        Ning Zhang made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Zheng Shao made changes -
        Link This issue is related to HIVE-1289 [ HIVE-1289 ]
        Carl Steinbach made changes -
        Fix Version/s 0.5.1 [ 12314793 ]
        Carl Steinbach made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        karen added a comment -

        The comment in the code includes the following:
        // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
        // we use a configuration variable for the same

        This is clearly a misunderstanding. MAPREDUCE-1597 is a bug that was fixed for the new API (mapreduce package), while CombineHiveInputFormat uses the old API (deprecated mapred package). CombineHiveInputFormat does not work properly with any non-splittable compressed file if that file is written across multiple HDFS partitions. The same is true of CombineFileInputFormat from the mapred package, while CombineFileInputFormat from the new mapreduce package works well.
        Not sure why this issue is marked as closed.
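
        To make the constraint in this comment concrete, here is a minimal sketch of the behavior a combining input format needs: a non-splittable compressed file must stay in one chunk, however many storage blocks it spans. CombineSplitSketch, Chunk, isSplittable and chunksFor are hypothetical names, and judging splittability by file extension is a simplifying assumption. This is not code from Hive or Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Hive or Hadoop split logic.
class CombineSplitSketch {

    // One contiguous byte range of a file: [offset, offset + length).
    static final class Chunk {
        final String file;
        final long offset;
        final long length;

        Chunk(String file, long offset, long length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    // Assumption: splittability is guessed from the extension alone.
    // gzip and raw deflate streams cannot be read from an arbitrary offset.
    static boolean isSplittable(String file) {
        return !(file.endsWith(".gz") || file.endsWith(".deflate"));
    }

    // Cuts a file into chunks of at most blockSize bytes, except that a
    // non-splittable file is always returned as a single chunk.
    static List<Chunk> chunksFor(String file, long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        if (!isSplittable(file)) {
            chunks.add(new Chunk(file, 0, fileLength));
            return chunks;
        }
        for (long off = 0; off < fileLength; off += blockSize) {
            chunks.add(new Chunk(file, off, Math.min(blockSize, fileLength - off)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A 250-byte text file with 100-byte blocks is cut into three chunks;
        // a gzip file of the same size stays whole.
        System.out.println(chunksFor("data.txt", 250, 100).size());
        System.out.println(chunksFor("data.gz", 250, 100).size());
    }
}
```

        If a combiner instead hands separate blocks of the same gzip file to different readers, any reader that does not start at offset 0 cannot decode its range.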

        Transition                    Time In Source Status   Execution Times   Last Executer    Last Execution Date
        Open -> Patch Available       22h 52m                 1                 Namit Jain       12/Mar/10 22:16
        Patch Available -> Resolved   2d 1h 53m               1                 Ning Zhang       15/Mar/10 00:09
        Resolved -> Closed            641d 23h 53m            1                 Carl Steinbach   17/Dec/11 00:03

          People

          • Assignee: Namit Jain
          • Reporter: Namit Jain
          • Votes: 0
          • Watchers: 2
