Hive / HIVE-1242

CombineHiveInputFormat does not work for compressed text files

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: Query Processor
    • Labels:
      None
    1. hive.1242.1.patch
      15 kB
      Namit Jain
    2. hive.1242.2.patch
      12 kB
      Namit Jain
    3. hive.1242.3.patch
      13 kB
      Namit Jain
    4. hive.1242.4.patch
      13 kB
      Namit Jain
    5. hive.1242.branch5.2.patch
      14 kB
      Namit Jain
    6. hive.1242.branch5.3.patch
      14 kB
      Namit Jain
    7. hive.1242.branch5.patch
      14 kB
      Namit Jain

        Activity

        Namit Jain added a comment -

        Attached a patch - running unit tests right now.

        This removes the change for http://issues.apache.org/jira/browse/HIVE-1200,
        since this is needed as a hot fix.

        He Yongqiang added a comment -

        Can we replace

        part = getPartitionDescFromPath(pathToPartitionInfo, ipaths[i].getParent());

        with the following?

        part = getPartitionDescFromPath(pathToPartitionInfo, ipaths[i]);

        I think this would also work for HIVE-1200.
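Whether the two calls in this suggestion are interchangeable depends on how the lookup treats a path that is not itself a key in the map. The sketch below is a hypothetical illustration, not Hive's actual `getPartitionDescFromPath`: it assumes the map is keyed by directory strings and that the lookup falls back to the nearest registered ancestor. The class name, method name, and the use of plain strings for paths and descriptors are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: NOT Hive's implementation of getPartitionDescFromPath.
final class PartitionLookupSketch {

    /**
     * Returns the descriptor registered for the given path or, failing that,
     * for its nearest registered ancestor directory. Returns null if no
     * ancestor is registered.
     */
    static String findPartitionDesc(Map<String, String> pathToPartitionInfo, String path) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            String desc = pathToPartitionInfo.get(current);
            if (desc != null) {
                return desc;
            }
            // Strip the last path component and retry with the parent.
            int slash = current.lastIndexOf('/');
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> info = new HashMap<>();
        info.put("/warehouse/t/ds=1", "partition ds=1");
        // A file path and its parent directory resolve to the same entry.
        System.out.println(findPartitionDesc(info, "/warehouse/t/ds=1/000000_0.gz"));
        System.out.println(findPartitionDesc(info, "/warehouse/t/ds=1"));
    }
}
```

Under that assumption, passing either the file path or its parent directory resolves to the same descriptor. With an exact-match lookup the two calls would behave differently.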

        Zheng Shao added a comment -

        Talked with Namit offline. HIVE-1200 needs a small fix that will be included together by Namit.

        Namit Jain added a comment -

        Added more comments based on an offline discussion with Ning.

        Namit Jain added a comment -

        A follow-up jira has been filed.

        https://issues.apache.org/jira/browse/MAPREDUCE-1597

        Once that is fixed, the current patch will no longer be needed.

        Ning Zhang added a comment -

        +1 will commit if tests pass.

        He Yongqiang added a comment -

        Is there any change in hive.1242.3.patch compared to hive.1242.2.patch?

        Namit Jain added a comment -

        Added some comments.

        Ning Zhang added a comment -

        Namit, branch 0.5 failed on combine1.q and input2.q. Can you take a look? I'll run the tests on trunk as well.

        Namit Jain added a comment -

        It is working for me. Can you mail or upload the diff and the log?

        Namit Jain added a comment -

        Ning, found the problem: it was dumping when used with an older Hadoop version.
        Please take a look at the new patches (both for trunk and 0.5).

        Ning Zhang added a comment -

        Committed to 0.5 and trunk. Thanks Namit!

        karen added a comment -

        The comment in the code includes the following:
        // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
        // we use a configuration variable for the same

        This is clearly a misunderstanding. MAPREDUCE-1597 is a bug that was fixed in the new API (the mapreduce package), whereas CombineHiveInputFormat uses the old API (the deprecated mapred package). CombineHiveInputFormat does not work properly with any non-splittable compressed file if that file is written over multiple HDFS partitions. The same is true of CombineFileInputFormat from the mapred package, while CombineFileInputFormat from the new mapreduce package works beautifully.
        I am not sure why this issue is marked as closed.
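The configuration-variable approach quoted in this comment can be pictured with a small guard like the one below. This is a hypothetical sketch only, not the committed patch: the class and method names, the boolean flag standing in for the configuration variable, and the extension list are all assumptions made for illustration. It also detects compression by file extension alone, which is a simplification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a config-guarded fallback: NOT the committed Hive patch.
final class CombineGuardSketch {

    // Assumed list of extensions treated as non-splittable compression.
    private static final List<String> NON_SPLITTABLE = Arrays.asList(".gz", ".deflate");

    /** Returns true if the file name ends with one of the assumed non-splittable extensions. */
    static boolean isNonSplittableCompressed(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : NON_SPLITTABLE) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Decides whether the inputs may be combined into shared splits.
     *
     * @param combineHandlesCompressed stand-in for the configuration variable:
     *        true if the underlying combine input format is known to handle
     *        non-splittable compressed files correctly
     * @param inputFiles names of the input files
     */
    static boolean canCombine(boolean combineHandlesCompressed, List<String> inputFiles) {
        if (combineHandlesCompressed) {
            return true;
        }
        // Without that guarantee, a single compressed input forces the fallback.
        for (String file : inputFiles) {
            if (isNonSplittableCompressed(file)) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would fall back to non-combined splits whenever `canCombine` returns false. The concern raised in this comment is about which value of the flag is correct for the old mapred API, and a guard of this shape cannot answer that by itself.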


          People

          • Assignee:
            Namit Jain
            Reporter:
            Namit Jain
          • Votes:
            0
            Watchers:
            2
