HIVE-1006: getPartitionDescFromPath failing from CombineHiveInputFormat

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.4.1
    • 0.5.0
    • Query Processor
    • None
    • Reviewed

    Description

      When HiveInputFormat.getPartitionDescFromPath is called from CombineHiveInputFormat, it sometimes fails to return a matching partitionDesc. This causes an exception further down the line, because the split then has no inputFormatClassName.

      The issue is that the path format used as the key in pathToPartitionInfo varies between stages. In the first stage, it's the complete path as returned from the table definitions (e.g. hdfs://server/path); in subsequent stages, it's the complete path with port (e.g. hdfs://server:8020/path) of the result of the previous stage. This isn't a problem in HiveInputFormat, since the directory you're looking up always uses the same format as the keys. In CombineHiveInputFormat, however, we take that path and look up its children in the file system to get all the block information, and then use one of the returned paths to get the partition info, and that returned path does not include the port. So, in any stage after the first, we are looking for a path without the port, but all the keys in the map contain a port, so we don't find a match.
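
      The mismatch can be reproduced with a plain string-keyed map. This is only a minimal sketch: the class name PathKeyMismatch, the helper buildMap, and the String placeholder standing in for a partitionDesc are hypothetical and are not taken from Hive's source. Only the two example paths come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class PathKeyMismatch {
    // Hypothetical stand-in for pathToPartitionInfo in a stage after the first:
    // the key includes the port, as described in the report.
    static Map<String, String> buildMap() {
        Map<String, String> pathToPartitionInfo = new HashMap<>();
        pathToPartitionInfo.put("hdfs://server:8020/path", "partitionDesc for /path");
        return pathToPartitionInfo;
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = buildMap();

        // Looking up with the same format as the key finds the entry.
        System.out.println(pathToPartitionInfo.get("hdfs://server:8020/path"));

        // Looking up the same location without the port is a different
        // string, so the hash lookup misses and returns null.
        System.out.println(pathToPartitionInfo.get("hdfs://server/path"));
    }
}
```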

      The attached patch may not be ideal: it doesn't fix the underlying problem of inconsistent path formats in pathToPartitionInfo. It just works around it by walking through the map and looking for a matching path rather than doing a hash lookup.
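
      The general shape of such a workaround might look like the sketch below. It is not the attached patch: the class PartitionLookupSketch and the method findByPath are hypothetical, and the matching rule (comparing only the path component of each URI, ignoring scheme, host and port) is an assumption chosen for illustration. A real fix would likely also need to compare hosts.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionLookupSketch {
    // Hypothetical helper: try the hash lookup first, then fall back to
    // walking the map and comparing only the path component of each key.
    static <V> V findByPath(Map<String, V> pathToPartitionInfo, String dir) {
        V exact = pathToPartitionInfo.get(dir);
        if (exact != null) {
            return exact;
        }
        String wanted = URI.create(dir).getPath();
        for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
            // Assumption: two keys refer to the same directory when their
            // path components are equal, regardless of port.
            if (URI.create(entry.getKey()).getPath().equals(wanted)) {
                return entry.getValue();
            }
        }
        return null; // no matching path in the map
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new LinkedHashMap<>();
        pathToPartitionInfo.put("hdfs://server:8020/path", "partitionDesc for /path");

        // The port-less path now resolves through the fallback scan.
        System.out.println(findByPath(pathToPartitionInfo, "hdfs://server/path"));
    }
}
```

      The scan is linear in the number of map entries, which is the cost of trading the hash lookup for a tolerant comparison.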

      Attachments

        1. hive.1006.2.patch
          1 kB
          Dave Lerman
        2. hive.1006.1.patch
          1 kB
          Dave Lerman

            People

              Assignee: Dave Lerman (dlerman)
              Reporter: Dave Lerman (dlerman)