HIVE-4175

Injection of emptyFile into input splits for empty partitions causes Deserializer to fail


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: CDH4.2, using MR1

    Description

      My deserializer expects to receive one of two different subclasses of Writable, but in certain circumstances it receives an empty instance of org.apache.hadoop.io.Text. This only happens for task attempts where I observe a file called "emptyFile" in the list of input splits.

      I'm running queries over an external table partitioned by year/month/day, for which I have eagerly created partitions. As of today, for example, a query where year = 2013 and month = 3 includes empty partitions.

      In the course of investigating, I downloaded the sequence files to confirm they were OK. Once I realized that processing of empty partitions was to blame, I was able to work around the issue by bounding my queries to populated partitions.

      Can the need for the emptyFile be eliminated in the case where there are already other splits being processed? Failing that, can the mapper detect that the current input is from emptyFile and not call the deserializer?
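
      As a rough illustration of the second option, the check below is a minimal sketch of how a mapper-side guard might recognize the placeholder by name. It works on a plain path string so that it stays self-contained; it is not Hive's actual code. The class name, method name, and example paths are hypothetical, and it assumes the placeholder always appears as a final path component named exactly "emptyFile", which is only what was observed in the split list above.

```java
public class EmptySplitGuard {
    // Name observed in the list of input splits; assumed (not confirmed)
    // to always be the last component of the placeholder's path.
    static final String PLACEHOLDER_NAME = "emptyFile";

    /**
     * Returns true when the last component of splitPath is the placeholder name.
     * Hypothetical helper: a real mapper would obtain the path from its input split.
     */
    static boolean isPlaceholderSplit(String splitPath) {
        if (splitPath == null || splitPath.isEmpty()) {
            return false;
        }
        String p = splitPath;
        // Drop trailing slashes so "/x/emptyFile/" is treated like "/x/emptyFile".
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        int slash = p.lastIndexOf('/');
        String name = (slash >= 0) ? p.substring(slash + 1) : p;
        return PLACEHOLDER_NAME.equals(name);
    }

    public static void main(String[] args) {
        // Both paths are made up for illustration.
        String[] paths = {
            "/tmp/scratch/1/emptyFile",
            "/warehouse/events/year=2013/month=3/day=1/part-00000"
        };
        for (String path : paths) {
            String action = isPlaceholderSplit(path) ? "skip deserializer" : "deserialize";
            System.out.println(path + " -> " + action);
        }
    }
}
```

      A guard like this would only sidestep the symptom; whether the placeholder split needs to be injected at all when other splits exist is the first question above.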

          People

            Assignee: Unassigned
            Reporter: James Kebinger (jkebinger)
            Votes: 1
            Watchers: 4
