Pig / PIG-2909

Add a new option for ignoring corrupted files to AvroStorage load func


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11
    • Component/s: piggybank
    • Labels: None
    • Patch Info: Patch Available

    Description

      Currently, the AvroStorage load function fails with an AvroRuntimeException when it encounters corrupted input files. For example:

      ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
      	at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:283)
      

      However, failing the whole Pig job because of bad input files is not always desirable. It is sometimes more useful to skip the corrupted files and continue processing the rest of the input.
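A minimal Java sketch of the skip-and-continue behavior requested above. It is not the code from the attached patches: `SkipBadFilesSketch`, `FileRecordSource`, `loadAll`, and the `ignoreBadFiles` flag are hypothetical names, not the AvroStorage API. It assumes each input file is read independently and that corruption surfaces as an exception, as in the stack trace above (`AvroRuntimeException` is an unchecked `RuntimeException`).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only (hypothetical names, not the AvroStorage API): load records from
 * several input files, optionally skipping files whose reads fail.
 */
public class SkipBadFilesSketch {

    /** Hypothetical per-file reader; returns every record of one file. */
    public interface FileRecordSource {
        List<String> readAll(String path) throws IOException;
    }

    public static List<String> loadAll(List<String> paths, FileRecordSource source,
                                       boolean ignoreBadFiles) throws IOException {
        List<String> records = new ArrayList<>();
        for (String path : paths) {
            try {
                // Records are added only if the whole file was read successfully.
                records.addAll(source.readAll(path));
            } catch (IOException | RuntimeException e) {
                // Corruption may surface as a checked IOException or as an
                // unchecked exception (AvroRuntimeException extends RuntimeException).
                if (!ignoreBadFiles) {
                    // Fail-fast: abort the whole load, as AvroStorage does today.
                    throw new IOException("Failed to read " + path, e);
                }
                System.err.println("Skipping corrupted file " + path + ": " + e.getMessage());
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical source: any path starting with "bad" is treated as corrupted.
        FileRecordSource source = path -> {
            if (path.startsWith("bad")) {
                throw new RuntimeException("Invalid sync!");
            }
            return List.of(path + ":record");
        };
        List<String> out = loadAll(List.of("part-0", "bad-1", "part-2"), source, true);
        System.out.println(out); // [part-0:record, part-2:record]
    }
}
```

With `ignoreBadFiles` set to `false`, the sketch keeps the current fail-fast behavior. Because it reads a whole file before adding its records, a file that fails partway contributes nothing; a streaming loader such as a Pig `LoadFunc`, which returns one record per `getNext()` call, may already have emitted the records that precede the corrupted block.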

      Attachments

        1. PIG-2909.patch
          10 kB
          Cheolsoo Park
        2. PIG-2909-2.patch
          10 kB
          Cheolsoo Park
        3. PIG-2909-avro_test_files.tar.gz
          0.4 kB
          Cheolsoo Park


            People

              Assignee: Cheolsoo Park
              Reporter: Cheolsoo Park
              Votes: 0
              Watchers: 4
