  1. Pig
  2. PIG-2909

Add a new option for ignoring corrupted files to AvroStorage load func

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11
    • Component/s: piggybank
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Currently, the AvroStorage load function fails with an AvroRuntimeException when it encounters a corrupted input file. For example:

      ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
      	at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:283)
      

      Failing the whole Pig job because of bad files is not always desirable, though. It is sometimes more useful to skip the corrupted files and continue.
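
The skip-and-continue behavior requested here can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patches and does not use the Pig or Avro APIs: `RecordSource`, `readAll`, `good`, `corruptAfter`, and the `ignoreBadFiles` flag are all hypothetical names introduced for illustration. The sketch assumes that corruption surfaces as a `RuntimeException` while iterating over a file, as in the stack trace above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SkipBadFilesSketch {

    /** Hypothetical stand-in for one input file; not part of Pig or Avro. */
    interface RecordSource {
        Iterator<String> open();
    }

    /**
     * Reads every record from every source. If ignoreBadFiles is true, a
     * RuntimeException raised while reading a source abandons the rest of
     * that source and moves on; otherwise the exception propagates and the
     * whole read fails.
     */
    static List<String> readAll(List<RecordSource> sources, boolean ignoreBadFiles) {
        List<String> out = new ArrayList<>();
        for (RecordSource source : sources) {
            try {
                Iterator<String> it = source.open();
                while (it.hasNext()) {
                    out.add(it.next());
                }
            } catch (RuntimeException e) {
                if (!ignoreBadFiles) {
                    throw e; // default behavior: fail the job
                }
                // Opt-in behavior: report the problem and continue with the next source.
                System.err.println("Skipping rest of corrupted input: " + e.getMessage());
            }
        }
        return out;
    }

    /** A healthy source that yields the given records. */
    static RecordSource good(String... records) {
        return () -> Arrays.asList(records).iterator();
    }

    /** A source that yields the given records and then fails, simulating corruption. */
    static RecordSource corruptAfter(String... records) {
        return () -> new Iterator<String>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                if (i >= records.length) {
                    throw new RuntimeException("Invalid sync!");
                }
                return true;
            }

            @Override
            public String next() {
                return records[i++];
            }
        };
    }

    public static void main(String[] args) {
        List<RecordSource> inputs = Arrays.asList(good("a", "b"), corruptAfter("c"), good("d"));
        System.out.println(readAll(inputs, true)); // prints [a, b, c, d]
    }
}
```

In this sketch, records read before the corruption point are kept and only the remainder of the bad source is dropped. Whether a real loader should keep or discard such partial data is a design choice that the description does not settle.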

        Attachments

        1. PIG-2909-avro_test_files.tar.gz
          0.4 kB
          Cheolsoo Park
        2. PIG-2909.patch
          10 kB
          Cheolsoo Park
        3. PIG-2909-2.patch
          10 kB
          Cheolsoo Park

              People

              • Assignee:
                cheolsoo Cheolsoo Park
                Reporter:
                cheolsoo Cheolsoo Park
              • Votes:
                0
                Watchers:
                4
