Pig / PIG-2909

Add a new option for ignoring corrupted files to AvroStorage load func

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11
    • Component/s: piggybank
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Currently, an AvroStorage load fails with an AvroRuntimeException when it encounters a corrupted input file. For example:

      ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
      	at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:283)
      

      However, failing the entire Pig job because of bad files is not always desirable. It is sometimes more useful to skip the corrupted files and continue loading the rest.
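
      The skip-and-continue behavior requested here can be illustrated with a minimal sketch. It is not the code from the attached patch: it uses plain Python rather than the Java AvroStorage class, and `load_records`, `read_file`, `CorruptFileError`, and the `ignore_bad_files` flag are hypothetical names chosen for illustration. `CorruptFileError` stands in for a decoding failure such as the "Invalid sync!" error shown above.

      ```python
      import logging
      from typing import Callable, Iterable, Iterator

      log = logging.getLogger("avro_skip_sketch")


      class CorruptFileError(Exception):
          """Hypothetical stand-in for a decoding failure such as 'Invalid sync!'."""


      def load_records(
          paths: Iterable[str],
          read_file: Callable[[str], Iterable[str]],
          ignore_bad_files: bool = False,
      ) -> Iterator[str]:
          """Yield records from each path in order.

          When ignore_bad_files is False, the first corrupted file aborts the
          whole load. When it is True, the remainder of a corrupted file is
          skipped with a warning and loading continues with the next file.
          """
          for path in paths:
              try:
                  for record in read_file(path):
                      yield record
              except CorruptFileError as err:
                  if not ignore_bad_files:
                      raise
                  log.warning("Skipping corrupted file %s: %s", path, err)


      def fake_reader(path: str) -> Iterator[str]:
          """Hypothetical reader: 'bad.avro' is corrupted, other files hold two records."""
          if path == "bad.avro":
              raise CorruptFileError("Invalid sync!")
          yield f"{path}:0"
          yield f"{path}:1"


      if __name__ == "__main__":
          files = ["a.avro", "bad.avro", "b.avro"]
          for rec in load_records(files, fake_reader, ignore_bad_files=True):
              print(rec)
      ```

      Making the behavior opt-in keeps the existing fail-fast default, so a job silently drops data only when the user has asked for it. In this sketch, records read before the corruption is detected are still emitted; whether a real loader should keep or discard them is a separate design decision.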

      1. PIG-2909-2.patch
        10 kB
        Cheolsoo Park
      2. PIG-2909.patch
        10 kB
        Cheolsoo Park
      3. PIG-2909-avro_test_files.tar.gz
        0.4 kB
        Cheolsoo Park

            People

            • Assignee:
              Cheolsoo Park
              Reporter:
              Cheolsoo Park
            • Votes:
              0
              Watchers:
              4
