PIG-5231

PigStorage with -schema may produce inconsistent outputs with more fields

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 0.17.0
    • Reviewed

    Description

      When multiple directories are passed to PigStorage(',','-schema'), Pig makes no attempt to merge conflicting schemas during loading. The first schema encountered during the file system scan is used.

      For two input directories with the schemas
      file1: (f1:chararray, f2:int) and
      file2: (f1:chararray, f2:int, f3:int)

      Pig picks the schema from file1, the first one it encounters, and only allows access to f1 and f2.
      However, the output still contains 3 fields for tuples from file2. This later leads to completely corrupt output, because the shifted fields cause incorrect references.
      (This may also happen when the input itself contains the delimiter.)

      If the file2 schema is picked instead, the case is already handled by filling the missing fields with null (PIG-3100).
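
The mismatch described above can be illustrated with a small standalone sketch. It is not Pig code and not the attached patch. It only assumes that a loader could make every tuple match the width of the chosen schema, padding short rows with null (as PIG-3100 is described as doing) and dropping fields beyond the schema. The helper name `fit_to_schema` and the sample rows are hypothetical.

```python
from typing import List, Optional, Sequence


def fit_to_schema(fields: Sequence[Optional[str]], schema_width: int) -> List[Optional[str]]:
    """Return a copy of `fields` with exactly `schema_width` entries.

    Extra trailing fields are dropped; missing fields are filled with None.
    This is an illustrative assumption, not Pig's actual implementation.
    """
    fitted = list(fields[:schema_width])
    fitted.extend([None] * (schema_width - len(fitted)))
    return fitted


# Hypothetical rows matching the two schemas in the description.
row_file1 = "a,1".split(",")    # file1: (f1, f2)
row_file2 = "b,2,3".split(",")  # file2: (f1, f2, f3)

# Schema from file1 is picked (2 fields): the extra f3 from file2 is dropped,
# so every output tuple has the same width.
print(fit_to_schema(row_file1, 2))
print(fit_to_schema(row_file2, 2))

# Schema from file2 is picked (3 fields): the missing f3 in file1 becomes None.
print(fit_to_schema(row_file1, 3))
```

With every tuple forced to the schema width, a later stage that stores and re-reads the data sees a fixed number of delimited fields per row, which is what prevents the field shift.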

      Attachments

        1. pig-5231-v01.patch
          3 kB
          Koji Noguchi

          People

            knoguchi Koji Noguchi