Pig / PIG-2689

JsonStorage fails to find schema when LimitAdjuster runs

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels:
      None

      Description

      Scripts that both store data with JsonStorage and trigger the LimitAdjuster (e.g. an ORDER BY followed by a LIMIT) fail with the following exception:

      java.io.IOException: Could not find schema in UDF context
      at org.apache.pig.builtin.JsonStorage.prepareToWrite(JsonStorage.java:125)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:125)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:86)
      at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:569)
      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:638)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)

      This happens because the LimitAdjuster does not copy the signature into its newly created POStore, so JsonStorage looks up the schema under a null signature.
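
      The failure mode can be sketched in isolation. This is a simplified model, not Pig's actual code: `Store`, `udfContext`, `copyWithoutSignature`, and `copyWithSignature` are hypothetical stand-ins for POStore, the UDF context, and the LimitAdjuster's store-creation step. It only assumes that the storer looks up its schema by signature, as the stack trace and description suggest.

      ```java
      import java.io.IOException;
      import java.util.HashMap;
      import java.util.Map;

      public class SignatureLookupSketch {
          // Stand-in for the UDF context: schema strings keyed by store signature.
          static final Map<String, String> udfContext = new HashMap<>();

          // Hypothetical, stripped-down store operator (only the relevant fields).
          static class Store {
              String signature;
              String alias;
          }

          // Mimics the lookup a storer performs before writing:
          // fetch the schema saved under this store's signature.
          static String findSchema(Store store) throws IOException {
              String schema = udfContext.get(store.signature);
              if (schema == null) {
                  throw new IOException("Could not find schema in UDF context");
              }
              return schema;
          }

          // Models the reported bug: the new store is created from scratch,
          // so its signature stays null.
          static Store copyWithoutSignature(Store original) {
              return new Store();
          }

          // Models the proposed fix: carry the signature (and alias) over.
          static Store copyWithSignature(Store original) {
              Store copy = new Store();
              copy.signature = original.signature;
              copy.alias = original.alias;
              return copy;
          }

          public static void main(String[] args) {
              Store original = new Store();
              original.signature = "store-sig";   // hypothetical signature value
              original.alias = "limited";         // hypothetical alias
              udfContext.put(original.signature, "a:int,b:chararray");

              try {
                  findSchema(copyWithoutSignature(original));
              } catch (IOException e) {
                  System.out.println("without signature: " + e.getMessage());
              }
              try {
                  System.out.println("with signature: " + findSchema(copyWithSignature(original)));
              } catch (IOException e) {
                  System.out.println("unexpected: " + e.getMessage());
              }
          }
      }
      ```

      In this model the lookup fails only because the key is null, which is why copying the signature alone is enough to restore it; the alias is carried along for the same reason the original patch gives.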

      1. PIG-2689-2-nowhitespacechange.patch
        8 kB
        Rohini Palaniswamy
      2. PIG-2689-2.patch
        14 kB
        Rohini Palaniswamy
      3. PIG-2689.patch
        6 kB
        Doug Daniels

          Activity

          Doug Daniels added a comment -

          Attached a patch that fixes this by copying the signature to the new POStore. It also copies the alias, which helps with illustrate.

          Gianmarco De Francisci Morales added a comment -

          The modification looks OK, but I am not sure about the tests.
          Should we test this as an e2e test?

          Alan Gates added a comment -

          This patch no longer applies because PhysicalOperator no longer has a setAlias method. It's not clear to me why that was removed. It also wasn't clear to me whether it was required for this patch (it looked like the setSignature was the one that mattered, but I wanted to confirm that before proceeding).

          As for the e2e tests for this, ideally I agree we should have one. But we don't generate any json data in the tests yet, so it seems too much to ask to add a new data set and tests for it.

          I'm going to set this JIRA to open since the patch as is doesn't apply. But if you feel setAlias isn't required I'm fine to apply the patch.

          Rohini Palaniswamy added a comment (edited) -

          The issue is already fixed by PIG-3120, but that change does not include a unit test. This patch adds one and also sets the alias on the new limit operator, based on Doug Daniels' original patch. The unit test from the original patch no longer applies cleanly because TestJsonStorage no longer uses checked-in test files; they are now generated at run time.


            People

            • Assignee:
              Doug Daniels
              Reporter:
              Doug Daniels
            • Votes:
              2
              Watchers:
              4

              Dates

              • Created:
                Updated: