Pig / PIG-2689

JsonStorage fails to find schema when LimitAdjuster runs

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Scripts that both store data with JsonStorage and trigger the LimitAdjuster (e.g. an order by followed by a limit) fail with the following exception:

      java.io.IOException: Could not find schema in UDF context
      at org.apache.pig.builtin.JsonStorage.prepareToWrite(JsonStorage.java:125)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:125)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:86)
      at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:569)
      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:638)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)

      This happens because the LimitAdjuster does not copy the signature into its newly created POStore, so JsonStorage looks up the schema under a null signature.
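
      The failure mode described above can be illustrated with a small stand-alone model. This is only a sketch under assumptions: `SignatureLookupSketch`, `CONTEXT`, and the `"schema"` property key are hypothetical stand-ins, not Pig's actual classes or keys. Only the method name `prepareToWrite` and the exception message are taken from the stack trace. The model stores a schema under a signature on the front end and then looks it up on the back end, once with the right signature and once with a null one.

      ```java
      import java.io.IOException;
      import java.util.HashMap;
      import java.util.Map;
      import java.util.Properties;

      // Simplified model of a signature-keyed schema lookup; not Pig's real code.
      public class SignatureLookupSketch {
          // Hypothetical stand-in for the UDF context: one Properties object per signature.
          static final Map<String, Properties> CONTEXT = new HashMap<>();

          // Front end: remember the schema under the store function's signature.
          static void checkSchema(String signature, String schema) {
              Properties p = CONTEXT.computeIfAbsent(signature, k -> new Properties());
              p.setProperty("schema", schema);
          }

          // Back end: fetch the schema using whatever signature the store operator carries.
          static String prepareToWrite(String signature) throws IOException {
              Properties p = CONTEXT.get(signature);
              String schema = (p == null) ? null : p.getProperty("schema");
              if (schema == null) {
                  throw new IOException("Could not find schema in UDF context");
              }
              return schema;
          }

          public static void main(String[] args) throws IOException {
              checkSchema("store-1", "name:chararray,age:int");

              // Lookup with the original signature finds the schema.
              System.out.println(prepareToWrite("store-1"));

              // Lookup with a null signature (never copied to the new store) fails.
              try {
                  prepareToWrite(null);
              } catch (IOException e) {
                  System.out.println(e.getMessage());
              }
          }
      }
      ```

      In this model the schema is still present in the context. It is simply unreachable because the lookup key was lost, which matches the behaviour reported here.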

      1. PIG-2689.patch
        6 kB
        Doug Daniels

        Activity

        Doug Daniels added a comment -

        Attached a patch that fixes this by copying the signature to the new POStore. It also copies the alias, which helps with illustrate.
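
        The idea behind the fix can be sketched as follows. The patch itself is not reproduced on this page, so this is an assumption-laden illustration rather than its contents: `CopySignatureSketch`, the nested `Store` class, and both `replace...` methods are hypothetical, and only the notion of a signature setter on the store comes from the discussion. Alias copying is left out.

        ```java
        // Minimal model of copying a signature onto a replacement store; not Pig's POStore.
        public class CopySignatureSketch {
            // Hypothetical store operator holding an output path and a signature.
            static final class Store {
                private final String outputPath;
                private String signature;

                Store(String outputPath) {
                    this.outputPath = outputPath;
                }

                String getOutputPath() {
                    return outputPath;
                }

                String getSignature() {
                    return signature;
                }

                void setSignature(String signature) {
                    this.signature = signature;
                }
            }

            // Reported behaviour: the replacement store is built without the signature.
            static Store replaceWithoutSignature(Store original) {
                return new Store(original.getOutputPath());
            }

            // Fixed behaviour: the replacement store inherits the original's signature.
            static Store replaceWithSignature(Store original) {
                Store replacement = new Store(original.getOutputPath());
                replacement.setSignature(original.getSignature());
                return replacement;
            }

            public static void main(String[] args) {
                Store original = new Store("/tmp/out");
                original.setSignature("store-1");

                System.out.println(replaceWithoutSignature(original).getSignature());
                System.out.println(replaceWithSignature(original).getSignature());
            }
        }
        ```

        With the signature carried over, a later signature-keyed lookup can find the schema that was stored for the original store.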

        Gianmarco De Francisci Morales added a comment -

        The modification looks OK, but I am not sure about the tests.
        Should we test this as an e2e test?

        Alan Gates added a comment -

        This patch no longer applies because PhysicalOperator no longer has a setAlias method. It's not clear to me why that was removed. It also wasn't clear to me whether it was required for this patch (it looked like the setSignature was the one that mattered, but I wanted to confirm that before proceeding).

        As for the e2e tests for this, ideally I agree we should have one. But we don't generate any json data in the tests yet, so it seems too much to ask to add a new data set and tests for it.

        I'm going to set this JIRA to open since the patch as is doesn't apply. But if you feel setAlias isn't required I'm fine to apply the patch.


          People

          • Assignee:
            Unassigned
            Reporter:
            Doug Daniels
          • Votes:
            1
            Watchers:
            2

            Dates

            • Created:
              Updated:
