Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Pig + Hadoop 17

      Description

      A = load '...' USING PigStorage('\t') AS (c1, c2, c3, n1);
      B = group A by (c1,c2,c3);
      C = foreach B generate flatten(group), SUM(A.n1);
      store C into ...;

      Runs with combiner and errors out.

      java.io.IOException: For input string: "..." additional info: iteration = 1bag size = 2 partial sum = 0.0
      previous tupple = (...)
      at org.apache.pig.builtin.SUM.sum(SUM.java:95)
      at org.apache.pig.builtin.SUM$Final.exec(SUM.java:63)
      at org.apache.pig.builtin.SUM$Final.exec(SUM.java:60)
      at org.apache.pig.impl.eval.FuncEvalSpec$1.add(FuncEvalSpec.java:116)
      at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.<init>(GenerateSpec.java:159)
      at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:79)
      at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.reduce(PigMapReduce.java:165)
      at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.reduce(PigMapReduce.java:80)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

      Work-around was that I put out combiner:

      C = foreach B generate SUM(A.n1),flatten(group);

      and it worked. Input data has some private information in it so I cannot post it. Let me know if it was not possible to solve it without having it. Then we compile a similar input.

      c1,c2,c3 are alphabetic,
      n1 is numeric.

        Activity

        Amir Youssefi created issue -
        Hide
        Amir Youssefi added a comment -

        Workaround which disables combiner and makes script run successfully:

        A = load '...' USING PigStorage('\t') AS (c1, c2, c3, n1);
        B = group A by (c1,c2,c3);
        C = foreach B generate SUM(A.n1) , flatten(group);

        Show
        Amir Youssefi added a comment - Workaround which disables combiner and makes script run successfully: A = load '...' USING PigStorage('\t') AS (c1, c2, c3, n1); B = group A by (c1,c2,c3); C = foreach B generate SUM(A.n1) , flatten(group);
        Hide
        Olga Natkovich added a comment -

        Is this data dependent? I know I ran similar queries in the past without any problem.

        Show
        Olga Natkovich added a comment - Is this data dependent? I know I ran similar queries in the past without any problem.
        Hide
        Amir Youssefi added a comment -

        Yes.

        Show
        Amir Youssefi added a comment - Yes.
        Hide
        Olga Natkovich added a comment -

        we need a reproducible case to look into this. can you provide a fragment of data that is causing this problem?

        Show
        Olga Natkovich added a comment - we need a reproducible case to look into this. can you provide a fragment of data that is causing this problem?
        Alan Gates made changes -
        Field Original Value New Value
        Assignee Alan Gates [ alangates ]
        Hide
        Olga Natkovich added a comment -

        combiner code has been completely rewritten. Please, reopen if you case still fails

        Show
        Olga Natkovich added a comment - combiner code has been completely rewritten. Please, reopen if you case still fails
        Olga Natkovich made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Alan Gates made changes -
        Fix Version/s 0.2.0 [ 12313783 ]
        Alan Gates made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Alan Gates
            Reporter:
            Amir Youssefi
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development