Pig
  1. Pig
  2. PIG-2134

ReadScalars message "scalar has more than one row in the output" does not provide enough information to help programmer find and fix script syntax error.

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: 0.15.0
    • Component/s: impl
    • Labels:

      Description

      (Bug filed on 0.8. I do not have 0.9 to test.)

      This applies to org.apache.pig.impl.builtin.ReadScalars.java:83

      http://search-hadoop.com/c/Pig:/src/org/apache/pig/impl/builtin/ReadScalars.java

      I have bitten myself with the same programming error several times, and each time I spent too long diagnosing my error.
      The error message "scalar has more than one row in the output" is a bit misleading, considering the underlying programming mistake.

      Consider this Pig script:

      A = LOAD 'a' as (key, a1, a_junk);
      B = LOAD 'b' as (key, b1, b_junk);
      C = join A by key, B by key;

      – Now, we want to project (key, a1, b1)
      – CORRECT:
      D_GOOD = foreach C generate A::key, a1, b1; – Disambiguate 'key' correctly.

      – Now, consider some common programmer errors:
      – INCORRECT:

      – This fails, because 'key' is ambiguous. The error message is clear enough.
      D_BAD_1 = foreach C generate key, a1, b1;

      – This fails whenever A has multiple rows.
      D_BAD_2 = foreach C generate A.key, a1, b1
      – Error: "Scalar has more than one row in the output 1st : t1, 2nd : t2"

      That's non-illuminating, for the following reason:

      The error message is assuming that the programmer is making a semantic error, trying to use a value from the original A, which is impossible if A has more than one row.

      In actuality, the programmer wants A::key, but he made a syntax error by typing "A.key", and it resulted in "scalar has more than one row" message that has nothing to do with what he intended.
      Since he has confused "." and "::", he has no context for interpreting the message properly.

      Ideally, the error message would say something like this:
      "A.key cannot be used as scalar here, because A has more than one row. Did you mean A::key?"

      If the identifiers are not available at error-logging time, something like this would be helpful:
      "Relation cannot be used as scalar here, because A has more than one row. Did you mean to use '::' instead of '.'? "

        Issue Links

          Activity

          Hide
          Daniel Dai added a comment -

          Mark it as duplicate. Thanks Niels!

          Show
          Daniel Dai added a comment - Mark it as duplicate. Thanks Niels!
          Hide
          Niels Basjes added a comment -

          I submitted a patch in PIG-4525 (committed yesterday) that clarifies the error.
          I think this issue can now be closed.

          Show
          Niels Basjes added a comment - I submitted a patch in PIG-4525 (committed yesterday) that clarifies the error. I think this issue can now be closed.
          Hide
          Michael Howard added a comment -

          We have a relatively small development organization of very experienced SQL developers who are in the process of learning/migrating-to Pig. Every one of us has been bitten by this, and have lost a lot of time because of it. We now try to emphasize the "disambiguation" :: operator to make sure everyone understands it.
          An error message which provides more guidance/clarity in this situation will benefit all budding Pig developers. Thank you.

          Show
          Michael Howard added a comment - We have a relatively small development organization of very experienced SQL developers who are in the process of learning/migrating-to Pig. Every one of us has been bitten by this, and have lost a lot of time because of it. We now try to emphasize the "disambiguation" :: operator to make sure everyone understands it. An error message which provides more guidance/clarity in this situation will benefit all budding Pig developers. Thank you.
          Hide
          Mariano Kamp added a comment -

          Thanks, Michael Brauwerman, same here. Was confused by the error message until I read your explanation.

          The fix would be great, but this turning up at Google as the first hit is almost as good

          Show
          Mariano Kamp added a comment - Thanks, Michael Brauwerman, same here. Was confused by the error message until I read your explanation. The fix would be great, but this turning up at Google as the first hit is almost as good
          Hide
          Yong Zhang added a comment -

          It confused me today for quite some time, especially I learned the pig for a short of time.

          Show
          Yong Zhang added a comment - It confused me today for quite some time, especially I learned the pig for a short of time.
          Hide
          Dmitriy V. Ryaboy added a comment -

          I think we should just straight up refuse to run when we see people refer to A.key when A::key exists. It's hard to imagine a situation where A.key is actually what is meant.

          Show
          Dmitriy V. Ryaboy added a comment - I think we should just straight up refuse to run when we see people refer to A.key when A::key exists. It's hard to imagine a situation where A.key is actually what is meant.
          Hide
          Daniel Granatshtein added a comment -

          Thanks for posting and the details on this issue!
          Searched for the error and hit this open ticket. Seeing the resolution, I imagine it would have taken some time to resolve myself.

          Show
          Daniel Granatshtein added a comment - Thanks for posting and the details on this issue! Searched for the error and hit this open ticket. Seeing the resolution, I imagine it would have taken some time to resolve myself.
          Hide
          Peter Lubell-Doughtie added a comment -

          I also encountered the same issue, this ticket accurately described the resolution.

          Show
          Peter Lubell-Doughtie added a comment - I also encountered the same issue, this ticket accurately described the resolution.
          Hide
          Stan Rosenberg added a comment -

          Is there any progress on this? I am wondering if this can be checked statically; in many cases the type of an alias can be inferred.

          Show
          Stan Rosenberg added a comment - Is there any progress on this? I am wondering if this can be checked statically; in many cases the type of an alias can be inferred.
          Hide
          Cheng Guang-Nan added a comment -

          Same problem here and thanks for filling this tickets.

          Show
          Cheng Guang-Nan added a comment - Same problem here and thanks for filling this tickets.
          Hide
          Jim Plush added a comment -

          This just bit me today for the exact reason you describe. Luckly your bug came up and I was able to resolve my dumb mistake, would have helped to have an error message like that though

          Show
          Jim Plush added a comment - This just bit me today for the exact reason you describe. Luckly your bug came up and I was able to resolve my dumb mistake, would have helped to have an error message like that though

            People

            • Assignee:
              Niels Basjes
              Reporter:
              Michael Brauwerman
            • Votes:
              8 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development