Pig
  1. Pig
  2. PIG-2498

e2e tests failing in some cases due to incorrect unix sort args

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: 0.9.2
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None

      Description

      Some e2e tests are failing for me against 23 due to what I think are incorrect arguments to unix sort. For example in Order_6:

      			'num' => 6,
      			'pig' => q\a = load ':INPATH:/singlefile/studenttab10k';
      c = order a by $0;
      store c into ':OUTPATH:';\,
      			'sortArgs' => ['-t', '	', '+0', '-1'],
      

      The pig job is sorting by the first column, however unix sort is being told to sort by the first and second columns.

      From the gnu sort manual (specifically pos2 is inclusive): http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html

      '-k pos1[,pos2]'
      '--key=pos1[,pos2]'
      Specify a sort field that consists of the part of the line between pos1 and pos2 (or the end of the line, if pos2 is omitted), inclusive.
      
      ...
      
      On older systems, sort supports an obsolete origin-zero syntax '+pos1 [-pos2]' for specifying sort keys. The obsolete sequence 'sort +a.x -b.y' is equivalent to 'sort -k a+1.x+1,b' if y is '0' or absent, otherwise it is equivalent to 'sort -k a+1.x+1,b+1.y'.
      

      I verified this by running the sort manually with +0 -1 and +0 -0, in the first case it fails, in the second case it passes.

      1. PIG-2498.patch
        10 kB
        Patrick Hunt

        Issue Links

          Activity

          Patrick Hunt created issue -
          Hide
          Daniel Dai added a comment -

          All e2e tests runs OK in my environment. It might relate to sort stability issue. Feel free to submit a patch to make your test work.

          Show
          Daniel Dai added a comment - All e2e tests runs OK in my environment. It might relate to sort stability issue. Feel free to submit a patch to make your test work.
          Hide
          Patrick Hunt added a comment -

          @daniel - sounds like it. But you agree though that the test itself is flawed, correct? I'm not super familiar with this test and wanted to verify I was on the right track before creating the patch.

          Show
          Patrick Hunt added a comment - @daniel - sounds like it. But you agree though that the test itself is flawed, correct? I'm not super familiar with this test and wanted to verify I was on the right track before creating the patch.
          Hide
          Daniel Dai added a comment -

          I am not expert on sort syntax as well. But seems +0 -1 means start sorting on the first field, but stop before the second field, which means to sort on first column (http://www.softpanorama.org/Tools/sort.shtml). Does "-s" (stable sort) option help?

          Show
          Daniel Dai added a comment - I am not expert on sort syntax as well. But seems +0 -1 means start sorting on the first field, but stop before the second field, which means to sort on first column ( http://www.softpanorama.org/Tools/sort.shtml ). Does "-s" (stable sort) option help?
          Hide
          Patrick Hunt added a comment -

          Based on what I found in the gnu manual it means sort both on the first and second cols (-1 means the second col inclusive). ugly syntax (since deprecated it seems). The unix sort already is using the stable option. I think I know what to do. I'll work on a patch for review. Thanks!

          Show
          Patrick Hunt added a comment - Based on what I found in the gnu manual it means sort both on the first and second cols (-1 means the second col inclusive). ugly syntax (since deprecated it seems). The unix sort already is using the stable option. I think I know what to do. I'll work on a patch for review. Thanks!
          Hide
          Patrick Hunt added a comment -

          This patch fixes the unix sort issue I identified. I'm still trying to get a fully clean e2e test run though (due to other issues), so I'll hold off submitting until I see that.

          Show
          Patrick Hunt added a comment - This patch fixes the unix sort issue I identified. I'm still trying to get a fully clean e2e test run though (due to other issues), so I'll hold off submitting until I see that.
          Patrick Hunt made changes -
          Field Original Value New Value
          Attachment PIG-2498.patch [ 12513495 ]
          Patrick Hunt made changes -
          Assignee Patrick Hunt [ phunt ]
          Hide
          fang fang chen added a comment -

          +1
          Also encounter some tests failed with error "Sort check failed". With this patch, original failed tests passed now.

          Show
          fang fang chen added a comment - +1 Also encounter some tests failed with error "Sort check failed". With this patch, original failed tests passed now.
          Hide
          Rohini Palaniswamy added a comment -

          Patrick,
          Hit failures because of this in RHEL 6 for some test cases (Order-6,7,8,9,18, Types-20,21,22,23,24,25, Split-6, BigData-7,8). Came up with a patch by changing the failures to -k style, before I came upon this jira. Patch looks good, but I have one comment. Since we are fixing all the sort args, can we move off the obsolete origin-zero syntax and move to the -k style? I would be glad to review, test and commit this one. Thanks.

          Show
          Rohini Palaniswamy added a comment - Patrick, Hit failures because of this in RHEL 6 for some test cases (Order-6,7,8,9,18, Types-20,21,22,23,24,25, Split-6, BigData-7,8). Came up with a patch by changing the failures to -k style, before I came upon this jira. Patch looks good, but I have one comment. Since we are fixing all the sort args, can we move off the obsolete origin-zero syntax and move to the -k style? I would be glad to review, test and commit this one. Thanks.
          Hide
          Cheolsoo Park added a comment -

          Hi Rohini,

          I no longer see this issue on RHEL 6 after PIG-2782 is committed. Is this still an issue in 0.11/trunk?

          Thanks!

          Show
          Cheolsoo Park added a comment - Hi Rohini, I no longer see this issue on RHEL 6 after PIG-2782 is committed. Is this still an issue in 0.11/trunk? Thanks!
          Rohini Palaniswamy made changes -
          Link This issue duplicates PIG-2782 [ PIG-2782 ]
          Hide
          Rohini Palaniswamy added a comment -

          Thanks for pointing out that Cheolsoo. Missed that one. I was testing with 0.10. Marking this as duplicate.

          Show
          Rohini Palaniswamy added a comment - Thanks for pointing out that Cheolsoo. Missed that one. I was testing with 0.10. Marking this as duplicate.
          Rohini Palaniswamy made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 0.11 [ 12318878 ]
          Resolution Duplicate [ 3 ]
          Egil Sorensen made changes -
          Link This issue duplicates PIG-3045 [ PIG-3045 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Resolved Resolved
          284d 3h 44m 1 Rohini Palaniswamy 10/Nov/12 21:59

            People

            • Assignee:
              Patrick Hunt
              Reporter:
              Patrick Hunt
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development