Pig
  1. Pig
  2. PIG-2666

LoadFunc.setLocation() is not called when pig script only has Order By

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.3
    • Fix Version/s: 0.9.3, 0.11, 0.10.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HCatLoader.setLocation() needs setLocation() called on the frontend. This doesn't happen with this pig script:

      A = LOAD 'foo' USING org.apache.hcatalog.pig.HCatLoader();
      B = ORDER A BY id;
      DUMP B;

      1. PIG-2666-2.patch
        5 kB
        Daniel Dai
      2. PIG-2666-1.patch
        3 kB
        Daniel Dai
      3. PIG-2666-0.patch
        3 kB
        Daniel Dai

        Issue Links

          Activity

          Francis Liu created issue -
          Francis Liu made changes -
          Field Original Value New Value
          Link This issue relates to HCATALOG-380 [ HCATALOG-380 ]
          Hide
          Daniel Dai added a comment -

          Yes, we does not call setLocation in some cases. Specifically in JobControlCompiler:

          LoadFunc lf = ld.getLoadFunc();
          if (lf !=null) {
              lf.setLocation(ld.getLFile().getFileName(), nwJob);
          }
          

          It can be null when doing sampler job and order by job. we can solve it by forcing getLoadFunc() instantiate LoadFunc.

          Note in the design of LoadFunc, setLocation is not guaranteed to be called in the frontend. But seems now some LoadFunc depends on it. Shall we make the statement that setLocation should be called in the frontend?

          Show
          Daniel Dai added a comment - Yes, we does not call setLocation in some cases. Specifically in JobControlCompiler: LoadFunc lf = ld.getLoadFunc(); if (lf != null ) { lf.setLocation(ld.getLFile().getFileName(), nwJob); } It can be null when doing sampler job and order by job. we can solve it by forcing getLoadFunc() instantiate LoadFunc. Note in the design of LoadFunc, setLocation is not guaranteed to be called in the frontend. But seems now some LoadFunc depends on it. Shall we make the statement that setLocation should be called in the frontend?
          Hide
          Daniel Dai added a comment -

          Francis, can you try the patch?

          Show
          Daniel Dai added a comment - Francis, can you try the patch?
          Daniel Dai made changes -
          Attachment PIG-2666-0.patch [ 12524318 ]
          Hide
          Francis Liu added a comment -

          It would make sense to have something that guarantees to be get called on the FE so that there's a way to pass information to the inputformat as well as to the MR job. Not unless there's already something existing that does this?

          Show
          Francis Liu added a comment - It would make sense to have something that guarantees to be get called on the FE so that there's a way to pass information to the inputformat as well as to the MR job. Not unless there's already something existing that does this?
          Hide
          Daniel Dai added a comment -

          Yes, that's the motivation of the patch. It try to always call setLocation on FE.

          Show
          Daniel Dai added a comment - Yes, that's the motivation of the patch. It try to always call setLocation on FE.
          Hide
          Francis Liu added a comment -

          tested patch, still failing for me.

          Show
          Francis Liu added a comment - tested patch, still failing for me.
          Daniel Dai made changes -
          Assignee Daniel Dai [ daijy ]
          Hide
          Jonathan Coveney added a comment -

          In the short term, it seems fair to ensure setLocation is called on the frontend...in the (ideally near) future, though, we need to work on working some lifecycle into LoadFunc. The fact that there is a convention of stuffing a bunch of logic in setLocation is pretty confusing and suboptimal.

          Show
          Jonathan Coveney added a comment - In the short term, it seems fair to ensure setLocation is called on the frontend...in the (ideally near) future, though, we need to work on working some lifecycle into LoadFunc. The fact that there is a convention of stuffing a bunch of logic in setLocation is pretty confusing and suboptimal.
          Hide
          Francis Liu added a comment -

          In the short term, it seems fair to ensure setLocation is called on the frontend...in the (ideally near) future, though, we need to work on working some lifecycle into LoadFunc. The fact that there is a convention of stuffing a bunch of logic in setLocation is pretty confusing and suboptimal.

          That would be great. StoreFunc needs to be taken care of as well. I haven't used the other UDF interfaces.

          Show
          Francis Liu added a comment - In the short term, it seems fair to ensure setLocation is called on the frontend...in the (ideally near) future, though, we need to work on working some lifecycle into LoadFunc. The fact that there is a convention of stuffing a bunch of logic in setLocation is pretty confusing and suboptimal. That would be great. StoreFunc needs to be taken care of as well. I haven't used the other UDF interfaces.
          Hide
          Francis Liu added a comment -

          patch calls SampleLoader.setLocation->HCatLoader.setLocation and it does not propagate values set in UDFProperties

          Show
          Francis Liu added a comment - patch calls SampleLoader.setLocation->HCatLoader.setLocation and it does not propagate values set in UDFProperties
          Hide
          Daniel Dai added a comment -

          Can you try PIG-2666-1.patch?

          Show
          Daniel Dai added a comment - Can you try PIG-2666 -1.patch?
          Daniel Dai made changes -
          Attachment PIG-2666-1.patch [ 12526075 ]
          Hide
          Daniel Dai added a comment -

          Adding test case.

          Show
          Daniel Dai added a comment - Adding test case.
          Daniel Dai made changes -
          Attachment PIG-2666-2.patch [ 12526224 ]
          Hide
          Thejas M Nair added a comment -

          +1

          Show
          Thejas M Nair added a comment - +1
          Hide
          Daniel Dai added a comment -

          Patch committed to 0.9/0.10/trunk.

          Show
          Daniel Dai added a comment - Patch committed to 0.9/0.10/trunk.
          Daniel Dai made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 0.9.3 [ 12319456 ]
          Fix Version/s 0.11 [ 12318878 ]
          Fix Version/s 0.10.1 [ 12320547 ]
          Resolution Fixed [ 1 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Daniel Dai
              Reporter:
              Francis Liu
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development