Pig
  1. Pig
  2. PIG-3183

rm or rmf commands should respect globbing/regex of path

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.12.0
    • Component/s: grunt
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Release Note:
      Users should be encouraged to use "fs -rm *" if required. In general, using "fs" should yield the same behavior as hadoop.

      Description

      Hadoop fs commands support globbing during deleting files/dirs. Pig is not consistent with this behavior and seems like we could change rm/rmf commands to do the same.

      For eg:

      localhost:pig pkommireddi$ ls -ld out*
      drwxr-xr-x  12 pkommireddi  SF\domain users  408 Feb 13 01:09 out
      drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out1
      drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out2
      
      localhost:pig pkommireddi$ bin/pig -x local
      grunt> rmf out*
      grunt> quit
      
      localhost:pig pkommireddi$ ls -ld out*
      drwxr-xr-x  12 pkommireddi  SF\domain users  408 Feb 13 01:09 out
      drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out1
      drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out2
      

      Ideally, the user would expect "rmf out*" to delete all of the above dirs.

      1. PIG-3183.patch
        2 kB
        Prashant Kommireddi

        Activity

        Hide
        Prashant Kommireddi added a comment -

        Marking this "Won't fix" as using "fs" commands seems like the right approach going forward.

        Show
        Prashant Kommireddi added a comment - Marking this "Won't fix" as using "fs" commands seems like the right approach going forward.
        Hide
        Prashant Kommireddi added a comment -

        Right. I receive a lot of questions regarding "rm/ls" behavior and I point them to "fs" commands. But it's a pain for users to start using it and realize it doesn't work. I would be in favor of deprecating these or may be even translating it to "fs" under the hood.

        Show
        Prashant Kommireddi added a comment - Right. I receive a lot of questions regarding "rm/ls" behavior and I point them to "fs" commands. But it's a pain for users to start using it and realize it doesn't work. I would be in favor of deprecating these or may be even translating it to "fs" under the hood.
        Hide
        Cheolsoo Park added a comment -

        Prashant Kommireddi, please correct me if I am wrong.

        • As of now, these limitations can be easily worked around by using "fs" commands (i.e. fs -rm * and fs -ls *).
        • Given these commands are not documented (and thus not official), I would encourage users to use "fs" commands.

        I do not know why these commands are added in the first place, and we should keep them for backward compatibility for a while. But eventually I would like to get rid of them since they're duplicate IMO.

        Show
        Cheolsoo Park added a comment - Prashant Kommireddi , please correct me if I am wrong. As of now, these limitations can be easily worked around by using "fs" commands (i.e. fs -rm * and fs -ls *). Given these commands are not documented (and thus not official), I would encourage users to use "fs" commands. I do not know why these commands are added in the first place, and we should keep them for backward compatibility for a while. But eventually I would like to get rid of them since they're duplicate IMO.
        Hide
        Prashant Kommireddi added a comment -

        Jonathan Coveney or others have any comments?

        Show
        Prashant Kommireddi added a comment - Jonathan Coveney or others have any comments?
        Hide
        Prashant Kommireddi added a comment -

        The example above got jumbled up (was interpreted bold for *). Here it is again

        grunt> A = load 'input/*';
        
        vs
        
        grunt> ls 'input/*';
        
        Show
        Prashant Kommireddi added a comment - The example above got jumbled up (was interpreted bold for *). Here it is again grunt> A = load 'input/*'; vs grunt> ls 'input/*';
        Hide
        Prashant Kommireddi added a comment -

        Thanks for the review, Jonathan. I have a slightly different opinion on this, here is why:

        1. Users in the past who used "rm *" would know it does not work [category 1]
        2. These users could probably have a hack in place right now to solve this, OR would be completely avoiding a glob delete (as it doesn't work).
        3. For new users, this behavior is consistent with Hadoop in which case it should be ok [category 2]
        4. There has been no documentation around this, which means Pig users either fall in category 1 or 2. It would not make a lot of sense for users to have a "rm *" in their scripts as it never worked.
        5. Also, current behavior (rm or ls) is different from the way we accept globs in Loading files. There seems to be an inherent inconsistency in what users should expect in saying "A = load 'input/'" vs "ls input/".

        The reason I would prefer this being the default behavior is for above reasons, in addition to a lot of users not discovering it at all when its a config property.

        Let me know what you think? Others, please let us know what you think too.

        Show
        Prashant Kommireddi added a comment - Thanks for the review, Jonathan. I have a slightly different opinion on this, here is why: 1. Users in the past who used "rm *" would know it does not work [category 1] 2. These users could probably have a hack in place right now to solve this, OR would be completely avoiding a glob delete (as it doesn't work). 3. For new users, this behavior is consistent with Hadoop in which case it should be ok [category 2] 4. There has been no documentation around this, which means Pig users either fall in category 1 or 2. It would not make a lot of sense for users to have a "rm *" in their scripts as it never worked. 5. Also, current behavior (rm or ls) is different from the way we accept globs in Loading files. There seems to be an inherent inconsistency in what users should expect in saying "A = load 'input/ '" vs "ls input/ ". The reason I would prefer this being the default behavior is for above reasons, in addition to a lot of users not discovering it at all when its a config property. Let me know what you think? Others, please let us know what you think too.
        Hide
        Jonathan Coveney added a comment -

        So obviously this isn't a huge change and it is reasonable to make this work in the same way that hadoop's removal commands work. The only thing that holds me back is a sort of innate aversion to "rm *" being able to completely whack your files. Obviously in any environment this is generally the case but since in the past this wasn't the case, it just feels...dangerous. Perhaps we want to consider adding configuration (off by default) that makes it throw an error if you try to delete more than one file? Would love to hear if anyone else feels the hesitation that I do.

        Show
        Jonathan Coveney added a comment - So obviously this isn't a huge change and it is reasonable to make this work in the same way that hadoop's removal commands work. The only thing that holds me back is a sort of innate aversion to "rm *" being able to completely whack your files. Obviously in any environment this is generally the case but since in the past this wasn't the case, it just feels...dangerous. Perhaps we want to consider adding configuration (off by default) that makes it throw an error if you try to delete more than one file? Would love to hear if anyone else feels the hesitation that I do.
        Hide
        Prashant Kommireddi added a comment -

        Btw, this should probably be the case with "ls" as well. I will work on that in a separate JIRA.

        Show
        Prashant Kommireddi added a comment - Btw, this should probably be the case with "ls" as well. I will work on that in a separate JIRA.
        Hide
        Prashant Kommireddi added a comment -

        Ping!

        Show
        Prashant Kommireddi added a comment - Ping!
        Hide
        Prashant Kommireddi added a comment -

        Can someone please review this?

        Show
        Prashant Kommireddi added a comment - Can someone please review this?

          People

          • Assignee:
            Prashant Kommireddi
            Reporter:
            Prashant Kommireddi
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development