  1. Oozie
  2. OOZIE-2

Oozie 'move' fs action is inconsistent

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • pre-Apache
    • None

    Description

      > Oozie 'move' fs action is inconsistent
      > --------------------------------------
      >
      > Key: OOZIE-133
      > URL: http://h12.grid.sp2.yahoo.net/browse/OOZIE-133
      > Project: oozie
      > Issue Type: New Feature
      > Components: workflow
      > Affects Versions: 3.0.2
      > Reporter: Mona Chitnis
      > Assignee: Oozie
      > Original Estimate: 1 week
      > Remaining Estimate: 1 week
      >
      > I'm using the 'move' fs action and I first got the following error:
      > FS001: Missing scheme in path [/projects/ngdstone/user/ogg_oozie/intermediate/tmp_price_feats_uniq/.pig_header]
      > when I had the following in my workflow.xml :
      > <fs>
      > <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
      > target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
      > <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
      > target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
      > </fs>
      > I then prefixed the namenode URI to the paths (like I did for the <prepare> paths), as such:
      > <fs>
      > <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
      > target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
      > <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
      > target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
      > </fs>
      > However, I now get this error:
      > FS003: Scheme [hdfs] not allowed in path
      > [hdfs://mithrilblue-nn1.blue.ygrid.yahoo.com:8020/projects/ngdstone/user/ogg_oozie/intermediate/tmp_predict_supply_feats]
      > It seems the 'scheme' is only needed for the source path, but not the target. This is inconsistent.
      > Finally, if the source path is a file and the target path is a directory, Oozie will complain that the target already
      > exists. I feel it should be consistent with the Hadoop CLI (and Unix) and simply understand that the source should be
      > placed under the target directory.


      [ http://h12.grid.sp2.yahoo.net/browse/OOZIE-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

      Alejandro Abdelnur resolved OOZIE-133.
      --------------------------------------

      Resolution: Won't Fix

      This is not a bug; it is like that by design. The reason is to make clear that it is a move within the current filesystem, with no actual data movement.

      Mona Chitnis commented on OOZIE-133:
      ------------------------------------

      There are two parts to this issue.
      1. The target must not mention a scheme.
      2. If the target is an existing directory, an exception is thrown.

      For part 1, if the target does include a scheme, we can allow it, but only if the target's scheme is the same as the source's (since move is essentially a Hadoop fs rename). This way, users who have prefixed both source and target paths with the namenode parameter for the sake of consistency do not face an exception.
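The check proposed for part 1 can be sketched as below. This is only an illustration of the proposal, not Oozie's actual code: `validate_move_target` is a hypothetical helper, and comparing the authority (host and port) in addition to the scheme is an added assumption, since two `hdfs` URIs on different namenodes would not be a same-filesystem rename.

```python
from urllib.parse import urlparse


def validate_move_target(source: str, target: str) -> str:
    """Return the target path to use for the rename, or raise ValueError.

    Hypothetical helper illustrating the proposal; not part of Oozie.
    """
    src = urlparse(source)
    tgt = urlparse(target)

    # The source must be fully qualified (the FS001 case in the report).
    if not src.scheme:
        raise ValueError(f"Missing scheme in path [{source}]")

    # A target without a scheme is implicitly on the source's filesystem.
    if not tgt.scheme:
        return tgt.path

    # A target with a scheme is accepted only when it names the same
    # filesystem as the source (assumption: scheme and authority must match).
    if (tgt.scheme, tgt.netloc) != (src.scheme, src.netloc):
        raise ValueError(
            f"Target filesystem [{tgt.scheme}://{tgt.netloc}] differs from "
            f"source filesystem [{src.scheme}://{src.netloc}]"
        )
    return tgt.path
```

With this rule, a workflow that prefixes `${nameNode}` to both paths would pass, while a target on a different namenode would still be rejected.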

      For part 2, Hadoop can take care of placing the source dir or file as a child of the target dir, if the target dir exists. Is there any reason why Oozie should not be consistent with this?
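The behaviour requested for part 2 amounts to the following path resolution. This is a minimal sketch under stated assumptions: `resolve_move_destination` is a hypothetical helper, and `target_is_dir` stands in for whatever filesystem check would report that the target exists and is a directory.

```python
import posixpath


def resolve_move_destination(source: str, target: str, target_is_dir: bool) -> str:
    """Return where the source would land under CLI-style move semantics.

    Hypothetical helper; target_is_dir is assumed to come from a
    filesystem lookup performed by the caller.
    """
    if target_is_dir:
        # Existing directory: keep the source's base name and place it inside.
        name = posixpath.basename(source.rstrip("/"))
        return posixpath.join(target, name)
    # Otherwise the target is the new name (a plain rename).
    return target
```

Applied to the report's example, moving `.pig_header` to an existing `tmp_predict_supply_feats/` directory would resolve to `tmp_predict_supply_feats/.pig_header` rather than failing because the target already exists.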

            People

              Assignee: Unassigned
              Reporter: Mona Chitnis

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 168h
                  Remaining: 168h
                  Logged: Not Specified