Bigtop / BIGTOP-1200

Implement Generic Text File to define HCFS filesystem semantics

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: Deployment
    • Labels:
      None

      Description

      One of the really useful artifacts in Bigtop is the init-hdfs.sh script. It defines ecosystem semantics and expectations for Hadoop clusters.

      Other HCFS filesystems can leverage the logic in this script quite easily if we decouple its implementation from being HDFS specific, by introducing a "SUPERUSER" parameter to replace the hard-coded "hdfs".

      And yes, we can still have the init-hdfs.sh convenience script, which just calls "init-hcfs.sh hdfs".
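A minimal sketch of that decoupling, abbreviated from the trace below. The `HCFS_RUNNER` hook is an illustrative addition (defaulting to `su`, as in the real script) so the logic can be exercised without a live cluster; it is not part of the actual patch.

```shell
#!/bin/bash
# Sketch: init-hcfs.sh takes the superuser as a parameter instead of
# hard-coding "hdfs". Directory list abbreviated from the trace below.
init_hcfs() {
  local super_user="${1:?usage: init-hcfs.sh <superuser>}"
  echo "Initializing the DFS with super user : ${super_user}"
  local cmd
  for cmd in \
      '-mkdir -p /tmp' \
      '-chmod -R 1777 /tmp' \
      '-mkdir -p /user'; do
    # every filesystem call runs as the configured superuser
    ${HCFS_RUNNER:-su -s /bin/bash} "${super_user}" -c "/usr/bin/hadoop fs ${cmd}"
  done
}

# init-hdfs.sh then reduces to a one-line convenience wrapper:
#   init_hcfs hdfs
```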

      Initial tests in Puppet VMs pass. (Attaching the patch to this JIRA.)

      [root@vagrant bigtop-puppet]# ./init-hdfs.sh 
      + echo 'Now initializing the Distributed File System with root=HDFS'
      Now initializing the Distributed File System with root=HDFS
      + ./init-hcfs.sh hdfs
      + '[' 1 -ne 1 ']'
      + SUPER_USER=hdfs
      + echo 'Initializing the DFS with super user : hdfs'
      Initializing the DFS with super user : hdfs
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /tmp'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1777 /tmp'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /var'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /var/log'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1775 /var/log'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown yarn:mapred /var/log'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /tmp/hadoop-yarn'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /tmp/hadoop-yarn'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /var/log/hadoop-yarn/apps'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1777 /var/log/hadoop-yarn/apps'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown yarn:mapred /var/log/hadoop-yarn/apps'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /hbase'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hbase:hbase /hbase'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /solr'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown solr:solr /solr'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /benchmarks'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /benchmarks'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod 755 /user'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hdfs  /user'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/history'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown mapred:mapred /user/history'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod 755 /user/history'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/jenkins'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/jenkins'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown jenkins /user/jenkins'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/hive'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/hive'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hive /user/hive'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/root'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/root'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown root /user/root'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/hue'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/hue'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hue /user/hue'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/sqoop'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/sqoop'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown sqoop /user/sqoop'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/oozie'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown -R oozie /user/oozie'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share/lib'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share/lib/hive'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share/lib/mapreduce-streaming'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share/lib/distcp'
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share/lib/pig'
      + ls '/usr/lib/hive/lib/*.jar'
      + ls /usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.6-alpha.jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar
      + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -put /usr/lib/hadoop-mapreduce/hadoop-streaming*.jar /user/oozie/share/lib/mapreduce-streaming'
      put: `/user/oozie/share/lib/mapreduce-streaming/hadoop-streaming-2.0.6-alpha.jar': File exists
      put: `/user/oozie/share/lib/mapreduce-streaming/hadoop-streaming.jar': File exists
      [root@vagrant bigtop-puppet]# 
      
      


          Activity

          jay vyas added a comment -

          Bigtop init-HCFS patch. Tested in Puppet; initially works.

          Konstantin Boudnik added a comment -

          I would suggest waiting a bit on this refactoring of init-hdfs.sh. Here's the reasoning:

          • init-hdfs.sh is very slow because it requires a number of JVM startups via the hadoop script
          • it is hard to use on a filesystem that has been partially initialized already (e.g. if /tmp has already been created, the script will fail)

          Instead, we can simply push forward with BIGTOP-952, which is pretty much unblocked now that BIGTOP-1097 is ready for commit. Having direct calls to DFS APIs is beneficial on many levels and provides a much more powerful programming environment for all sorts of improvements.

          If the new functionality is so important, I think it can be achieved in a simpler way by extending init-hdfs.sh to either take a command-line argument or read an env. var. for the super-user name. The current changes do seem too massive for such a small improvement, don't they?

          jay vyas added a comment -

          Thanks Cos for your feedback. Now my turn to respond:

          1) Purpose of this patch: it's good to decouple Hadoop services from HDFS semantics wherever possible. This will pave the way for using Bigtop to deploy more than just standard HDFS-based Hadoop services. That's the main purpose of this patch. Passively, it also cleans up some stuff to incrementally improve issues like (2) below.

          2) Regarding "partially initialized file systems": that is a great point! That is actually why we've put in "mkdir -p" instead of just "mkdir" as part of this patch. Thus, the "partially initialized FS" problem is dealt with much more flexibly by the init-hcfs.sh script than by the original init-hdfs script.

          3) Regarding the "very slow performance of init-hdfs": you are right that your idea to use direct DFS APIs could be good for performance. This is synergistic with init-hcfs.sh. By making a generic init-hcfs.sh script (look closely at the patch; you will see that init-hdfs.sh is now much simpler), it paves the way for you HDFS folks to create an optimized HDFS path for file creation, but it also contributes an HCFS-compliant alternative which the HCFS community can use with our Bigtop-based deployments.

          I think the 3 bullets above are a good start to an important debate that NEEDS to happen in the open.
          Let's please keep this debate going. The dialogue is probably just as important as the patch.

          Now, in case that's not a compelling argument for this patch, here's an alternative approach:

          If you still feel that having init-hdfs.sh and init-hcfs.sh as side-by-side utilities is bad, then maybe I can add init-hcfs.sh into Bigtop so that the broader filesystem ecosystem (which ultimately will contribute back and improve HDFS by strengthening the robustness of its interfaces and tests) has a foothold in Bigtop upon which we can innovate, further diversifying the Bigtop stack to support a wider range of Hadoop deployments.

          jay vyas added a comment -

          As of the Bigtop hackday, we've now agreed on a better way to do this.

          1) The future of the standard Bigtop deployment is BIGTOP-952, which is Java/Groovy based.

          2) However, we will encode the file system state in a text file which is language neutral. All the bigtop components will then read the state from that file.

          [8:53pm] jayunit100: We encode the filesystem state in a language neutral file, like JSON or plain text.
          [8:53pm] jayunit100: then have groovy read it and implement it.
          [8:53pm] cos1: good! In fact I like it quite a bit
          [8:53pm] jayunit100: Then other shops can take that and implement their own implementatino
          [8:53pm] jayunit100: yesss
          [8:53pm] cos1: super!
          [8:53pm] cos1: community at work, dudes!
          
          Konstantin Boudnik added a comment -

          Thanks for putting this all together, jay vyas!
          One thing we need to think through is how to encode the permissions within the same structure.

          jay vyas added a comment -

          Here's a first pass at a JSON definition of filesystem semantics.

          Interested in Roman Shaposhnik's feedback (because hopefully it can be used in place of a tarball for BIGTOP-952) and Konstantin Boudnik's as well.

          Note: this isn't meant to entirely replace init-hdfs (anymore), so I didn't translate the oozie-specific parts of init-hdfs.sh.
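For illustration only, a file along those lines might look like the following. The section names and the [path, permission, owner, group] tuple layout are guesses at the schema rather than its committed form; the null convention, the "alice" user, and the example paths are taken from the review comments later in this thread.

```json
{
  "HCFS_SUPER_USER": "hdfs",
  "dir": [
    ["/tmp",                       "1777", null,   null],
    ["/var/log",                   "1775", "yarn", "mapred"],
    ["/user/oozie/share/lib/hive", null,   null,   null]
  ],
  "user": [
    ["alice", "755", null]
  ]
}
```

Here null would mean "leave the default", so a partially initialized filesystem can be re-provisioned idempotently.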

          Roman Shaposhnik added a comment -

          jay vyas, JSON is a fine choice for representing the state that needs to be provisioned in HDFS. Thanks for working on this. A few comments:

          • not sure whether we need a special section for users
          • perhaps the best way to go about it is to create a Groovy DSL for describing the desired state of HDFS. The advantage here is that you can include Groovy code (and hence take care of one-offs like provisioning users, etc.). The disadvantage, of course, is that JSON is much better known.

          Thoughts?

          jay vyas added a comment -

          Thanks Roman. For this artifact, language independence is important.

          I think in the BIGTOP-952 JIRA we can add custom Groovy stuff, but this JIRA attempts to create an artifact that both Bigtop and non-Bigtop deployments can use.

          And folks who don't use Groovy to provision can still use this JSON artifact to provision their HCFS clusters, since JSON is language independent.

          Does that make sense?

          Bruno Mahé added a comment -

          I would go with JSON as well.

          jay vyas added a comment -

          So, tl;dr... Roman Shaposhnik, are you okay with using JSON for this, and can we roll BIGTOP-952 to eventually use the pure JSON definitions as well? If so, I think we can use this as a first iteration on a platform-independent definition of the HCFS directory skeleton.

          Konstantin Boudnik added a comment - - edited

          I was just looking into the initial patch - btw, you don't need to add suffixes to the patch names: JIRA is smart enough to provide conveniences around different versions of the attachments - and it looks good overall. The nulls worry me a bit, but I guess this is the Java in me talking. As I said before, JSON looks fine to me: it makes sense to use a language-independent serialization that's universally understood by many environments.

          I see Roman Shaposhnik's point about the DSL, but not everything is as flexible as Groovy in reading 3rd-party formats.

          jay vyas added a comment - - edited

          Roman Shaposhnik, can we push this through, or still leave it open to debate? I'm okay with closing the JIRA if we're not all on the same page, but I would really like to see a platform-neutral JSON file that we can all agree defines the filesystem schema in an HCFS-friendly way!

          (Sounds like it's +1'd by you, right Konstantin Boudnik?)

          Konstantin Boudnik added a comment -

          Yup, I am ok with committing this. Do you already have an implementation of HCFS that uses it?

          jay vyas added a comment -

          Nope. Right now I still use the original init-hcfs.sh bash script which I submitted as the first patch, but then we began discussing BIGTOP-952 and the idea of moving AWAY from bash, so I proposed the JSON file.

          This will be a new artifact that all of us (both Bigtop and beyond) can share and maintain jointly as a community for our Hadoop ecosystems.

          Should I add a Groovy script that uses this to provision the file system? Maybe I could have a look at BIGTOP-952 and see if I can get it working with this JSON?

          Konstantin Boudnik added a comment -

          That'd be great if you have some spare cycles! Basically, my only concern is that if we find the proposed format insufficient, we'll have to redo it again, etc. If there's a proof-of-concept implementation that uses the format, it'd make me feel cozy.

          jay vyas added a comment -

          I can try to directly get BIGTOP-952 working, but I need some advice on the right way to glue Groovy into the provisioning process. Could you leave me a hint at https://issues.apache.org/jira/browse/BIGTOP-952?

          jay vyas added a comment -

          Not fully tested, don't commit just yet; just updating with the latest patch, which seems to be working.

          Note that we don't have all the logic from init-hdfs.sh, only some of it. The remaining stuff is below, and we will resolve how to handle it in BIGTOP-1235.

           
          # Copy over files from local filesystem to HDFS that oozie might need
          if ls /usr/lib/hive/lib/*.jar &> /dev/null; then
            su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -put /usr/lib/hive/lib/*.jar /user/oozie/share/lib/hive'
          fi
          
          if ls /usr/lib/hadoop-mapreduce/hadoop-streaming*.jar &> /dev/null; then
            su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -put /usr/lib/hadoop-mapreduce/hadoop-streaming*.jar /user/oozie/share/lib/mapreduce-streaming'
          fi
          
          if ls /usr/lib/hadoop-mapreduce/hadoop-distcp*.jar &> /dev/null; then
            su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -put /usr/lib/hadoop-mapreduce/hadoop-distcp*.jar /user/oozie/share/lib/distcp'
          fi
          
          if ls /usr/lib/pig/{lib/,}*.jar &> /dev/null; then
            su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -put /usr/lib/pig/{lib/,}*.jar /user/oozie/share/lib/pig'
          fi
          
          # Create home directory for the current user if it does not exist
          if [ "$1" = "-u" ] ; then
            USER="$2"
            USER=${USER:-$(id -un)}
            EXIST=$(su -s /bin/bash hdfs -c "/usr/bin/hadoop fs -ls /user/${USER}" &> /dev/null; echo $?)
            if [ ! $EXIST -eq 0 ]; then
              su -s /bin/bash hdfs -c "/usr/bin/hadoop fs -mkdir /user/${USER}"
              su -s /bin/bash hdfs -c "/usr/bin/hadoop fs -chmod -R 755 /user/${USER}"
              su -s /bin/bash hdfs -c "/usr/bin/hadoop fs -chown ${USER} /user/${USER}"
            fi
          fi
          
          jay vyas added a comment -

          Okay Cos, I've finished provisioning HDFS from this JSON and it seems to work well with BIGTOP-952.

          I think this patch is ready for commit!

          Let me know if there's any doubt.

          Konstantin Boudnik added a comment -

          so, this should go first, right?

          jay vyas added a comment -

          Yes, I guess so. I have VMs set up to refactor if there are any comments, but I'm comfortable with this, along with the current BIGTOP-952 JIRA, as a first iteration on the new way of Groovy-based provisioning.

          Konstantin Boudnik added a comment -

          Overall it looks good. A few minor comments:

          • the file needs the ASL 2.0 boilerplate
          • please use the 2-4 indentation rule
          • here's a misaligned line
            +        [  "/user/oozie/share/lib", null,null,null ],
            +	[ "/user/oozie/share/lib/hive", null, null, null ],
            
          • I like when all elements of an array are stretched to a single line like here
            +        [   "/user/oozie/share", null,null,null ],
            
          • traditionally, at least in the security world, multi-tenant scenarios use names of alice and bob. Shall we stick to the same?
          Show
          jay vyas added a comment -

          Thanks again for taking the time to review all of this stuff! I have a quick question before I submit this:

          How can I add the ASL boilerplate to a JSON file?

          I've looked in hadoop-common and don't see it being done there. Have you ever put a license in a JSON file?

          jay vyas added a comment -

          Well, I've added a license field to the JSON file. Probably that is the simplest way to do it for now. In any case, let me know if this looks better.
          I've also updated the formatting and added in the "alice" user.

          FYI, in the license I changed "s to 's. Hope that is okay.

          Konstantin Boudnik added a comment -

          Wow, that was creative! I hadn't realized that it might be a PITA in the case of JSON - thanks for going the extra mile on that!

          One last nit (seriously): the use of white space in these lines

          +    ["/user/oozie/share", null,null,null ],
          +    ["/user/oozie/share/lib", null,null,null ],
          +    [ "/user/oozie/share/lib/hive", null, null, null ],
          +    ["/user/oozie/share/lib/mapreduce-streaming" , null, null , null ] ,
          +    ["/user/oozie/share/lib/distcp", null, null, null ],
          +    ["/user/oozie/share/lib/pig", null, null , null ]
          

          is different from the rest, e.g. space after a comma, or space after a trailing null. Not a big deal, I guess, but it would be nice to have it all the same, don't you think? Otherwise, I am really ready to put it in.

          Thanks for sticking with me though all these changes!

          jay vyas added a comment - - edited

          anything for you cos ............. looks good now?

          Hide
          Konstantin Boudnik added a comment -

Very nice! It looks really good. Will you be posting a description of the schema so future adopters don't need to guess at it?

jay vyas added a comment - edited

Hi cos. Great point: I've added this task as a bullet in BIGTOP-1235, which I think of as the "next" iteration on HCFS provisioning, which will let us improve all these tools and ultimately eliminate init-hdfs.sh entirely.
I'm a big fan of cramming such docs into the code where possible, so that if we update the schema it's self-evident that the comment should be updated as well.

So in the next iteration we make it self-describing with a multi-string "comment" field....

          "comment": 
              ["This is a schema file which describes the skeleton of a HCFS deployment",
               "There are two major sections: One for users, one for directories, and a special HCFS_SUPER_USER field",
               "Any hadoop vendor can use this schema to guide the initial directory creation for their hadoop deployments",
               ....
             ]
          

(Another alternative: I could create a README_HCFS file or something like that and commit it alongside this patch...)
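As a hedged illustration of how a consumer might apply such a JSON schema, here is a small Python sketch. Note the assumptions: the schema key `"dirs"` and the field order `[path, perm, user, group]` are not documented anywhere in this thread; they are inferred from the `mkdir`/`chmod`/`chown` sequence that init-hdfs.sh issues via `su -s /bin/bash $SUPER_USER`. This is not the committed init-hcfs implementation, just one possible reading of it.

```python
import json
import subprocess

def build_commands(entry):
    """Expand one hypothetical [path, perm, user, group] row into
    hadoop fs argument lists. A null perm/user means "leave defaults",
    matching rows like ["/user/oozie/share", null, null, null]."""
    path, perm, user, group = entry
    cmds = [["-mkdir", "-p", path]]
    if perm is not None:
        cmds.append(["-chmod", "-R", perm, path])
    if user is not None:
        owner = user if group is None else "%s:%s" % (user, group)
        cmds.append(["-chown", "-R", owner, path])
    return cmds

def apply_schema(schema, super_user="hdfs"):
    """Run each command as the HCFS superuser, the way init-hcfs.sh
    shells out: su -s /bin/bash $SUPER_USER -c '/usr/bin/hadoop fs ...'"""
    for entry in schema["dirs"]:  # "dirs" key is an assumption
        for args in build_commands(entry):
            cmd = "/usr/bin/hadoop fs " + " ".join(args)
            subprocess.check_call(["su", "-s", "/bin/bash", super_user, "-c", cmd])

if __name__ == "__main__":
    with open("init-hcfs.json") as f:  # hypothetical filename
        apply_schema(json.load(f))
```

For example, a row `["/var/log", "1775", "yarn", "mapred"]` would expand to the same `-mkdir -p`, `-chmod -R 1775`, `-chown yarn:mapred` trio shown in the init-hdfs.sh trace above.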

Konstantin Boudnik added a comment - edited

I think keeping the schema description with the file is the way to go! Do you think you can add it right now instead of waiting for BIGTOP-1235? If not, I will commit it right away.

          jay vyas added a comment -

Okay, I'll add it now.

          jay vyas added a comment -

Okay! I think we are all set now. I've updated it with the schema description and license as multiline strings, since JSON doesn't support multiline data.

I've tested it again and it works with the current BIGTOP-952 patch (which is still under development and cleanup).

          Konstantin Boudnik added a comment -

+1 Committing this! You're a patient man!

          jay vyas added a comment -

Thanks, cos! The broader HCFS community thanks you as well.

          Konstantin Boudnik added a comment -

Looks like I'm having some trouble committing to git, which was working fine just this morning. Once it is resolved your patch will go in. And then the broader community can thank me.

          Konstantin Boudnik added a comment -

Committed to master. Thanks Jay!


  People

  • Assignee: jay vyas
  • Reporter: jay vyas
  • Votes: 0
  • Watchers: 5

  Dates

  • Created:
  • Updated:
  • Resolved: