Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.2
    • Component/s: None
    • Labels: None
    • Environment: Java 6, Redhat, Debian

      Description

      To make HCatalog easy to use and to match the software layout of the rest of the Hadoop stack, it would be nice to have an installable package for HCatalog.

      1. HCATALOG-63-8-trunk.patch
        59 kB
        Sushanth Sowmyan
      2. HCATALOG-63-8-0.2.patch
        59 kB
        Sushanth Sowmyan
      3. HCATALOG-63-2.patch
        58 kB
        Eric Yang
      4. HCATALOG-63-1.patch
        57 kB
        Eric Yang
      5. HCATALOG-63.patch
        37 kB
        Eric Yang

        Activity

        Ashutosh Chauhan added a comment -

        Patch committed to branch & 0.2. Thanks, Sushanth!

        Sushanth Sowmyan added a comment -

        Updating patches, last files were malformed.

        Sushanth Sowmyan added a comment - - edited

        Also created a separate JIRA (HCATALOG-105) for DEB creation, as we have not attended to that yet. (A preliminary skeleton exists, but it has not been developed further or verified.)

        Sushanth Sowmyan added a comment -

        minor version type munging change

        < +  <property name="_vtype" value="dev"/> 
        < +  <!-- property name="_vtype" value="SNAPSHOT"/ --> 
        < +  <property name="hcatalog.version" value="${_version}-${_vtype}"/>
        ---
        > +  <property name="_vtype" value="-dev"/> 
        > +  <property name="hcatalog.version" value="${_version}${_vtype}"/>
        
        Sushanth Sowmyan added a comment -

        minor version type munging change

        < +  <property name="_vtype" value="SNAPSHOT"/> 
        < +  <!-- property name="_vtype" value="SNAPSHOT"/ --> 
        < +  <property name="hcatalog.version" value="${_version}-${_vtype}"/>
        ---
        > +  <property name="_vtype" value="-SNAPSHOT"/> 
        > +  <property name="hcatalog.version" value="${_version}${_vtype}"/>
        
        Sushanth Sowmyan added a comment -

        Per suggestion, updated the patches to print out which port the metastore started on:

        $ sudo service hcatalog-server restart
        Stopping HCatalog Server daemon (hcatalog-server): looking for /var/run/hcatalog/hcat.pid
        Found metastore server process 19443, killing...
        Successfully shutdown metastore
                                                                   [  OK  ]
        Starting HCatalog Server daemon (hcatalog-server): Started metastore server init, testing if initialized correctly...
        Metastore initialized successfully on port[9933].
                                                                   [  OK  ]
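Because the init script now reports the port, a wrapper script could recover it from the startup message. This is only a minimal sketch: it assumes the message format quoted above stays stable, and `metastore_port` is a hypothetical helper name, not something the patch provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): read init-script output on
# stdin and print the port from a line of the form
# "Metastore initialized successfully on port[NNNN]."
metastore_port() {
    sed -n 's/.*on port\[\([0-9][0-9]*\)\].*/\1/p'
}

# Example with the message quoted in this comment.
echo 'Metastore initialized successfully on port[9933].' | metastore_port
```

If the message carries no port (as in the older output without `port[...]`), the helper prints nothing, so callers can treat empty output as "port unknown".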
        
        Sushanth Sowmyan added a comment -

        Update to trunk to print out port number on starting.

        Sushanth Sowmyan added a comment -

        Update to 0.2 to print out port number on starting.

        Sushanth Sowmyan added a comment -

        Created HCATALOG-103 for updating the documentation with the rpm-based installation, and HCATALOG-104 for splitting hcatalog and hcatalog-server to use independent conf/etc dirs, as per suggestions.

        Sushanth Sowmyan added a comment -

        Oh, btw, with the default changing from `whoami` to "hcat", service works as well.

        sudo service hcatalog-server start
        Starting HCatalog Server daemon (hcatalog-server): Started metastore server init, testing if initialized correctly...
        Metastore initialized successfully.
                                                                   [  OK  ]
        
        Sushanth Sowmyan added a comment -

        0.2-branch patch with updates to
        a) Copying proto-hive-site.xml
        b) Asking the user to verify the conf files after copying them in
        c) Defaulting to user "hcat" in conf

        Sushanth Sowmyan added a comment -

        Trunk patch with updates to
        a) Copying proto-hive-site.xml
        b) Asking the user to verify the conf files after copying them in
        c) Defaulting to user "hcat" in conf

        Ashutosh Chauhan added a comment -

        "not copying proto-hive-site.xml was a way I figured we could enforce the user needing to take this action."

        This won't enforce anything: if they don't rename it, it won't be read at all, the system won't complain, and it will fail later with other problems. I recommend renaming.

        Other potential areas of improvement:

        • This should be integrated with 'service' provided by Red Hat based systems, which is the more common way of managing daemons on these systems.
        • Client and server should have different config locations; sharing /etc/hcatalog between both is not ideal.
        Sushanth Sowmyan added a comment -
        • Patch provided for 0.2 branch
        • Possibly, although I went with the convention we use for the binaries (we call it hcat and not hcat_client)
        • That is provided as a convenience for those who start hcat_server, i.e., if I wanted to run hcat_server as user hcatsvr, if I did a sudo -u hcatsvr before I started the server, it'd start as that user. Eric added that and I kept it because I think it's potentially useful. A user can choose to override if they want. Although... I'd like to see where you get that "cannot set groups" error - I did not come across that.
        • The renaming of proto-hive-site.xml : I considered this, and it might have made sense to. But there are settings in hive-site.xml (the caps ones) that we want the user to override, and knowingly so. Unless we have a script to use to set settings in config xml files (I need to look into the template script a bit more - maybe it does this too?), it's not easy to automate. So, not copying proto-hive-site.xml was a way I figured we could enforce the user needing to take this action. (The db installation step, and setting the jdbc url comes to mind on the non-automatable-by-us end)
        • Hmm. Good point, agreed, should be part of a post-install script, and should spit out the above instructions too.
        Sushanth Sowmyan added a comment -

        Adding a patch for the 0.2 branch. The earlier patch didn't apply cleanly because the build.xml for 0.2 had diverged in the version number, so I supplied the file HCATALOG-63-5-0.2.patch for it. (We shouldn't normally need separate patches; it's just that we changed 2 lines that set the version number, and those are different in 0.2.)

        Ashutosh Chauhan added a comment -
        • Patch doesn't apply to 0.2 branch
        • To be more clear, should the client rpm be named hcatalog-client-0.3.0-1.i386.rpm instead of hcatalog-0.3.0-1.i386.rpm?
        • /etc/hcatalog/hcat-env.sh has USER=`whoami`. Shouldn't that already have been replaced by the value of `whoami` after installation?
        • $ service hcatalog-server start
          Starting HCatalog Server daemon (hcatalog-server): runuser: cannot set groups: Operation not permitted
                                                                     [FAILED]
          

          Do I need to do some other step in between ?

        • Shouldn't the renaming of proto-hive-site.xml to hive-site.xml happen as part of installation?
        • It seems that after installation the administrator has to hand-edit a few files. We should ask the user to edit only one file (in non-interactive mode, or ask interactively) and then fill in the details in all the different config files as part of installation.
        • If you are asking for your user to verify something, it should also be printed with current values on screen at the end of installation so she can verify it.
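The last point, printing the current values for verification, could look roughly like the sketch below. It assumes hcat-env.sh holds plain `KEY=value` assignments (as in the `USER=` line quoted above) and uses the three settings named elsewhere in this thread; `show_hcat_env` is a hypothetical helper, not part of any attached patch.

```shell
#!/bin/sh
# Hypothetical helper: print the current assignments of the settings the
# deployer is asked to verify. The env file path is passed as $1.
show_hcat_env() {
    env_file=$1
    for key in HADOOP_HOME DBROOT USER; do
        # Take the last uncommented assignment of each key, if there is one.
        line=$(grep -E "^(export[[:space:]]+)?${key}=" "$env_file" | tail -n 1)
        if [ -n "$line" ]; then
            echo "$line"
        else
            echo "${key} is not set in ${env_file}"
        fi
    done
}

# Typical use at the end of a post-install step (path taken from this thread):
#   show_hcat_env /etc/hcatalog/hcat-env.sh
```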
        Sushanth Sowmyan added a comment -

        Another couple of miscellaneous install/run instructions

        Run time instructions:
        a) We need to export HADOOP_HOME before we can run hcat.
        b) We need to export HIVE_CONF_DIR to /etc/hcatalog (or wherever we've installed hcat's config) before we can run the 'hive' commandline ('hcat' cmdline works okay).

        Install/Deploy-time instructions:
        a) DB connector jar: we need a jdbc connector jar for connecting to whatever db we're using, and we also have to make sure a database server is externally set up for use. This needs to be reflected in hcat-env.sh (typically in /etc/hcatalog/) in the DBROOT parameter. (For example, I have DBROOT=/home/sush/opt/mysql-connector-java-3.1.14, and I installed it there from the tarball provided by mysql. No official rpm is provided, and the license is incompatible for distribution anyway, but one can set that up themselves for whatever db they desire.)
        b) cp /etc/hcatalog/proto-hive-site.xml /etc/hcatalog/hive-site.xml, then hand-edit and replace the parameters in allcaps with whatever values make sense in that installation (DBHOSTNAME, PASSWORD, WAREHOUSE_DIR, KEYTAB_PATH) and/or hand-edit other parameters as desired (jdbc connection url, sasl_enabled, etc.).
        c) Verify that HADOOP_HOME, DBROOT and USER in /etc/hcatalog/hcat-env.sh are to the deployer's satisfaction.
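Step (b) of the install instructions could be scripted along the following lines. This is a sketch under stated assumptions, not what the patch does: it assumes the all-caps names appear as literal tokens in proto-hive-site.xml, that the supplied values contain no `|`, `&`, or backslash characters (which are special to `sed` here), and that no value itself contains one of the placeholder names. `fill_hive_site` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: copy the prototype config to a target file while
# substituting the all-caps placeholders named in step (b).
# Usage: fill_hive_site PROTO TARGET DBHOST PASSWORD WAREHOUSE_DIR KEYTAB_PATH
fill_hive_site() {
    proto=$1; target=$2
    db_host=$3; db_password=$4; warehouse_dir=$5; keytab_path=$6
    # '|' is used as the sed delimiter so that paths with '/' work unescaped.
    sed -e "s|DBHOSTNAME|${db_host}|g" \
        -e "s|PASSWORD|${db_password}|g" \
        -e "s|WAREHOUSE_DIR|${warehouse_dir}|g" \
        -e "s|KEYTAB_PATH|${keytab_path}|g" \
        "$proto" > "$target"
}

# Typical use (paths from this thread, values purely illustrative):
#   fill_hive_site /etc/hcatalog/proto-hive-site.xml /etc/hcatalog/hive-site.xml \
#       db.example.com secret /user/hive/warehouse /etc/hcatalog/hcat.keytab
```

Other parameters such as the jdbc connection url would still need hand-editing, as noted above.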

        Sushanth Sowmyan added a comment -

        1. To build the rpm package, we do the following :

        > ant rpm -Dforrest.home=$FORREST_HOME

        (For this, you do need to have apache forrest installed)

        2. The rpm package build builds on top of a tarball build, so we still keep that capability, and we still want to support tarball installs - not all systems can install rpms. It would make sense to modify those scripts to support tarball installs too. But for an rpm-ed package, we simply use rpm to install, and don't use these scripts.
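The build invocation from point 1 can be wrapped so that a missing Forrest location is caught before ant starts. A minimal sketch: `package_build_cmd` is a hypothetical helper that only prints the command line, and the `deb` target is accepted here on the assumption that it exists in build.xml, as mentioned elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the ant command line for a package target.
# It does not run ant; it only validates the arguments and echoes the command.
package_build_cmd() {
    target=$1; forrest_home=$2
    case "$target" in
        rpm|deb) ;;
        *) echo "unknown package target: $target" >&2; return 1 ;;
    esac
    if [ -z "$forrest_home" ]; then
        echo "forrest home must be given" >&2
        return 1
    fi
    echo "ant ${target} -Dforrest.home=${forrest_home}"
}

# Example: show the command for an rpm build with FORREST_HOME=/opt/forrest.
package_build_cmd rpm /opt/forrest
```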

        Ashutosh Chauhan added a comment -

        @Sushanth,

        1. For document purposes, can you provide instructions on how to build rpm packages and install hcat server and client?
        2. What about the scripts in scripts/*? Going forward, are we planning to support tarball installation through those scripts as well, or is only rpm installation recommended?

        Sushanth Sowmyan added a comment -

        Updated patch to reconcile with HCATALOG-43; metastore port change fix included: HCATALOG-63.5.patch

        Ashutosh Chauhan added a comment -

        Because of HCATALOG-43, the patch doesn't apply.

        Sushanth Sowmyan added a comment -

        Updated patch to latest version, still has an issue with METASTORE_PORT updates, works with 9083 (the default port).

        Eric Yang added a comment -

        Updated the location to locate Hive through ${HIVE_HOME}/bin/hive because Hive does not have rpm/deb package for integration yet.

        Eric Yang added a comment -

        Outstanding issue: HCatalog depends on Hive, but Hive is not structured to the new file system layout. Therefore, the patch for the HCatalog server cannot locate the Hive jar files. Could we move the Hive reorganization to another JIRA? Thanks.

        Eric Yang added a comment -

        Includes server side packaging. The server side scripts are modeled after the Hadoop scripts: hcat-config.sh for setting up the environment, and hcat_server.sh.

        Eric Yang added a comment -

        More work to merge server side script to packaging.

        Ashutosh Chauhan added a comment -

        1. This only creates a client package. We need an hcat server package too. Once we have a server package, we can get rid of all the sh scripts in scripts/.
        2. This patch will probably break scripts/hcat_server_install.sh, but since we will get corresponding server packages, we won't need those scripts anymore.
        3. For the deb package, let's not put a dependency on openjdk, because hadoop security depends on sun-jdk itself.
        4. To install this, the user must be a member of the hadoop group, which I think should not be required. It should be possible to install the client package without being superuser.

        Eric Yang added a comment -

        Match the file system layout of Hadoop 0.20.204/0.23+; see HADOOP-6255 for the proposed layout. To build Debian packages, use:

        ant -Dforrest.home=/path/to/apache-forrest-0.9 clean deb
        

        To build the Red Hat package, use:

        ant -Dforrest.home=/path/to/apache-forrest-0.9 clean rpm
        

          People

          • Assignee:
            Sushanth Sowmyan
            Reporter:
            Eric Yang
          • Votes:
            0
            Watchers:
            4
