Hive / HIVE-2928

Support for Oracle-backed Hive-Metastore ("longvarchar" to "clob" in package.jdo)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.1, 0.10.0
    • Component/s: Metastore
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I'm trying to get the Hive-Metastore to work when backed by an Oracle backend. There's a change to hive's package.jdo that I'd like advice/comments on.

      One sticking point in working with Oracle has been the TBLS table (MTable) and its two LONGVARCHAR properties (VIEW_ORIGINAL_TEXT and VIEW_EXPANDED_TEXT). Oracle doesn't support more than one LONGVARCHAR column per table (for legacy reasons), and prefers that one use CLOBs instead. If one switches the columns to CLOB, with no modification to Hive's package.jdo, one sees the following exception:

      <quote>
      Incompatible data type for column TBLS.VIEW_EXPANDED_TEXT : was CLOB
      (datastore), but type expected was LONGVARCHAR (metadata). Please check that
      the type in the datastore and the type specified in the MetaData are
      consistent.
      org.datanucleus.store.rdbms.exceptions.IncompatibleDataTypeException:
      Incompatible data type for column TBLS.VIEW_EXPANDED_TEXT : was CLOB
      (datastore), but type expected was LONGVARCHAR (metadata). Please check that
      the type in the datastore and the type specified in the MetaData are
      consistent.
      at org.datanucleus.store.rdbms.table.ColumnImpl.validate(ColumnImpl.java:521)
      at org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:2
      </quote>
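A minimal sketch of the restriction described above, assuming that a LONGVARCHAR property ends up as an Oracle LONG column (of which a table may hold only one). The column dictionary and helper names are hypothetical and only model the two TBLS columns named in the description:

```python
# Hypothetical, simplified view of TBLS: one ordinary column plus the two
# view-text columns named in the description. Type names are JDBC type names.
TBLS_COLUMNS = {
    "TBL_NAME": "VARCHAR",
    "VIEW_ORIGINAL_TEXT": "LONGVARCHAR",
    "VIEW_EXPANDED_TEXT": "LONGVARCHAR",
}


def long_columns(columns):
    """Return the names of columns declared as LONGVARCHAR, sorted."""
    return sorted(name for name, jdbc_type in columns.items()
                  if jdbc_type == "LONGVARCHAR")


def violates_single_long_rule(columns):
    """True if the table declares more than one LONGVARCHAR column."""
    return len(long_columns(columns)) > 1


def with_clobs(columns):
    """Return a copy of the table with every LONGVARCHAR replaced by CLOB."""
    return {name: ("CLOB" if jdbc_type == "LONGVARCHAR" else jdbc_type)
            for name, jdbc_type in columns.items()}


if __name__ == "__main__":
    print(long_columns(TBLS_COLUMNS))
    print(violates_single_long_rule(TBLS_COLUMNS))
    print(violates_single_long_rule(with_clobs(TBLS_COLUMNS)))
```

The check is deliberately naive: it only counts declared types and says nothing about how a particular driver or DataNucleus version maps them.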

      But if one rebuilds Hive with the package.jdo changed to use CLOBs instead of LONGVARCHARs, things look promising:
      1. The exception no longer occurs. Things seem to work with Oracle. (I've yet to scale-test.)
      2. These modified hive-libraries work as is with pre-existing mysql metastores. Migrating data isn't a worry.
      3. The unit-tests seem to run through.

      Would there be opposition to changing the package.jdo's LONGVARCHAR references to CLOB, if this works with mysql and with Oracle?
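For illustration, the proposed change amounts to swapping the jdbc-type attribute on the affected columns. The sketch below applies that swap to a small, made-up fragment in the style of package.jdo; the field names (viewOriginalText, viewExpandedText) and the fragment's layout are assumptions, not a copy of the real file:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only. The real package.jdo contains many more classes
# and fields; the field names here are assumed for the example.
PACKAGE_JDO_FRAGMENT = """
<class name="MTable" table="TBLS">
  <field name="viewOriginalText">
    <column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>
  </field>
  <field name="viewExpandedText">
    <column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>
  </field>
</class>
"""


def longvarchar_to_clob(xml_text):
    """Rewrite jdbc-type="LONGVARCHAR" to "CLOB" on every <column> element.

    Returns the rewritten XML and the names of the columns that were changed.
    """
    root = ET.fromstring(xml_text)
    changed = []
    for column in root.iter("column"):
        if column.get("jdbc-type") == "LONGVARCHAR":
            column.set("jdbc-type", "CLOB")
            changed.append(column.get("name"))
    return ET.tostring(root, encoding="unicode"), changed


if __name__ == "__main__":
    new_xml, changed = longvarchar_to_clob(PACKAGE_JDO_FRAGMENT)
    print(changed)
    print(new_xml)
```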

      Mithun

      P.S. I also have a working hive-schema-0.9.0-oracle.sql script that I'm testing, for the related issue of creating the required tables in Oracle.

      1. HIVE-2928.patch
        21 kB
        Mithun Radhakrishnan
      2. HIVE-2928-fixed-path.diff.txt
        21 kB
        Andrew Bayer

          Activity

          Mithun Radhakrishnan added a comment -

          Schema-creation in Oracle. + Changes to package.jdo ("longvarchar" to "clob").

          Ashutosh Chauhan added a comment -

          2. These modified hive-libraries work as is with pre-existing mysql metastores. Migrating data isn't a worry.

          You are changing the type of columns. This sounds like it requires data migration, and thus data migration and schema upgrade scripts for MySQL and Derby.

          Mithun Radhakrishnan added a comment -

          @Ashutosh: The data-type change is in the package.jdo. I've verified that hive-jars built with CLOB work against a pre-existing MySQL metastore-server (whose datatypes haven't been changed). I didn't have to migrate.

          Mithun Radhakrishnan added a comment -

          (Just re-verified that this is the case.)

          Carl Steinbach added a comment -

          @Mithun: What happens on MySQL if you turn column validation on (e.g. datanucleus.validateColumns=true at startup)? There's a note on the Datanucleus site that indicates this won't work:

          http://www.datanucleus.org/products/datanucleus/rdbms/support.html

          You can specify "BLOB", "CLOB" JDBC types when using MySQL with DataNucleus but you must turn validation of columns OFF. This is because these types are not supported by the MySQL JDBC driver and it returns them as LONGVARBINARY/LONGVARCHAR when querying the column type.
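A toy model of the mismatch that note describes, under the assumption that column validation boils down to comparing the type declared in the metadata with the type the JDBC driver reports. This is not DataNucleus code; the function names and the lookup table are hypothetical:

```python
# Types the MySQL driver is described as reporting for columns declared as
# BLOB/CLOB, per the note quoted above. Anything else is assumed to be
# reported unchanged (an assumption made for this sketch).
MYSQL_REPORTED_TYPE = {"CLOB": "LONGVARCHAR", "BLOB": "LONGVARBINARY"}


def reported_type(declared_type):
    """Type the (modelled) MySQL driver reports for a declared JDBC type."""
    return MYSQL_REPORTED_TYPE.get(declared_type, declared_type)


def validate_column(metadata_type, datastore_type, validate_columns):
    """Return None if the column is accepted, otherwise an error message."""
    if not validate_columns:
        # With validation off, no comparison is made at all.
        return None
    if metadata_type != datastore_type:
        return ("was %s (datastore), but type expected was %s (metadata)"
                % (datastore_type, metadata_type))
    return None


if __name__ == "__main__":
    # Metadata says CLOB, but the driver reports LONGVARCHAR.
    print(validate_column("CLOB", reported_type("CLOB"), validate_columns=True))
    print(validate_column("CLOB", reported_type("CLOB"), validate_columns=False))
```

Under this model, a package.jdo that declares CLOB can only pass against MySQL while validation stays off, which is the trade-off being raised here.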

          Mithun Radhakrishnan added a comment -

          @Carl: Thank you for the pointer.

          Would your suggestion be that changing the datatype in Hive's package.jdo won't be acceptable because there might be deployments with MySQL with JDO column-validations turned on?

          Carl Steinbach added a comment -

          We currently have JDO schema validation disabled by default since it affects performance. However, if modifying Hive's package.jdo to work with Oracle means that we're forfeiting this feature, I'd at least like to know that up front.

          Also, there was an exchange a couple years ago on the hive-user list where John made an interesting suggestion:

          http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/201006.mbox/%3CBD1FE08F-5EDF-4D90-A741-8B703CD06BC1@facebook.com%3E

          Another option is to precreate your schema in Oracle and then tell JDO not to try to create/update it automatically.

          Would you mind trying this out to see if it works? If it does then I think that might be the optimal solution for now.

          Mithun Radhakrishnan added a comment -

          Updated to remove package.jdo change.

          Mithun Radhakrishnan added a comment -

          @Carl: Thanks. I'd seen that thread before, but I hadn't actually tested that out myself. I just did, and here's what I found out:

          1. If the complete schema is created a priori (using the hive-schema-0.9.0.oracle.sql in the attached patch), then Hive works with Oracle. (This is with "datanucleus.validateColumns = false".)
          2. What's neat is that "datanucleus.autoCreateSchema = true" doesn't mess this up, because the schema is completely constructed. The exception in the Original Description was a result of there being schema differences that JDO attempted to resolve.

          (For the record, Datanucleus does recommend turning both flags off, for the sake of performance. They're meant to be used for the first start-up.)
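As a sketch of what that configuration might look like once the Oracle schema has been created from the script, the snippet below renders the two flags as Hadoop-style property entries. Only the two property names come from this thread; placing them in hive-site.xml and setting both to false (rather than leaving autoCreateSchema on, as in the test above) are assumptions:

```python
import xml.etree.ElementTree as ET

# Both flags off, following the recommendation mentioned above. The values are
# an assumption for a steady-state deployment with a pre-created schema.
SETTINGS = {
    "datanucleus.validateColumns": "false",
    "datanucleus.autoCreateSchema": "false",
}


def render_properties(settings):
    """Render settings as <configuration><property>...</property></configuration>."""
    configuration = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(SETTINGS))
```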

          The upshot of this is that any changes in the schema would have to be resolved using a migration script for Oracle, and won't be done automatically by the JDO library.

          I've modified the attached patch to remove the proposed package.jdo change, and keep just the Oracle schema-sql script.

          Thanks a bunch.

          Carl Steinbach added a comment -

          @Mithun: That's great news! Thanks for experimenting with this.

          Is the patch ready for review? If so can you please click the "submit patch" button? Thanks.

          Mithun Radhakrishnan added a comment -

          Hey, Carl. I have submitted the patch. I'd appreciate it if you'd look over the Oracle-schema file. (It's pretty straightforward.)

          (Thanks very much for looking at this. Much appreciated.)

          Ashutosh Chauhan added a comment -

          Unlinking from 0.9

          Andrew Bayer added a comment -

          Fixed path of file.

          Carl Steinbach added a comment -

          Committed to trunk. Thanks Mithun and Andrew!

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1391 (See https://builds.apache.org/job/Hive-trunk-h0.21/1391/)
          HIVE-2928. Support for Oracle-backed Hive-Metastore (longvarchar to clob in package.jdo) (Mithun Radhakrishnan and Andrew Bayer via cws) (Revision 1329416)

          Result = FAILURE
          cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1329416
          Files :

          • /hive/trunk/metastore/scripts/upgrade/oracle
          • /hive/trunk/metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql

          Carl Steinbach added a comment -

          Backported to 0.9.1

          Hudson added a comment -

          Integrated in Hive-0.9.1-SNAPSHOT-h0.21 #40 (See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/40/)
          HIVE-2928. Support for Oracle-backed Hive-Metastore (longvarchar to clob in package.jdo) (Revision 1347397)

          Result = FAILURE
          cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1347397
          Files :

          • /hive/branches/branch-0.9/metastore/scripts/upgrade/oracle
          • /hive/branches/branch-0.9/metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql

          Hudson added a comment -

          Integrated in Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #40 (See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/40/)
          HIVE-2928. Support for Oracle-backed Hive-Metastore (longvarchar to clob in package.jdo) (Revision 1347397)

          Result = FAILURE
          cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1347397
          Files :

          • /hive/branches/branch-0.9/metastore/scripts/upgrade/oracle
          • /hive/branches/branch-0.9/metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql

          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-2928. Support for Oracle-backed Hive-Metastore (longvarchar to clob in package.jdo) (Mithun Radhakrishnan and Andrew Bayer via cws) (Revision 1329416)

          Result = ABORTED
          cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1329416
          Files :

          • /hive/trunk/metastore/scripts/upgrade/oracle
          • /hive/trunk/metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql

          Ashutosh Chauhan added a comment -

          This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.


            People

            • Assignee: Mithun Radhakrishnan
            • Reporter: Mithun Radhakrishnan
            • Votes: 0
            • Watchers: 6
