Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-12274

Increase width of columns used for general configuration in the metastore.

    XMLWordPrintableJSON

Details

    • Increased width of metastore text columns for general configuration storage.

    Description

      Overview

      This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing JSON data that has a large nested schema. The struct definition is truncated when inserted into the metastore database column COLUMNS_V2.TYPE_NAME as it is greater than 4000 characters in length.

      Given that the purpose of these columns is to hold very loosely defined configuration values it seems rather limiting to impose such a relatively low length bound. One can imagine that valid use cases will arise where reasonable parameter/property values exceed the current limit.

      Context

      These limitations were in by the patch attributed to HIVE-1364 which mentions the "max length on Oracle 9i/10g/11g" as the reason. However, nowadays the limit can be increased because:

      • Oracle DB's varchar2 supports 32767 bytes now, by setting the configuration parameter MAX_STRING_SIZE to EXTENDED. (source)
      • Postgres supports a max of 1GB for character datatype. (source)
      • MySQL can support upto 65535 bytes for the entire row. So long as the PARAM_KEY value + PARAM_VALUE is less than 65535, we should be good. (source)
      • SQL Server's varchar max length is 8000 and can go beyond using "varchar(max)" with the same limitation as MySQL being 65535 bytes for the entire row. (source)
      • Derby's varchar can be upto 32672 bytes. (source)

      Proposal

      Can these columns not use CLOB-like types as for example as used by TBLS.VIEW_EXPANDED_TEXT? It would seem that suitable type equivalents exist for all targeted database platforms:

      • MySQL: mediumtext
      • Postgres: text
      • Oracle: CLOB
      • Derby: LONG VARCHAR

      I'd suggest that the candidates for type change are:

      • COLUMNS_V2.TYPE_NAME
      • TABLE_PARAMS.PARAM_VALUE
      • SERDE_PARAMS.PARAM_VALUE
      • SD_PARAMS.PARAM_VALUE

      After updating the maximum length the metastore database needs to be configured and restarted with the new settings. Altering MAX_STRING_SIZE will update database objects and possibly invalidate them, as follows:

      • Tables with virtual columns will be updated with new data type metadata for virtual columns of VARCHAR2(4000), 4000-byte NVARCHAR2, or RAW(2000) type.
      • Functional indexes will become unusable if a change to their associated virtual columns causes the index key to exceed index key length limits. Attempts to rebuild such indexes will fail with ORA-01450: maximum key length exceeded.
      • Views will be invalidated if they contain VARCHAR2(4000), 4000-byte NVARCHAR2, or RAW(2000) typed expression columns.
      • Materialized views will be updated with new metadata VARCHAR2(4000), 4000-byte NVARCHAR2, and RAW(2000) typed expression columns
      • So the limitation could be raised to 32672 bytes, with the caveat that MySQL and SQL Server limit the row length to 65535 bytes, so that should also be validated to provide consistency.

      Finally, will this limitation persist in the work resulting from HIVE-9452?

      Attachments

        1. HIVE-12274.2.patch
          53 kB
          Naveen Gangam
        2. HIVE-12274.3.patch
          57 kB
          Naveen Gangam
        3. HIVE-12274.4.patch
          56 kB
          Naveen Gangam
        4. HIVE-12274.5.patch
          60 kB
          Naveen Gangam
        5. HIVE-12274.example.ddl.hql
          23 kB
          Elliot West
        6. HIVE-12274.patch
          51 kB
          Naveen Gangam

        Issue Links

          Activity

            People

              ngangam Naveen Gangam
              teabot Elliot West
              Votes:
              10 Vote for this issue
              Watchers:
              30 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m