Hive / HIVE-28316

The documentation provides an ambiguous explanation regarding the mutually exclusive nature of `STORED BY` and `STORED AS`


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Not Applicable
    • Component/s: Documentation
    • Labels: None

    Description

      • The documentation provides an ambiguous explanation regarding the mutually exclusive nature of STORED BY and STORED AS.
      • As stated at https://cwiki.apache.org/confluence/display/Hive/StorageHandlers, a CREATE TABLE statement that specifies STORED BY should not also specify STORED AS. The relevant passage reads:
        When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED AS cannot be specified. Optional SERDEPROPERTIES can be specified as part of the STORED BY clause and will be passed to the serde provided by the storage handler.
        
        See CREATE TABLE and Row Format, Storage Format, and SerDe for more information.
        
        Example:
        
        CREATE TABLE hbase_table_1(key int, value string)
        STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        WITH SERDEPROPERTIES (
        "hbase.columns.mapping" = "cf:string",
        "hbase.table.name" = "hbase_table_0"
        );
        
      • The grammar at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL says the same thing: a | separates STORED BY from STORED AS, marking them as mutually exclusive alternatives.
        [
           [ROW FORMAT row_format] 
           [STORED AS file_format]
             | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
        ]
        
      • However, the Hive-Iceberg Integration documentation at https://cwiki.apache.org/confluence/display/Hive/Hive-Iceberg+Integration contradicts this: its examples show STORED BY and STORED AS used together in one statement, which leaves the rule ambiguous.
        The iceberg table currently supports three file formats: PARQUET, ORC & AVRO. The default file format is Parquet. The file format can be explicitly provided by using STORED AS <Format> while creating the table.
        
        Example-1:
        
        CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;
        
      • Earlier discussion of this topic can be found at https://github.com/apache/shardingsphere/pull/31526 .
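
      The statements below put the cases discussed above side by side. This is an illustrative sketch, not output verified against a running Hive instance: the table names native_orc_table and iceberg_default_table are hypothetical, statements (2) and (3) are copied from the quoted wiki pages, and whether each statement is accepted depends on the Hive version and on the HBase or Iceberg integration being available.

      ```sql
      -- (1) Native table: the file format is chosen with STORED AS and
      --     no storage handler is involved. Table name is hypothetical.
      CREATE TABLE native_orc_table (id INT) STORED AS ORC;

      -- (2) Storage-handler table, copied from the StorageHandlers page:
      --     STORED BY only, with no ROW FORMAT and no STORED AS.
      CREATE TABLE hbase_table_1(key int, value string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES (
      "hbase.columns.mapping" = "cf:string",
      "hbase.table.name" = "hbase_table_0"
      );

      -- (3) Iceberg table, copied from the Hive-Iceberg Integration page:
      --     STORED BY and STORED AS appear in the same statement.
      CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;

      -- (4) Iceberg table without STORED AS: according to the quoted text,
      --     the file format then defaults to Parquet. Table name is hypothetical.
      CREATE TABLE iceberg_default_table (id INT) STORED BY ICEBERG;
      ```

      Statements (1) and (2) each follow one branch of the LanguageManual DDL grammar, while (3) combines both clauses, which is the combination the grammar and the StorageHandlers text appear to rule out.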


          People

            Assignee: Zhihua Deng (dengzh)
            Reporter: Qiheng He (linghengqian)
            Votes: 0
            Watchers: 2
