Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-25466

Trim spaces from db and table names

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Parser, SQL
    • None

    Description

      If we create databases and tables with leading/trailing whitespaces (using backticks) the behavior is inconsistent and leads to multiple problems.

      Creating database with spaces makes it part of the database name, from then on they must be used with backticks.

      0: jdbc:hive2://hs2> create database `mydb1 `;
      INFO  : OK
      0: jdbc:hive2://hs2> desc database `mydb1 `;
      Location: "/warehouse/tablespace/external/hive/mydb1 .db"
      

      With leading spaces the database can be created, but it can't be referenced anymore:

      0: jdbc:hive2://hs2> create database ` mydb2`;
      INFO  : OK
      0: jdbc:hive2://hs2> desc database ` mydb2`;
      Error: Error while compiling statement: FAILED: SemanticException [Error 10072]: Database does not exist:  mydb2 (state=42000,code=10072)
      
      0: jdbc:hive2://hs2> !outputformat xmlattr
      0: jdbc:hive2://hs2> show databases;
      <resultset>
        <result database_name=" mydb2"/>
        <result database_name="default"/>
        <result database_name="information_schema"/>
        <result database_name="mydb1 "/>
        <result database_name="sys"/>
      </resultset>
      

      For tables the spaces are trimmed - the tables are created without leading or trailing spaces in their names. However as the below example shows the space is kept within the table location.

      0: jdbc:hive2://hs2> create external table `mytbl1 ` (col1 string);
      INFO  : OK
      0: jdbc:hive2://hs2> desc formatted `mytbl1 `;
      Location: "hdfs://namenode:8020/warehouse/tablespace/external/hive/mytbl1 "
      
      0: jdbc:hive2://hs2> create external table ` mytbl2` (col1 string);
      INFO  : OK
      0: jdbc:hive2://hs2> desc formatted ` mytbl2`;
      Location: "hdfs://namenode:8020/warehouse/tablespace/external/hive/ mytbl2"
      
      0: jdbc:hive2://hs2> show tables;
      <resultset>
        <result tab_name="mytbl1"/>
        <result tab_name="mytbl2"/>
      </resultset>
      

      Interestingly during table creation or other operations like "use" the database name's is trimmed.

      0: jdbc:hive2://hs2> create database mydb3;
      INFO  : OK
      0: jdbc:hive2://hs2> create table ` mydb3`.`mytbl3` (col1 string);
      INFO  : OK
      0: jdbc:hive2://hs2> use `  mydb3  `;
      INFO  : OK
      0: jdbc:hive2://hs2> show tables;
      <resultset>
        <result tab_name="mytbl3"/>
      </resultset>
      

      One can validate with hdfs commands that the locations have even the trailing spaces. Keeping the space in the HDFS location is inconsistent with the table names and also confusing in multiple ways (like you cannot see the trailing space), and sounds a very bad pattern.

      This is even more problematic because in the underlying HMS database the NOTIFICATION_LOG entries are created with the spaces as it has been passed in the SQL statement even if the table name is trimmed - which is providing incorrect information to the other components relying on the NOTIFICATION_LOG.

      Hive should trim completely the database and table names in the SQL statements - without propagating that forward.

      Attachments

        Activity

          People

            Unassigned Unassigned
            mszurap Miklos Szurap
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: