Hive
  1. Hive
  2. HIVE-5304

Hive results can depend on metastore's underlying datastore

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Metastore
    • Labels:
      None

      Description

      [removed old description]
      Hive JDOQL filter pushdown and direct SQL may end up pushing StringCol op 'SomeString' to underlying SQL datastore. However, the datastore may handle these differently based on the encoding and collation used for the columns of the database.
      So, query results can change depending on the underlying store for the metastore and its version.

      drop_partitions_filter.q test illustrates this problem. In byte order collation (proper way) USA is sorted before Uganda, but some collations may do it the other way, causing the test to fail.

      I am assuming that byte-order sort if the correct way to order things.
      Our MySQL script specifies _bin collation, which is byte-order; Postgres 9.1 and after, as far as I see, defaults to "C" collation, which is also byte-order.
      Derby seems to use byte-order by default, I didn't spend a lot of time on Derby.
      However, Postgres before 9.1 seems to default to "en_US.UTF8" and there's no way to change column collation in our script if database is already created.
      MySQL by default doesn't use _bin collation (on my machine), so if database is auto-created, the order of things is going to change.
      I didn't investigate MSSQL or Oracle.

      For now it seems that:
      1) Auto-create shouldn't be used.
      2) If old version of postgres (<9.1) is used, the collation should be set properly by whoever issues "create database" (that is not our script).
      3) We might want to add 'collate "C"' to varchar columns in the postgres script to ensure the correct collation; however, this will break the script for postgres <9.1.
      4) MSSQL and Oracle might warrant investigation.

        Activity

        Hide
        Sergey Shelukhin added a comment -

        Ashutosh Chauhan do you think we should document this for people who create databases?

        Show
        Sergey Shelukhin added a comment - Ashutosh Chauhan do you think we should document this for people who create databases?
        Hide
        Sergey Shelukhin added a comment -

        Ashutosh Chauhan I updated the JIRA description with detailed investigation results... just fyi

        Show
        Sergey Shelukhin added a comment - Ashutosh Chauhan I updated the JIRA description with detailed investigation results... just fyi
        Hide
        Sergey Shelukhin added a comment -

        Actually, all names - I see show indexes order also changes in some queries, etc.

        Show
        Sergey Shelukhin added a comment - Actually, all names - I see show indexes order also changes in some queries, etc.

          People

          • Assignee:
            Unassigned
            Reporter:
            Sergey Shelukhin
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development