
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: Backend, Frontend

    Description

      An initial testable implementation of BINARY would contain the following:

      • DDL support for BINARY, e.g. CREATE TABLE
      • read support for text files (values stored with base64 encoding)
      • basic client support (HS2, Beeswax)
      • casts from/to STRING
      • basic comparison operators (=, <, >), which should all work the same way as for STRING
      • the length() built-in function (the Hive wiki doesn't mention it, but it works in Hive)
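      A minimal sketch of the intended read-path semantics, assuming a text-file field arrives as a string and that STRING and BINARY share one byte-sequence representation, so only the reader differs. The helpers parse_field and length are hypothetical, not Impala's scanner code, and rejecting malformed base64 is an assumption rather than a decided behavior. Comparisons are modeled as plain byte-wise ordering, matching the "same as STRING" requirement above.

```python
import base64

# Hypothetical sketch, not Impala code: STRING and BINARY both become bytes
# in memory; only the text reader treats them differently.

def parse_field(raw: str, col_type: str) -> bytes:
    """Convert one text-file field into its in-memory byte representation."""
    if col_type == "STRING":
        return raw.encode("utf-8")
    if col_type == "BINARY":
        # BINARY values are stored base64-encoded in text files.
        # validate=True rejects characters outside the base64 alphabet
        # (an assumed policy for malformed input).
        return base64.b64decode(raw, validate=True)
    raise ValueError(f"unsupported type: {col_type}")


def length(value: bytes) -> int:
    # For BINARY, length() counts bytes.
    return len(value)


# =, <, > on BINARY: Python bytes compare lexicographically by unsigned
# byte value, i.e. the same byte-wise ordering used here for STRING.
if __name__ == "__main__":
    a = parse_field("AP8=", "BINARY")   # bytes 0x00 0xFF
    b = parse_field("AQ==", "BINARY")   # byte 0x01
    print(length(a), a < b, parse_field("aGVsbG8=", "BINARY") == b"hello")
```

      Decoding once in the reader means every later operator can reuse the existing STRING code paths unchanged.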

      Optional in the first step:

      • write support for text files
      • joins on BINARY columns
      • aggregates on BINARY columns
      • COMPUTE STATS

      Hive also allows BINARY partition columns, but this seems buggy (HIVE-12680), so I would prefer not to support it in Impala.

      The last time a new type (DATE) was added to Impala, it was a massive change:
      https://gerrit.cloudera.org/#/c/12481/

      I hope that BINARY will be much simpler, because:

      • It should be handled by the backend exactly the same way as STRING, so the backend work should be minimal (only the file readers/writers have to differentiate between them). This is different in Hive, where STRING is treated as UTF-8 and BINARY is not.
      • The frontend should also treat it similarly to STRING, just with far fewer capabilities, e.g. no casts to types other than STRING, and it shouldn't be accepted by UDFs that expect STRING.
      • As BINARY supports very few features, tests also need to cover far fewer cases.
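      A hypothetical sketch of the frontend rules described above, restricted to three types. Type, is_castable, BUILTINS and resolve are illustrative names, not Impala's frontend classes, and the exact-match overload resolution deliberately ignores Impala's other implicit casts.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Type(Enum):
    INT = "INT"
    STRING = "STRING"
    BINARY = "BINARY"


def is_castable(src: Type, dst: Type) -> bool:
    """Explicit CAST rules for this three-type sketch."""
    if src == dst:
        return True
    # BINARY converts only to and from STRING.
    if Type.BINARY in (src, dst):
        return {src, dst} == {Type.BINARY, Type.STRING}
    # Only INT <-> STRING remains here; both directions are allowed.
    return True


# Hypothetical builtin registry: function name -> accepted signatures.
BUILTINS: Dict[str, List[Tuple[Type, ...]]] = {
    "length": [(Type.STRING,), (Type.BINARY,)],  # explicit BINARY overload
    "upper": [(Type.STRING,)],                   # STRING-only function
}


def resolve(name: str, args: Tuple[Type, ...]) -> Optional[Tuple[Type, ...]]:
    """Return the matching signature, or None if no overload applies.

    Arguments must match exactly: BINARY is never implicitly converted to
    STRING, so STRING-only functions reject BINARY arguments.
    """
    for sig in BUILTINS.get(name, []):
        if sig == args:
            return sig
    return None
```

      With these rules, resolve("upper", (Type.BINARY,)) returns None, while length() works on BINARY only because it has its own overload.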


          People

            Assignee: Unassigned
            Reporter: Csaba Ringhofer (csringhofer)
