IMPALA / IMPALA-7730

Improve ORC File Format Timezone issues


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

    Description

      As pointed out in https://gerrit.cloudera.org/#/c/11731 by csringhofer, our support for the ORC file format doesn't follow the same timezone conventions as the rest of Impala.

      TL;DR: ORC's timezone handling is likely broken in Impala, so we should patch the ORC library in the toolchain.

      The ORC library implements its own IANA timezone handling: it converts stored timestamps from UTC to local time and does something similar for min/max statistics. The writer's timezone can also be stored in .orc files and used instead of the local timezone.
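
      The kind of conversion described above can be sketched with Python's standard zoneinfo module. This is a minimal illustration, not ORC's actual code: it assumes the stored value is a count of seconds since the Unix epoch in UTC and that an IANA tz database is available on the machine, and the helper name utc_to_local_wall_clock is hypothetical. It shows why the choice of zone matters: one stored value reads back as different wall-clock times depending on whether the writer's timezone or the reader's local timezone is applied.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_to_local_wall_clock(utc_seconds: int, zone_name: str) -> datetime:
    """Hypothetical helper: interpret utc_seconds as a UTC instant and render
    it as naive wall-clock time in the IANA zone zone_name."""
    instant = datetime.fromtimestamp(utc_seconds, tz=timezone.utc)
    # Shift to the target zone, then drop tzinfo to get a plain wall-clock value.
    return instant.astimezone(ZoneInfo(zone_name)).replace(tzinfo=None)


# One stored value: 2018-10-19 12:00:00 UTC.
stored = int(datetime(2018, 10, 19, 12, 0, tzinfo=timezone.utc).timestamp())

# The same stored value yields a different wall-clock time per zone.
for zone in ("UTC", "Europe/Budapest", "America/Los_Angeles"):
    print(zone, utc_to_local_wall_clock(stored, zone))
# UTC 2018-10-19 12:00:00
# Europe/Budapest 2018-10-19 14:00:00
# America/Los_Angeles 2018-10-19 05:00:00
```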

      Impala's and the ORC library's timezones can differ for several reasons:

        • ORC's timezone is not overridden by the TZ environment variable or by the timezone query option.
        • ORC uses a simpler way to detect the local timezone, which may not work on some Linux distributions (see TimezoneDatabase::LocalZoneName in Impala vs. LOCAL_TIMEZONE in ORC).
        • .orc files can use any timezone as the writer's timezone, and we cannot be sure that it exists on the reader machine.
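
      The first and third reasons can be sketched with two hypothetical resolvers. The precedence in impala_like_zone (query option, then TZ, then the detected local zone) is a simplified assumption about Impala's behavior rather than its real code, and the detected zone is passed in as a plain argument instead of being read from the system, so the second reason is not modeled. try_load_zone illustrates the defensive lookup a reader needs when a file names a writer's timezone that the local tz database may not contain. All function names here are invented for illustration.

```python
from typing import Mapping, Optional
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def impala_like_zone(query_option: Optional[str],
                     env: Mapping[str, str],
                     detected_local: str) -> str:
    """Hypothetical resolver: query option first, then TZ, then the detected zone."""
    # A leading ':' is allowed in TZ values (e.g. ":Europe/Budapest"); strip it.
    # POSIX-style TZ rule strings are not handled in this simplified sketch.
    tz_env = env.get("TZ", "").lstrip(":")
    for candidate in (query_option, tz_env, detected_local):
        if candidate:
            return candidate
    return "UTC"


def orc_like_zone(query_option: Optional[str],
                  env: Mapping[str, str],
                  detected_local: str) -> str:
    """Hypothetical resolver that ignores TZ and the query option (reason 1)."""
    return detected_local


def try_load_zone(name: str) -> Optional[ZoneInfo]:
    """Return None instead of raising when the zone is unknown locally (reason 3)."""
    try:
        return ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError):
        return None


env = {"TZ": "Europe/Budapest"}  # e.g. os.environ in a real process
print(impala_like_zone(None, env, "UTC"))   # Europe/Budapest
print(orc_like_zone(None, env, "UTC"))      # UTC
print(try_load_zone("Mars/Olympus_Mons"))   # None
```
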

      My suggestion is to patch the ORC library in the toolchain and remove its timezone handling (e.g. by always using UTC, possibly controlled by a flag), as the current behavior is likely broken and is certainly inconsistent with the rest of Impala.
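
      A minimal sketch of the proposed patch, assuming a boolean flag (always_utc, a hypothetical name) that makes the reader skip timezone conversion entirely. With the flag set, the stored UTC value is returned unchanged and no zone lookup happens, so neither a mismatched nor a missing zone can affect the result. The function read_orc_timestamp is likewise invented for illustration and is not part of the ORC library.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def read_orc_timestamp(utc_seconds: int,
                       zone_name: Optional[str],
                       always_utc: bool) -> datetime:
    """Hypothetical read path: return naive wall-clock time for a stored UTC value.

    always_utc stands in for the flag suggested in this issue (name invented).
    """
    instant = datetime.fromtimestamp(utc_seconds, tz=timezone.utc)
    if always_utc or zone_name is None:
        # No zone lookup at all: the stored UTC value is passed through as-is.
        return instant.replace(tzinfo=None)
    # Current-style behavior: convert to the given zone (may raise if unknown).
    return instant.astimezone(ZoneInfo(zone_name)).replace(tzinfo=None)


stored = int(datetime(2018, 10, 19, 12, 0, tzinfo=timezone.utc).timestamp())
print(read_orc_timestamp(stored, "America/Los_Angeles", always_utc=True))   # 2018-10-19 12:00:00
print(read_orc_timestamp(stored, "America/Los_Angeles", always_utc=False))  # 2018-10-19 05:00:00
```

      Under this sketch, any UTC-to-local conversion would be left to Impala's own timezone logic, which is what would make ORC reads follow the same conventions as the rest of Impala.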

      I am not sure how timezones could be handled correctly in ORC + Impala. If someone plans to work on it, I would gladly help with the integration into Impala.

      Attachments

        1. orc.zip (47 kB, Philip Martin)


          People

            csringhofer Csaba Ringhofer
            philip Philip Martin
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: