Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5766

Add date/timestamp format options to external tables

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • None
    • Frontend
    • impala 2.2

    Description

      Could we support more than one date separator while automatically parsing string timestamp columns. If possible, could we also enhance to specify a common timestamp format for a table?
      In detail:
      ------------
      External table:
      CREATE EXTERNAL TABLE videowatchactivity(
      id int,
      user string,
      activitydttm timestamp,
      videoid string,
      activity string)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LOCATION '/user/videologger/activity';

      If HDFS file: "/user/videologger/activity/activity_aug_4_2017.csv" contains -->
      Venu,2017-08-04 11:23:00,video_id_0,start
      Venu,2017-08-04 11:25:00,video_id_0,stop
      Then,
      "select activitydttm from videowatchactivity;" outputs 2 valid rows.

      If HDFS file: /user/videologger/activity/activity_aug_4_2017.csv contains -->
      Venu,2017/08/04 11:23:00,video_id_0,start
      Venu,2017/08/04 11:25:00,video_id_0,stop
      Then,
      "select activitydttm from videowatchactivity;" outputs 2 NULL rows.

      Could there be a provision to specify the date separator in the create table statement that could be used in the logic to automatically parse strings to timestamp columns?
      E.g.,
      CREATE EXTERNAL TABLE videowatchactivity(
      id int,
      user string,
      activitydttm timestamp,
      videoid string,
      activity string)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LOCATION '/user/videologger/activity'
      DATESEPARATOR '/';
      Please note the new "DATESEPARATOR '/'" that could help us define any separator for the date field.

      If possible to extend, we could also use a new DATEFMT property of the table to specify a format like==> MM-dd-YYYY HH:mm:ss instead of the default format too.

      Thanks,

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            venu.yanamandra@live.com Venu Yanamandra

            Dates

              Created:
              Updated:

              Slack

                Issue deployment