Hive
  1. Hive
  2. HIVE-192

DUPLICATE: Add TIMESTAMP column type for thrift dynamic_type

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Query Processor
    • Labels:

      Description

      create table something2 (test timestamp);
      ERROR: DDL specifying type timestamp which has not been defined
      java.lang.RuntimeException: specifying type timestamp which has not been defined
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
      at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
      at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
      at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
      at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
      at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
      at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

      1. TIMESTAMP_specification.txt
        12 kB
        Shyam Sundar Sarkar
      2. create_2.q.txt
        0.6 kB
        Shyam Sundar Sarkar
      3. Hive-192.patch.txt
        2 kB
        Shyam Sundar Sarkar

        Issue Links

          Activity

          Johan Oskarsson created issue -
          Hide
          Ashish Thusoo added a comment -

          yup. we actually have not added support for timestamp yet...

          Show
          Ashish Thusoo added a comment - yup. we actually have not added support for timestamp yet...
          Shyam Sundar Sarkar made changes -
          Field Original Value New Value
          Attachment Hive_diff [ 12401746 ]
          Hide
          Shyam Sundar Sarkar added a comment -

          This is the design document for Hive TIMESTAMP implementation. We are going to implement most of the specifications from MySQL 6.0.

          Show
          Shyam Sundar Sarkar added a comment - This is the design document for Hive TIMESTAMP implementation. We are going to implement most of the specifications from MySQL 6.0.
          Shyam Sundar Sarkar made changes -
          Attachment TIMESTAMP_specification.txt [ 12401830 ]
          Shyam Sundar Sarkar made changes -
          Attachment Hive_diff [ 12401746 ]
          Hide
          Namit Jain added a comment -

          For the first cut, we dont need so much flexibility. Let us only support one default format.

          We can have a to_timestamp() function which takes in the format, that function can be enhanced to support more formats on demand.

          Show
          Namit Jain added a comment - For the first cut, we dont need so much flexibility. Let us only support one default format. We can have a to_timestamp() function which takes in the format, that function can be enhanced to support more formats on demand.
          Hide
          Shyam Sundar Sarkar added a comment -

          Ok. After lot of inspection over the MySQL 6.0 spec and Hive grammar, I decided to implement TIMESTAMP as
          a primitive type which will be stored and retrived as a string of characters. Also we have to implement one
          UDF to retrieve TIMESTAMP in certain formats from stored data. We have to implement eventually something
          called TimeZone inside Hive server for storage and retrieval of UTC time. Right now we can use system time.

          Show
          Shyam Sundar Sarkar added a comment - Ok. After lot of inspection over the MySQL 6.0 spec and Hive grammar, I decided to implement TIMESTAMP as a primitive type which will be stored and retrived as a string of characters. Also we have to implement one UDF to retrieve TIMESTAMP in certain formats from stored data. We have to implement eventually something called TimeZone inside Hive server for storage and retrieval of UTC time. Right now we can use system time.
          Hide
          Ashish Thusoo added a comment -

          I went through the write up and it looks quite comprehensive from the DDL point of view. Here are a few comments:

          1. I think we can drop ON UPDATE all together as we do not support UPDATE in Hive at this point.
          2. auto increment is a nice to have feature since it can be achieved with by providing a now() UDF in hive, it is definitely a good feature to have though.
          3. Would be great if you could also add the UDFs that could be added to timestamps. Mysql has a bunch of these. Some that directly come to mind are day(ts), month(ts), year(ts) etc, and to_timestamp function as namit pointed out.

          As far as storage is concerned, that would depend on the serde in question...

          Show
          Ashish Thusoo added a comment - I went through the write up and it looks quite comprehensive from the DDL point of view. Here are a few comments: 1. I think we can drop ON UPDATE all together as we do not support UPDATE in Hive at this point. 2. auto increment is a nice to have feature since it can be achieved with by providing a now() UDF in hive, it is definitely a good feature to have though. 3. Would be great if you could also add the UDFs that could be added to timestamps. Mysql has a bunch of these. Some that directly come to mind are day(ts), month(ts), year(ts) etc, and to_timestamp function as namit pointed out. As far as storage is concerned, that would depend on the serde in question...
          Hide
          Ashish Thusoo added a comment -

          Downgrading this to a critical as this is more of a feature and can go into the release after 0.3.0.

          Show
          Ashish Thusoo added a comment - Downgrading this to a critical as this is more of a feature and can go into the release after 0.3.0.
          Ashish Thusoo made changes -
          Priority Blocker [ 1 ] Critical [ 2 ]
          Hide
          Shyam Sundar Sarkar added a comment -

          I do not have permission to work on this issue. Can someone explain why I have no permission on worklog? I am currently working on it.
          Also what is the timeline for 0.3.0 release in terms of weeks or months?

          Show
          Shyam Sundar Sarkar added a comment - I do not have permission to work on this issue. Can someone explain why I have no permission on worklog? I am currently working on it. Also what is the timeline for 0.3.0 release in terms of weeks or months?
          Shyam Sundar Sarkar made changes -
          Attachment create_2.q.txt [ 12404080 ]
          Hide
          Shyam Sundar Sarkar added a comment -

          Test Driver for TIMESTAMP.

          Show
          Shyam Sundar Sarkar added a comment - Test Driver for TIMESTAMP.
          Shyam Sundar Sarkar made changes -
          Attachment TestCliTimestampDriver.java.txt [ 12404081 ]
          Hide
          Neil Conway added a comment -

          I'd suggest also looking at the SQL spec for the timestamp type (and/or the implementation in PostgreSQL), assuming SQL compliance is something you guys care about.

          Why are you storing timestamps as strings, rather than as a numeric offset from some epoch time?

          Show
          Neil Conway added a comment - I'd suggest also looking at the SQL spec for the timestamp type (and/or the implementation in PostgreSQL), assuming SQL compliance is something you guys care about. Why are you storing timestamps as strings, rather than as a numeric offset from some epoch time?
          Johan Oskarsson made changes -
          Fix Version/s 0.4.0 [ 12313714 ]
          Fix Version/s 0.2.0 [ 12313565 ]
          Priority Critical [ 2 ] Major [ 3 ]
          Shyam Sundar Sarkar made changes -
          Attachment TestCliTimestampDriver.java.txt [ 12404081 ]
          Shyam Sundar Sarkar made changes -
          Comment [ This is the diff file for showing the changes in the Hive.g grammar with new TimestampType added.

          Thanks,
          shyam_sarkar@yahoo.com ]
          Shyam Sundar Sarkar made changes -
          Comment [ Functional test for Timestamp. ]
          Shyam Sundar Sarkar made changes -
          Comment [ I added functional test cases for TIMESTAMP. Can someone suggest more test cases?

          The Java code for test driver is attached ::

          (/hive/build/ql/test/src/org/apache/hadoop/hive/cli/TestCliTimestampDriver.java)

          Can someone please tell me how do I get results and logs for the following call ::

          qt = new QTestUtil("/home/ssarkar/hive/ql/src/test/results/clientpositive", "/home/ssarkar/hive/build/ql/test/logs/clientpositive");

          I am getting Exception.

          At this point can I add any arbitrary results and log files?

          Thanks,
          shyam_sarkar@yahoo.com

          ]
          Shyam Sundar Sarkar made changes -
          Comment [ Can someone please help me to find out why I am getting exception in setUp() method inside TestCliTimestampDriver.java file (attached) ?
          I followed all lines and methods from existing CliDriver test class in Hive and modified just to test TIMESTAMP sysntax in some queries.
          I stepped through the setUp under debug mode and it gave error in QTestUtil at the line :

          private String tmpdir = System.getProperty("user.dir")+"/../build/ql/tmp";

          where "user.dir" was home dir of hive (not inside build dir).

          If I run the general CliDriver tests and then try to run my test for TIMESTAMP, above exception does not show up.
          However, I am getting exception at the line :

          testFiles = conf.get("test.data.files").replace('\\', '/').replace("c:", "");

          inside QTestUtil constructor.

          My question :: Why am I getting setUp() exception when I do not need a data file ?
          Can someone suggest a specific step that I am missing ?

          Thanks,
          shyam_sarkar@yahoo.com ]
          Hide
          Shyam Sundar Sarkar added a comment -

          I was following Hive Developer Guide and found that one important section is missing.
          Section on "3.4. Adding new unit tests" has no instructions about how to add a new unit test.
          I had to go through trial and error methods (with velocity templates) to add a new unit test
          in the test suite.

          I request that someone from original test suite designer team should write few words
          for this imporatnt subsection.

          Regards,
          shyam_sarkar@yahoo.com

          Show
          Shyam Sundar Sarkar added a comment - I was following Hive Developer Guide and found that one important section is missing. Section on "3.4. Adding new unit tests" has no instructions about how to add a new unit test. I had to go through trial and error methods (with velocity templates) to add a new unit test in the test suite. I request that someone from original test suite designer team should write few words for this imporatnt subsection. Regards, shyam_sarkar@yahoo.com
          Hide
          Shyam Sundar Sarkar added a comment -

          Hello,

          I was debugging Hive SQL. I found that Hive talks to Thrift software layer. From the documentation it is clear that there is no
          TIMESTAMP type in thrift type system.

          I am curious if we need to add TIMESTAMP type to Thrift as a basic type. TIMESTAMP type is there in almost all software systems.
          If we do not add TIMESTAMP type to Thrift, should we map it to long Integer or a struct type?

          Any suggestions?

          Thankls,
          shyam_sarkar@yahoo.com

          Show
          Shyam Sundar Sarkar added a comment - Hello, I was debugging Hive SQL. I found that Hive talks to Thrift software layer. From the documentation it is clear that there is no TIMESTAMP type in thrift type system. I am curious if we need to add TIMESTAMP type to Thrift as a basic type. TIMESTAMP type is there in almost all software systems. If we do not add TIMESTAMP type to Thrift, should we map it to long Integer or a struct type? Any suggestions? Thankls, shyam_sarkar@yahoo.com
          Hide
          Ashish Thusoo added a comment -

          Hi Shyam,

          1. About unit tests, you should not have to play with velocity at all to add a new test. You can just add a xyz.q file with the query in ql/test/queries/clientpositive directory and in ql/test/queries/clientnegative directory. And you can capture the test results by

          ant -lib testlibs clean-test test -Dtestcase=TestCliDriver -Dqfile=xyz.q -Doverwrite=true
          for positive tests and

          ant -lib testlibs clean-test test -Dtestcase=TestNegativeCliDriver -Dqfile=xyzneg.q -Doverwrite=true

          and then run these tests again after removing the -Doverwrite part and after checking the results in the associated .q.out file formed in ql/test/results... directory. Will add these instructions to the Developer Guide.

          2. Native timestamp support in thrift would help a lot to be able to support timestamps with thrift base serdes. Note that with zheng's work on lazy serdes we would remove the dependency on thrift for serializing and deserializing data. So at this point we should just say that TIMESTAMPS are not supported with thrift serde. On the use of thrift on as a client server transport, there should not be any issue as any data passing that boundary is also serialized/deserialized using serdes.

          Show
          Ashish Thusoo added a comment - Hi Shyam, 1. About unit tests, you should not have to play with velocity at all to add a new test. You can just add a xyz.q file with the query in ql/test/queries/clientpositive directory and in ql/test/queries/clientnegative directory. And you can capture the test results by ant -lib testlibs clean-test test -Dtestcase=TestCliDriver -Dqfile=xyz.q -Doverwrite=true for positive tests and ant -lib testlibs clean-test test -Dtestcase=TestNegativeCliDriver -Dqfile=xyzneg.q -Doverwrite=true and then run these tests again after removing the -Doverwrite part and after checking the results in the associated .q.out file formed in ql/test/results... directory. Will add these instructions to the Developer Guide. 2. Native timestamp support in thrift would help a lot to be able to support timestamps with thrift base serdes. Note that with zheng's work on lazy serdes we would remove the dependency on thrift for serializing and deserializing data. So at this point we should just say that TIMESTAMPS are not supported with thrift serde. On the use of thrift on as a client server transport, there should not be any issue as any data passing that boundary is also serialized/deserialized using serdes.
          Hide
          Ashish Thusoo added a comment -

          Added Shyam as a contributor and assigned.

          Show
          Ashish Thusoo added a comment - Added Shyam as a contributor and assigned.
          Ashish Thusoo made changes -
          Assignee Shyam Sundar Sarkar [ ssarkar ]
          Hide
          Shyam Sundar Sarkar added a comment - - edited

          Dear Ashish and others,

          After inspecting many different options for implementation, following implementation seems to have the right direction::

          (1) TIMESTAMP values cannot be earlier than 1970 UTC or later than '2038-01-09 03:14:07' UTC. This means that
          a date such as '1968-01-01', while legal as a DATETIME or DATE value, is not valid as a TIMESTAMP value
          and is converted to 0.

          (2) The display width is fixed at 19 characters, and the format is 'YYYY-MM-DD HH:MM:SS'. TIMESTAMP values
          are converted from the current time zone to UTC for storage, and converted back from UTC to the current
          time zone for retrieval. By default, the current time zone for each connection is the server's time.

          (3) The implementation involves conversion of TIMESTAMP string into UTC which is a long integer number
          representing number of seconds from 1970 UTC before storing through SERDE2. Similarly, a long integer
          type representing number of seconds from 1970 UTC will be converted in TIMESTAMP string type after
          retrieval through SERDE2.

          (4) In this implementation SERDE2 or thrift software layer will not be touched at all. The metadata store will
          maintain the information about TIMESTAMP type only. Actual storage will keep long integer values.

          Please comment on this observation and suggest any corrections!

          Thanks,
          Shyam_sarkar@yahoo.com

          Show
          Shyam Sundar Sarkar added a comment - - edited Dear Ashish and others, After inspecting many different options for implementation, following implementation seems to have the right direction:: (1) TIMESTAMP values cannot be earlier than 1970 UTC or later than '2038-01-09 03:14:07' UTC. This means that a date such as '1968-01-01', while legal as a DATETIME or DATE value, is not valid as a TIMESTAMP value and is converted to 0. (2) The display width is fixed at 19 characters, and the format is 'YYYY-MM-DD HH:MM:SS'. TIMESTAMP values are converted from the current time zone to UTC for storage, and converted back from UTC to the current time zone for retrieval. By default, the current time zone for each connection is the server's time. (3) The implementation involves conversion of TIMESTAMP string into UTC which is a long integer number representing number of seconds from 1970 UTC before storing through SERDE2. Similarly, a long integer type representing number of seconds from 1970 UTC will be converted in TIMESTAMP string type after retrieval through SERDE2. (4) In this implementation SERDE2 or thrift software layer will not be touched at all. The metadata store will maintain the information about TIMESTAMP type only. Actual storage will keep long integer values. Please comment on this observation and suggest any corrections! Thanks, Shyam_sarkar@yahoo.com
          Hide
          Ashish Thusoo added a comment -

          I agree with this route. The string conversions to and from timestamp can all be done in the query layers itself. So I am +1 on this.

          Show
          Ashish Thusoo added a comment - I agree with this route. The string conversions to and from timestamp can all be done in the query layers itself. So I am +1 on this.
          Johan Oskarsson made changes -
          Issue Type Bug [ 1 ] New Feature [ 2 ]
          Fix Version/s 0.5.0 [ 12314156 ]
          Fix Version/s 0.4.0 [ 12313714 ]
          Namit Jain made changes -
          Fix Version/s 0.5.0 [ 12314156 ]
          Zheng Shao made changes -
          Affects Version/s 0.6.0 [ 12314524 ]
          Affects Version/s 0.2.0 [ 12313565 ]
          Carl Steinbach made changes -
          Issue Type New Feature [ 2 ] Bug [ 1 ]
          Fix Version/s 0.3.0 [ 12313637 ]
          Affects Version/s 0.6.0 [ 12314524 ]
          Carl Steinbach made changes -
          Fix Version/s 0.3.0 [ 12313637 ]
          Carl Steinbach made changes -
          Summary Cannot create table with timestamp type column Add TIMESTAMP column type
          Issue Type Bug [ 1 ] New Feature [ 2 ]
          Hide
          Shyam Sundar Sarkar added a comment -

          Create table syntax is parsed and executed fine with Timestamp data type where Timestamp is internally converted into string type.

          Show
          Shyam Sundar Sarkar added a comment - Create table syntax is parsed and executed fine with Timestamp data type where Timestamp is internally converted into string type.
          Hide
          Shyam Sundar Sarkar added a comment -

          I converted a TIMESTAMP type into string type as part of createTable method in DDLTask.java class. Can anyone suggest how do I retain TIMESTAMP type in the Hive metadata repository and send string type only to db.createTable() method so that SerDe takes it as string type only ? I want this for back and forth conversions during SELECT, UPDATE etc. It seems that metadata is permanently modified to string type in Hive.

          Any suggestions will help.

          -S. Sarkar

          Show
          Shyam Sundar Sarkar added a comment - I converted a TIMESTAMP type into string type as part of createTable method in DDLTask.java class. Can anyone suggest how do I retain TIMESTAMP type in the Hive metadata repository and send string type only to db.createTable() method so that SerDe takes it as string type only ? I want this for back and forth conversions during SELECT, UPDATE etc. It seems that metadata is permanently modified to string type in Hive. Any suggestions will help. -S. Sarkar
          Shyam Sundar Sarkar made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Hide
          Shyam Sundar Sarkar added a comment -

          This is just the initial changes for others to look at and suggest. I need suggestions about string to Timestamp conversion within Dynamic SerDe layer.

          Show
          Shyam Sundar Sarkar added a comment - This is just the initial changes for others to look at and suggest. I need suggestions about string to Timestamp conversion within Dynamic SerDe layer.
          Shyam Sundar Sarkar made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          Hide
          Shyam Sundar Sarkar added a comment -

          This is just the changes for thrift layer to see only string types being passed back and forth for Timestamp type.

          Show
          Shyam Sundar Sarkar added a comment - This is just the changes for thrift layer to see only string types being passed back and forth for Timestamp type.
          Shyam Sundar Sarkar made changes -
          Attachment Hive-192.patch.txt [ 12452459 ]
          Carl Steinbach made changes -
          Link This issue is required by HIVE-1386 [ HIVE-1386 ]
          Hide
          John Sichi added a comment -

          This patch is not ready for commit.

          Show
          John Sichi added a comment - This patch is not ready for commit.
          John Sichi made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Carl Steinbach made changes -
          Labels SQL
          Franklin Hu made changes -
          Summary Add TIMESTAMP column type Add TIMESTAMP column type for thrift dynamic_type
          Franklin Hu made changes -
          Link This issue relates to HIVE-1269 [ HIVE-1269 ]
          Hide
          Franklin Hu added a comment -

          related to HIVE-1269

          Show
          Franklin Hu added a comment - related to HIVE-1269
          Hide
          Ashutosh Chauhan added a comment -

          HIVE-2272 added TIMESTAMP as primitive type.

          Show
          Ashutosh Chauhan added a comment - HIVE-2272 added TIMESTAMP as primitive type.
          Ashutosh Chauhan made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 0.8.0 [ 12316178 ]
          Resolution Fixed [ 1 ]
          Carl Steinbach made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          Carl Steinbach added a comment -

          This is a duplicate of HIVE-2272. The work contained in this ticket was never committed, and it should have been resolved as a duplicate.

          Show
          Carl Steinbach added a comment - This is a duplicate of HIVE-2272 . The work contained in this ticket was never committed, and it should have been resolved as a duplicate.
          Carl Steinbach made changes -
          Summary Add TIMESTAMP column type for thrift dynamic_type DUPLICATE: Add TIMESTAMP column type for thrift dynamic_type

            People

            • Assignee:
              Shyam Sundar Sarkar
              Reporter:
              Johan Oskarsson
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development