Hive / HIVE-192

DUPLICATE: Add TIMESTAMP column type for thrift dynamic_type

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Query Processor
    • Labels:

      Description

      create table something2 (test timestamp);
      ERROR: DDL specifying type timestamp which has not been defined
      java.lang.RuntimeException: specifying type timestamp which has not been defined
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
      at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
      at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
      at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
      at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
      at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
      at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
      at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

      Attachments

      1. TIMESTAMP_specification.txt (12 kB) - Shyam Sundar Sarkar
      2. create_2.q.txt (0.6 kB) - Shyam Sundar Sarkar
      3. Hive-192.patch.txt (2 kB) - Shyam Sundar Sarkar

          Activity

          Carl Steinbach added a comment -

          This is a duplicate of HIVE-2272. The work contained in this ticket was never committed, and it should have been resolved as a duplicate.

          Ashutosh Chauhan added a comment -

          HIVE-2272 added TIMESTAMP as a primitive type.

          Franklin Hu added a comment -

          related to HIVE-1269

          John Sichi added a comment -

          This patch is not ready for commit.

          Shyam Sundar Sarkar added a comment -

          These are just the changes to the Thrift layer so that only string types are passed back and forth for the Timestamp type.

          Shyam Sundar Sarkar added a comment -

          These are just the initial changes for others to look at and comment on. I need suggestions about string-to-Timestamp conversion within the Dynamic SerDe layer.

          Shyam Sundar Sarkar added a comment -

          I converted the TIMESTAMP type into a string type as part of the createTable method in the DDLTask.java class. Can anyone suggest how I can retain the TIMESTAMP type in the Hive metadata repository while sending only a string type to the db.createTable() method, so that the SerDe sees it as a string type only? I want this for back-and-forth conversions during SELECT, UPDATE, etc. It seems that the metadata is permanently modified to the string type in Hive.

          Any suggestions will help.

          -S. Sarkar

          Shyam Sundar Sarkar added a comment -

          CREATE TABLE syntax is parsed and executed correctly with the Timestamp data type, with Timestamp internally converted into a string type.

          Ashish Thusoo added a comment -

          I agree with this route. The string conversions to and from timestamp can all be done in the query layer itself. So I am +1 on this.

          Shyam Sundar Sarkar added a comment - edited

          Dear Ashish and others,

          After inspecting many different implementation options, the following approach seems to be the right direction:

          (1) TIMESTAMP values cannot be earlier than 1970 UTC or later than '2038-01-19 03:14:07' UTC. This means that
          a date such as '1968-01-01', while legal as a DATETIME or DATE value, is not valid as a TIMESTAMP value
          and is converted to 0.

          (2) The display width is fixed at 19 characters, and the format is 'YYYY-MM-DD HH:MM:SS'. TIMESTAMP values
          are converted from the current time zone to UTC for storage, and converted back from UTC to the current
          time zone for retrieval. By default, the current time zone for each connection is the server's time zone.

          (3) The implementation converts the TIMESTAMP string into a long integer representing the number of seconds
          since 1970 UTC before storing it through SERDE2. Similarly, a long integer representing the number of seconds
          since 1970 UTC is converted back into a TIMESTAMP string after retrieval through SERDE2 (see the conversion
          sketch at the end of this comment).

          (4) In this implementation, the SERDE2 and Thrift software layers will not be touched at all. The metadata
          store will maintain only the information about the TIMESTAMP type. The actual storage will keep long
          integer values.

          Please comment on this proposal and suggest any corrections!

          Thanks,
          Shyam_sarkar@yahoo.com
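
          The sketch below illustrates the conversion described in point (3): a 'YYYY-MM-DD HH:MM:SS' TIMESTAMP string is
          parsed as UTC into seconds since 1970, and formatted back on retrieval. This is only an illustrative sketch of
          the proposed approach; the class and method names are made up and are not part of any attached patch.

            import java.text.ParseException;
            import java.text.SimpleDateFormat;
            import java.util.Date;
            import java.util.TimeZone;

            public class TimestampEpochConversion {
              private static final String FORMAT = "yyyy-MM-dd HH:mm:ss";

              // Build a strict formatter pinned to UTC, per point (2).
              private static SimpleDateFormat utcFormat() {
                SimpleDateFormat sdf = new SimpleDateFormat(FORMAT);
                sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
                sdf.setLenient(false);
                return sdf;
              }

              // Parse a TIMESTAMP string into seconds since 1970-01-01 00:00:00 UTC.
              public static long toEpochSeconds(String timestamp) throws ParseException {
                return utcFormat().parse(timestamp).getTime() / 1000L;
              }

              // Format seconds since the epoch back into the 19-character display form.
              public static String fromEpochSeconds(long seconds) {
                return utcFormat().format(new Date(seconds * 1000L));
              }

              public static void main(String[] args) throws ParseException {
                long seconds = toEpochSeconds("2009-02-13 23:31:30");
                System.out.println(seconds);                   // 1234567890
                System.out.println(fromEpochSeconds(seconds)); // 2009-02-13 23:31:30
              }
            }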

          Ashish Thusoo added a comment -

          Added Shyam as a contributor and assigned.

          Ashish Thusoo added a comment -

          Hi Shyam,

          1. About unit tests: you should not have to play with Velocity at all to add a new test. You can just add an xyz.q file with the query in the ql/test/queries/clientpositive directory and in the ql/test/queries/clientnegative directory (a sketch of such a file appears at the end of this comment). You can then capture the test results with

          ant -lib testlibs clean-test test -Dtestcase=TestCliDriver -Dqfile=xyz.q -Doverwrite=true
          for positive tests and

          ant -lib testlibs clean-test test -Dtestcase=TestNegativeCliDriver -Dqfile=xyzneg.q -Doverwrite=true

          for negative tests, and then run these tests again after removing the -Doverwrite part and after checking the results in the associated .q.out file created in the ql/test/results... directory. I will add these instructions to the Developer Guide.

          2. Native timestamp support in Thrift would help a lot in supporting timestamps with Thrift-based serdes. Note that with Zheng's work on lazy serdes we would remove the dependency on Thrift for serializing and deserializing data, so at this point we should just say that TIMESTAMPs are not supported with the Thrift serde. On the use of Thrift as a client-server transport, there should not be any issue, as any data crossing that boundary is also serialized/deserialized using serdes.
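
          For reference, a minimal sketch of what such a positive test file might contain (the file name xyz.q and the
          table name are placeholders for illustration, not the contents of the attached create_2.q.txt):

            -- xyz.q: hypothetical clientpositive test exercising the TIMESTAMP column type
            CREATE TABLE timestamp_test (ts TIMESTAMP);
            DESCRIBE timestamp_test;
            DROP TABLE timestamp_test;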

          Shyam Sundar Sarkar added a comment -

          Hello,

          While debugging Hive SQL, I found that Hive talks to the Thrift software layer. From the documentation it is
          clear that there is no TIMESTAMP type in the Thrift type system.

          I am curious whether we need to add a TIMESTAMP type to Thrift as a basic type; a TIMESTAMP type exists in
          almost all software systems. If we do not add a TIMESTAMP type to Thrift, should we map it to a long integer
          or a struct type?

          Any suggestions?

          Thanks,
          shyam_sarkar@yahoo.com

          Shyam Sundar Sarkar added a comment -

          I was following the Hive Developer Guide and found that one important section is missing:
          the section "3.4. Adding new unit tests" has no instructions about how to add a new unit test.
          I had to go through trial and error (with Velocity templates) to add a new unit test
          to the test suite.

          I request that someone from the original test suite design team write a few words
          for this important subsection.

          Regards,
          shyam_sarkar@yahoo.com

          Neil Conway added a comment -

          I'd suggest also looking at the SQL spec for the timestamp type (and/or the implementation in PostgreSQL), assuming SQL compliance is something you guys care about.

          Why are you storing timestamps as strings, rather than as a numeric offset from some epoch time?

          Shyam Sundar Sarkar added a comment -

          Test Driver for TIMESTAMP.

          Shyam Sundar Sarkar added a comment -

          I do not have permission to work on this issue. Can someone explain why I have no permission on the worklog? I am currently working on it.
          Also, what is the timeline for the 0.3.0 release, in terms of weeks or months?

          Ashish Thusoo added a comment -

          Downgrading this to critical, as this is more of a feature and can go into the release after 0.3.0.

          Ashish Thusoo added a comment -

          I went through the write-up and it looks quite comprehensive from the DDL point of view. Here are a few comments:

          1. I think we can drop ON UPDATE altogether, as we do not support UPDATE in Hive at this point.
          2. Auto increment is a nice-to-have feature since it can be achieved by providing a now() UDF in Hive; it is definitely a good feature to have, though.
          3. It would be great if you could also list the UDFs that could be added for timestamps. MySQL has a bunch of these. Some that immediately come to mind are day(ts), month(ts), year(ts), etc., and a to_timestamp function, as Namit pointed out.

          As far as storage is concerned, that would depend on the serde in question...

          Shyam Sundar Sarkar added a comment -

          OK. After a lot of inspection of the MySQL 6.0 spec and the Hive grammar, I decided to implement TIMESTAMP as
          a primitive type which will be stored and retrieved as a string of characters. We also have to implement a
          UDF to retrieve TIMESTAMP values in certain formats from the stored data. Eventually we will have to implement
          something like a TimeZone setting inside the Hive server for storage and retrieval of UTC time; right now we
          can use the system time.

          Namit Jain added a comment -

          For the first cut, we don't need so much flexibility. Let us only support one default format.

          We can have a to_timestamp() function which takes in the format; that function can be enhanced to support
          more formats on demand.
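
          A minimal sketch of what such a to_timestamp(value, format) UDF could look like, assuming the classic
          org.apache.hadoop.hive.ql.exec.UDF base class and the string-backed timestamp representation discussed
          above; the class name is hypothetical and this is not the committed implementation.

            import java.text.ParseException;
            import java.text.SimpleDateFormat;
            import java.util.TimeZone;

            import org.apache.hadoop.hive.ql.exec.UDF;
            import org.apache.hadoop.io.LongWritable;
            import org.apache.hadoop.io.Text;

            public class UDFToTimestamp extends UDF {
              // to_timestamp(value, format): parse the string with the given pattern
              // and return seconds since 1970-01-01 00:00:00 UTC, or NULL on bad input.
              public LongWritable evaluate(Text value, Text format) {
                if (value == null || format == null) {
                  return null;
                }
                SimpleDateFormat sdf = new SimpleDateFormat(format.toString());
                sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
                sdf.setLenient(false);
                try {
                  return new LongWritable(sdf.parse(value.toString()).getTime() / 1000L);
                } catch (ParseException e) {
                  return null;
                }
              }
            }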

          Shyam Sundar Sarkar added a comment -

          This is the design document for Hive TIMESTAMP implementation. We are going to implement most of the specifications from MySQL 6.0.

          Ashish Thusoo added a comment -

          yup. we actually have not added support for timestamp yet...


            People

            • Assignee:
              Shyam Sundar Sarkar
              Reporter:
              Johan Oskarsson
            • Votes:
              0
              Watchers:
              7
