1. The TIMESTAMP Type for Hive 1.1. TIMESTAMP Hive will retrieve and display TIMESTAMP values in 'YYYY-MM-DD HH:MM:SS' format. The TIMESTAMP data type has a range of '1970-01-01 00:00:01' UTC to '2038-01-09 03:14:07' UTC. One can specify TIMESTAMP values using any of a common set of formats: As a string in either 'YYYY-MM-DD HH:MM:SS' or 'YY-MM-DD HH:MM:SS' format. A "relaxed" syntax is allowed: Any punctuation character may be used as the delimiter between date parts or time parts. For example, '98-12-31 11:30:45', '98.12.31 11+30+45', '98/12/31 11*30*45', and '98@12@31 11^30^45' are equivalent. Date part of string in either 'YYYY-MM-DD' or 'YY-MM-DD' format. A "relaxed" syntax is allowed here, too. For example, '98-12-31', '98.12.31', '98/12/31', and '98@12@31' are equivalent. As a string with no delimiters in either 'YYYYMMDDHHMMSS' or 'YYMMDDHHMMSS' format, provided that the string makes sense as a date is allowed. For example, '20070523091528' and '070523091528' are interpreted as '2007-05-23 09:15:28', but '071122129015' is illegal (it has a nonsensical minute part) and becomes '0000-00-00 00:00:00'. As a string with no delimiters in either 'YYYYMMDD' or 'YYMMDD' format, provided that the string makes sense as a date is allowed. For example, '20070523' and '070523' are interpreted as '2007-05-23', but '071332' is illegal (it has nonsensical month and day parts) and becomes '0000-00-00'. A number in either YYYYMMDDHHMMSS or YYMMDDHHMMSS format, provided that the number makes sense as a date is allowed. For example, 19830905132800 and 830905132800 are interpreted as '1983-09-05 13:28:00'. A number in either YYYYMMDD or YYMMDD format, provided that the number makes sense as a date is allowed. For example, 19830905 and 830905 are interpreted as '1983-09-05'. As the result of a function that returns a value that is acceptable in a TIMESTAMP context, such as NOW(). HIVE> SELECT NOW(); +---------------------+ | NOW() | +---------------------+ | 2007-11-30 10:41:47 | +---------------------+ Illegal TIMESTAMP values are converted to the "zero" value of the appropriate type ('0000-00-00 00:00:00' or '0000-00-00'). For values specified as strings that include date part delimiters, it is not necessary to specify two digits for month or day values that are less than 10. '1979-6-9' is the same as '1979-06-09'. Similarly, for values specified as strings that include time part delimiters, it is not necessary to specify two digits for hour, minute, or second values that are less than 10. '1979-10-30 1:2:3' is the same as '1979-10-30 01:02:03'. Values specified as numbers should be 6, 8, 12, or 14 digits long. If a number is 8 or 14 digits long, it is assumed to be in YYYYMMDD or YYYYMMDDHHMMSS format and that the year is given by the first 4 digits. If the number is 6 or 12 digits long, it is assumed to be in YYMMDD or YYMMDDHHMMSS format and that the year is given by the first 2 digits. Numbers that are not one of these lengths are interpreted as though padded with leading zeros to the closest length. Values specified as non-delimited strings are interpreted using their length as given. If the string is 8 or 14 characters long, the year is assumed to be given by the first 4 characters. Otherwise, the year is assumed to be given by the first 2 characters. The string is interpreted from left to right to find year, month, day, hour, minute, and second values, for as many parts as are present in the string. This means you should not use strings that have fewer than 6 characters. For example, if you specify '9903', thinking that represents March, 1999, Hive inserts a "zero" date value into your table. This occurs because the year and month values are 99 and 03, but the day part is completely missing, so the value is not a legal date. However, you can explicitly specify a value of zero to represent missing month or day parts. For example, you can use '990300' to insert the value '1999-03-00'. TIMESTAMP values cannot be earlier than 1970 UTC or later than '2038-01-09 03:14:07' UTC. This means that a date such as '1968-01-01', while legal as a DATETIME or DATE value, is not valid as a TIMESTAMP value and is converted to 0. Be aware of certain problems when specifying date values: The relaxed format allowed for values specified as strings can be deceiving. For example, a value such as '10:11:12' might look like a time value because of the ":" delimiter, but if used in a date context is interpreted as the year '2010-11-12'. The value '10:45:15' is converted to '0000-00-00' because '45' is not a legal month. The server requires that month and day values be legal, and not merely in the range 1 to 12 and 1 to 31, respectively. With strict mode disabled, invalid dates such as '2004-04-31' are converted to '0000-00-00' and a warning is generated. Hive does not accept timestamp values that include a zero in the day or month column or values that are not a valid date. The sole exception to this rule is the special value '0000-00-00 00:00:00'. 1.2 TIMESTAMP Implementation The display width is fixed at 19 characters, and the format is 'YYYY-MM-DD HH:MM:SS'. TIMESTAMP values are converted from the current time zone to UTC for storage, and converted back from UTC to the current time zone for retrieval. By default, the current time zone for each connection is the server's time. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable. The TIMESTAMP data type offers automatic initialization and updating. You can choose whether to use these properties and which column should have them: For one TIMESTAMP column in a table, you can assign the current timestamp as the default value and the auto-update value. It is possible to have the current timestamp be the default value for initializing the column, for the auto-update value, or both. It is not possible to have the current timestamp be the default value for one column and the auto-update value for another column. Any single TIMESTAMP column in a table can be used as the one that is initialized to the current date and time, or updated automatically. This need not be the first TIMESTAMP column. If a DEFAULT value is specified for the first TIMESTAMP column in a table, it is not ignored. The default can be CURRENT_TIMESTAMP or a constant date and time value. In a CREATE TABLE statement, the first TIMESTAMP column can be declared in any of the following ways: With both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP clauses, the column has the current timestamp for its default value, and is automatically updated. With neither DEFAULT nor ON UPDATE clauses, it is the same as DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP. With a DEFAULT CURRENT_TIMESTAMP clause and no ON UPDATE clause, the column has the current timestamp for its default value but is not automatically updated. With no DEFAULT clause and with an ON UPDATE CURRENT_TIMESTAMP clause, the column has a default of 0 and is automatically updated. With a constant DEFAULT value, the column has the given default and is not automatically initialized to the current timestamp. If the column also has an ON UPDATE CURRENT_TIMESTAMP clause, it is automatically updated; otherwise, it has a constant default and is not automatically updated. In other words, you can use the current timestamp for both the initial value and the auto-update value, or either one, or neither. ========================= More specifications from MySQL 6.0 to be considered later ===================== (For example, you can specify ON UPDATE to enable auto-update without also having the column auto-initialized.) The following column definitions demonstrate each of the possibilities: Auto-initialization and auto-update: ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP Auto-initialization only: ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP Auto-update only: ts TIMESTAMP DEFAULT 0 ON UPDATE CURRENT_TIMESTAMP Neither: ts TIMESTAMP DEFAULT 0 To specify automatic default or updating for a TIMESTAMP column other than the first one, you must suppress the automatic initialization and update behaviors for the first TIMESTAMP column by explicitly assigning it a constant DEFAULT value (for example, DEFAULT 0 or DEFAULT '2003-01-01 00:00:00'). Then, for the other TIMESTAMP column, the rules are the same as for the first TIMESTAMP column, except that if you omit both of the DEFAULT and ON UPDATE clauses, no automatic initialization or updating occurs. Example: CREATE TABLE t ( ts1 TIMESTAMP DEFAULT 0, ts2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP); CURRENT_TIMESTAMP or any of its synonyms (CURRENT_TIMESTAMP(), NOW(), LOCALTIME, LOCALTIME(), LOCALTIMESTAMP, or LOCALTIMESTAMP()) can be used in the DEFAULT and ON UPDATE clauses. They all mean "the current timestamp." (UTC_TIMESTAMP is not allowed. Its range of values does not align with those of the TIMESTAMP column anyway unless the current time zone is UTC.) The order of the DEFAULT and ON UPDATE attributes does not matter. If both DEFAULT and ON UPDATE are specified for a TIMESTAMP column, either can precede the other. For example, these statements are equivalent: CREATE TABLE t (ts TIMESTAMP); CREATE TABLE t (ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP); CREATE TABLE t (ts TIMESTAMP ON UPDATE CURRENT_TIMESTAMP DEFAULT CURRENT_TIMESTAMP); [The examples that use DEFAULT 0 will not work if the NO_ZERO_DATE SQL mode is enabled because that mode causes "zero" date values (specified as 0, '0000-00-00, or '0000-00-00 00:00:00') to be rejected. Be aware that the TRADITIONAL SQL mode includes NO_ZERO_DATE. TIMESTAMP columns are NOT NULL by default, cannot contain NULL values, and assigning NULL assigns the current timestamp. However, a TIMESTAMP column can be allowed to contain NULL by declaring it with the NULL attribute. In this case, the default value also becomes NULL unless overridden with a DEFAULT clause that specifies a different default value. DEFAULT NULL can be used to explicitly specify NULL as the default value. (For a TIMESTAMP column not declared with the NULL attribute, DEFAULT NULL is illegal.) If a TIMESTAMP column allows NULL values, assigning NULL sets it to NULL, not to the current timestamp.] The following table contains several TIMESTAMP columns that allow NULL values: CREATE TABLE t ( ts1 TIMESTAMP NULL DEFAULT NULL, ts2 TIMESTAMP NULL DEFAULT 0, ts3 TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP ); Note that a TIMESTAMP column that allows NULL values will not take on the current timestamp except under one of the following conditions: Its default value is defined as CURRENT_TIMESTAMP NOW() or CURRENT_TIMESTAMP is inserted into the column. In other words, a TIMESTAMP column defined as NULL will auto-initialize only if it is created using a definition such as the following: CREATE TABLE t (ts TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP); If the TIMESTAMP column is defined to allow NULL values but not using DEFAULT CURRENT_TIMESTAMP, as shown here: CREATE TABLE t1 (ts TIMESTAMP NULL DEFAULT NULL); CREATE TABLE t2 (ts TIMESTAMP NULL DEFAULT '0000-00-00 00:00:00'); then you must explicitly insert a value corresponding to the current date and time. For example: INSERT INTO t1 VALUES (NOW()); INSERT INTO t2 VALUES (CURRENT_TIMESTAMP); ===================================================================================================