Hive / HIVE-1466

Add NULL DEFINED AS to ROW FORMAT specification

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: SQL
    • Labels:
      None
    • Release Note:
      This feature enables defining a custom null format for a table via the 'create table' statement. A custom null format can also be specified while exporting data to the local filesystem using the 'insert overwrite .. local dir' statement.

      Description

      NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as "NULL". This is inconsistent.

      The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving.
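To make the inconsistency concrete, here is a minimal Python sketch (not Hive code) of a row writer whose null representation is a parameter. The function `serialize_row`, the tab delimiter, and the variable `table_null_marker` are assumptions made for illustration; they only model the idea behind `ROW FORMAT ... NULL DEFINED AS`, where a single table-level setting would drive every export path.

```python
from typing import List, Optional


def serialize_row(cells: List[Optional[str]],
                  null_marker: str = "\\N",
                  delimiter: str = "\t") -> str:
    """Join cells into one delimited line, writing null_marker for None."""
    return delimiter.join(null_marker if cell is None else cell
                          for cell in cells)


row = ["alice", None, "42"]

# The two behaviours described above: the same NULL is written two ways.
transform_line = serialize_row(row, null_marker="\\N")   # as sent to a transformer
export_line = serialize_row(row, null_marker="NULL")     # as saved to a local directory

# With one (hypothetical) table-level setting, both paths would agree.
table_null_marker = "\\N"
unified_transform = serialize_row(row, null_marker=table_null_marker)
unified_export = serialize_row(row, null_marker=table_null_marker)

print(repr(transform_line))
print(repr(export_line))
print(unified_transform == unified_export)
```

The design point is simply that the marker is read from one place rather than being hard-coded separately in each writer.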

      1. HIVE-1466.2.patch
        38 kB
        Prasad Mujumdar
      2. HIVE-1466.1.patch
        28 kB
        Prasad Mujumdar

          Activity

          Adam Kramer created issue -
          Adam Kramer made changes -
          Field: Description

          Original Value:
          I just updated the Hive wiki to clarify what some would consider an oddity: When NULL values are exported to a script via TRANSFORM, they are converted to the string "\N", and then when the script's output is read, any cell that contains only \N is treated as a NULL value.

          I believe that there are very VERY few reasons why anyone would need cells that contain only a backslash and then a capital N to be distinguished from NULL cells, but for complete generality, we should allow this.

          The way to do that is probably by adding a specification in the ROW FORMAT for a table that would allow any string to be treated as a NULL if it is the only string in a cell. Some may prefer the empty string, others the word NULL in caps, etc. I vote for keeping \N as the default because I am used to it, but also for allowing this to be customized.

          New Value:
          NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as "NULL". This is inconsistent.

          The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving.
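The read-side ambiguity raised in the original description (a cell containing only \N cannot be told apart from NULL) can be sketched in Python as follows. This is an illustrative model, not Hive's reader: `parse_line`, the tab delimiter, and the sample `script_output` are assumed names and values.

```python
from typing import List, Optional


def parse_line(line: str,
               null_marker: str = "\\N",
               delimiter: str = "\t") -> List[Optional[str]]:
    """Split a delimited line; a cell equal to null_marker becomes None."""
    return [None if cell == null_marker else cell
            for cell in line.split(delimiter)]


# A script emits a cell that genuinely contains a backslash followed by N.
script_output = "path\t\\N\tdone"

# Default marker: the literal \N cell is read back as NULL (None here).
default_parse = parse_line(script_output)

# Custom marker (the empty string, one of the options mentioned above):
# the literal \N cell survives, and empty cells are the ones treated as NULL.
custom_parse = parse_line(script_output, null_marker="")
empty_cell_parse = parse_line("a\t\tb", null_marker="")

print(default_parse)
print(custom_parse)
print(empty_cell_parse)
```

Whichever marker is chosen, some string becomes unrepresentable as a plain value, which is why the request is to make the marker configurable rather than to pick a different fixed one.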
          Prasad Mujumdar made changes -
          Assignee Prasad Mujumdar [ prasadm ]
          Prasad Mujumdar made changes -
          Attachment HIVE-1466.1.patch [ 12618376 ]
          Prasad Mujumdar made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Prasad Mujumdar made changes -
          Remote Link This issue links to "ReviewBoard #16207 (Web Link)" [ 13516 ]
          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12618376/HIVE-1466.1.patch

          SUCCESS: +1 4765 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/622/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/622/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12618376

          Xuefu Zhang added a comment -

          Patch looks good. Minor comment on RB.

          Prasad Mujumdar added a comment -

          Addressed review comments, added more test cases

          Prasad Mujumdar made changes -
          Attachment HIVE-1466.2.patch [ 12618535 ]
          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12618535/HIVE-1466.2.patch

          SUCCESS: +1 4788 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/627/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/627/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12618535

          Xuefu Zhang added a comment -

          +1

          Prasad Mujumdar added a comment -

          Patch committed to trunk.

          Prasad Mujumdar made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Thejas M Nair made changes -
          Issue Type Improvement [ 4 ] New Feature [ 2 ]
          Thejas M Nair added a comment -

          Prasad, can you please add a release note to the jira and create a followup jira for inclusion in the wiki (Lefty or someone else might be able to help with incorporation in the wiki), or update the wiki page itself directly?
          We should try to ensure that all new features get documented. I think the best way to do that is to ensure that documentation is available before the feature is committed.

          Prasad Mujumdar added a comment -

          Thejas M Nair, thanks for pointing that out. I just updated the wiki to reflect the syntax changes. Will add a release note on the ticket.

          I guess it's a good idea to have a doc jira along with a patch that introduces a user-facing change (SQL syntax, script/tools etc). The reviewers should also verify that before approving the patch.

          Prasad Mujumdar made changes -
          Release Note This feature enables defining a custom null format for a table via the 'create table' statement. A custom null format can also be specified while exporting data to the local filesystem using the 'insert overwrite .. local dir' statement.
          Fix Version/s 0.13.0 [ 12324986 ]
          Component/s SQL [ 12315100 ]
          Swarnim Kulkarni made changes -
          Labels TODOC13
          Lefty Leverenz added a comment -

          Prasad Mujumdar documented this in the DDL and DML wikidocs:

          • DDL: Create Table (row_format)
          • DDL: Row Format, Storage Format, and SerDe
          • DDL doc diffs for HIVE-1466
          • DML: Writing data into the filesystem from queries
          • DML doc diffs for HIVE-1466
          Lefty Leverenz made changes -
          Labels TODOC13
          Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
          Open -> Patch Available | 1246d 5h 15m | 1 | Prasad Mujumdar | 12/Dec/13 07:46
          Patch Available -> Resolved | 4d 12h 6m | 1 | Prasad Mujumdar | 16/Dec/13 19:52

            People

            • Assignee:
              Prasad Mujumdar
              Reporter:
              Adam Kramer
            • Votes:
              1
              Watchers:
              7
