Hive / HIVE-1466

Add NULL DEFINED AS to ROW FORMAT specification

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: SQL
    • Labels:
      None
    • Release Note:
      This feature enables defining a custom null format for a table via the 'create table' statement. A custom null format can also be specified while exporting data to the local filesystem using the 'insert overwrite .. local dir' statement.

      Description

      NULL values are passed to TRANSFORM scripts as a literal backslash followed by a literal N (\N). NULL values written by INSERT OVERWRITE LOCAL DIRECTORY are saved as the string "NULL". This is inconsistent.

      The ROW FORMAT specification of a table should be able to specify how a NULL value is represented. ROW FORMAT NULL DEFINED AS '\N', '\003', or any other string should apply to all instances of table export and saving.
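
      As a rough sketch of what this request looks like in HiveQL, assuming the NULL DEFINED AS clause is accepted inside ROW FORMAT DELIMITED as of the 0.13.0 fix version listed above. The table name, columns, output path, and the 'fooNull' marker are hypothetical and chosen only for illustration:

      ```sql
      -- Hypothetical table: NULL cells are stored as the string 'fooNull'
      -- instead of the default \N.
      CREATE TABLE null_demo (id INT, note STRING)
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        NULL DEFINED AS 'fooNull'
      STORED AS TEXTFILE;

      -- Hypothetical export: the same marker is applied to the files
      -- written to a local directory.
      INSERT OVERWRITE LOCAL DIRECTORY '/tmp/null_demo_export'
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        NULL DEFINED AS 'fooNull'
      SELECT id, note FROM null_demo;
      ```

      Using the same marker in both statements is what removes the inconsistency described above: a NULL reads and writes as one agreed string rather than \N in one path and "NULL" in another.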

      1. HIVE-1466.1.patch
        28 kB
        Prasad Mujumdar
      2. HIVE-1466.2.patch
        38 kB
        Prasad Mujumdar

        Issue Links

          Activity

          Adam Kramer created issue -
          Adam Kramer made changes -
          Field: Description
          Original Value:
          I just updated the Hive wiki to clarify what some would consider an oddity: When NULL values are exported to a script via TRANSFORM, they are converted to the string "\N", and then when the script's output is read, any cell that contains only \N is treated as a NULL value.

          I believe that there are very VERY few reasons why anyone would need cells that contain only a backslash and then a capital N to be distinguished from NULL cells, but for complete generality, we should allow this.

          The way to do that is probably by adding a specification in the ROW FORMAT for a table that would allow any string to be treated as a NULL if it is the only string in a cell. Some may prefer the empty string, others the word NULL in caps, etc. I vote for keeping \N as the default because I am used to it, but also for allowing this to be customized.
          New Value:
          NULL values are passed to TRANSFORM scripts as a literal backslash followed by a literal N (\N). NULL values written by INSERT OVERWRITE LOCAL DIRECTORY are saved as the string "NULL". This is inconsistent.

          The ROW FORMAT specification of a table should be able to specify how a NULL value is represented. ROW FORMAT NULL DEFINED AS '\N', '\003', or any other string should apply to all instances of table export and saving.
          Prasad Mujumdar made changes -
          Assignee Prasad Mujumdar [ prasadm ]
          Prasad Mujumdar made changes -
          Attachment HIVE-1466.1.patch [ 12618376 ]
          Prasad Mujumdar made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Prasad Mujumdar made changes -
          Remote Link This issue links to "ReviewBoard #16207 (Web Link)" [ 13516 ]
          Prasad Mujumdar made changes -
          Attachment HIVE-1466.2.patch [ 12618535 ]
          Prasad Mujumdar made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Thejas M Nair made changes -
          Issue Type Improvement [ 4 ] New Feature [ 2 ]
          Prasad Mujumdar made changes -
          Release Note This feature enables defining a custom null format for a table via the 'create table' statement. A custom null format can also be specified while exporting data to the local filesystem using the 'insert overwrite .. local dir' statement.
          Fix Version/s 0.13.0 [ 12324986 ]
          Component/s SQL [ 12315100 ]
          Swarnim Kulkarni made changes -
          Labels TODOC13
          Lefty Leverenz made changes -
          Labels TODOC13

            People

            • Assignee:
              Prasad Mujumdar
              Reporter:
              Adam Kramer
            • Votes:
              1
              Watchers:
              7

              Dates

              • Created:
                Updated:
                Resolved:
