[HIVE-6131] New columns after table alter result in null values despite data - ASF JIRA

Details

Type: Bug
Status: Patch Available
Priority: Critical
Resolution: Unresolved
Affects Version/s: 0.11.0, 0.12.0, 0.13.0, 1.2.1
Fix Version/s: None
Component/s: None
Labels: None

Description

Hi folks,

I found and verified a bug on our CDH 4.0.3 install of Hive when adding columns to partitioned tables using 'REPLACE COLUMNS'. I dug through Jira a little and didn't see anything for it, so hopefully this isn't just noise on the radar.

Basically, when you alter a partitioned table and then reupload data to an existing partition, Hive doesn't seem to recognize the extra data that actually exists in HDFS: queries return NULL for the new column, even though the data is there and the metadata shows the new column.

Here are some steps to reproduce using a basic table:

1. Run this hive command: CREATE TABLE jvaughan_test (col1 string) partitioned by (day string);
2. Create a simple file on the system with a couple of entries, something like "hi" and "hi2" separated by newlines.
3. Run this hive command, pointing it at the file: LOAD DATA LOCAL INPATH '<FILEDIR>' OVERWRITE INTO TABLE jvaughan_test PARTITION (day = '2014-01-02');
4. Confirm the data with: SELECT * FROM jvaughan_test WHERE day = '2014-01-02';
5. Alter the column definitions: ALTER TABLE jvaughan_test REPLACE COLUMNS (col1 string, col2 string);
6. Edit your file to add a second column: on each row, append the default field separator (Ctrl-A; in Vim, press ctrl+v, then ctrl+a) followed by a new entry, such as "hi3" on the first row and "hi4" on the second
7. Run step 3 again
8. Check the data again like in step 4

For me, these are the results that get returned:

hive> select * from jvaughan_test where day = '2014-01-02';
OK
hi NULL 2014-01-02
hi2 NULL 2014-01-02

This is despite the fact that the second column's data is present in the file stored for that partition in HDFS.
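
One way to narrow this down is to compare the table-level column list with the column list recorded for the partition. The following is only a diagnostic sketch: it assumes the NULLs come from the partition keeping its pre-ALTER column list, which this report does not confirm (the title of the linked IMPALA-4894 describes a table schema and partition schema getting out of sync in a similar way).

```sql
-- Table-level metadata: expected to list col1 and col2 after the REPLACE COLUMNS in step 5.
DESCRIBE FORMATTED jvaughan_test;

-- Partition-level metadata for the partition that was reloaded.
-- Assumption: if only col1 appears here, the partition still carries the old column list.
DESCRIBE FORMATTED jvaughan_test PARTITION (day = '2014-01-02');
```

If the two column lists differ, that would be consistent with the file containing the data while queries on the partition still return NULL for col2.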

Let me know if you need any other information. The only workaround I have found so far is to drop each partition whose data I'm replacing and THEN reupload the new data file.
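
The workaround can be sketched roughly as follows, reusing the table, partition value, and '<FILEDIR>' placeholder from the steps above. Note that jvaughan_test is a managed table, so dropping the partition also deletes its existing files in HDFS; the updated file has to be available locally before running this.

```sql
-- Drop the existing partition so that the load below recreates it.
ALTER TABLE jvaughan_test DROP IF EXISTS PARTITION (day = '2014-01-02');

-- Reload the updated two-column file into the freshly created partition.
LOAD DATA LOCAL INPATH '<FILEDIR>' OVERWRITE INTO TABLE jvaughan_test PARTITION (day = '2014-01-02');

-- Check whether col2 is now populated.
SELECT * FROM jvaughan_test WHERE day = '2014-01-02';
```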

Thanks,

-James

Attachments

HIVE-6131.1.patch (8 kB), 31/Mar/14 21:44, Pala M Muthaia

Issue Links

relates to

IMPALA-4894 Impala changes table schema when you do ALTER TABLE .. ADD COLUMNS but not partition schema resulting in false NULL values in Hive. (Open)

HIVE-6835 Reading of partitioned Avro data fails if partition schema does not match table schema (Closed)

People

Assignee: Unassigned

Reporter: James Vaughan

Votes: 1

Watchers: 14

Dates

Created: 03/Jan/14 01:10

Updated: 24/Aug/17 02:13