Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
Impala 2.12.0
-
None
-
ghx-label-7
Description
The wording on https://impala.apache.org/docs/build/html/topics/impala_alter_table.html makes it seem like you can change the delimiter of text tables after they are created.
I did the following to simulate a table that needed to switch between comma and pipe delimited:
hadoop fs -mkdir /user/alanj
hadoop fs -mkdir /user/alanj/test_delim
echo "A,B|C" > delim.txt
hadoop fs -put delim.txt /user/alanj/test_delim
Then created in impala and tried to change delimiters:
> create external table default.alanj_test_delim(A string, B string) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," LOCATION '/user/alanj/test_delim'; > select * from default.alanj_test_delim; Query: select * from default.alanj_test_delim +---+-----+ | a | b | +---+-----+ | A | B|C | +---+-----+ > alter table default.alanj_test_delim set SERDEPROPERTIES ('serialization.format'='|', 'field.delim'='|'); > select * from default.alanj_test_delim; +---+-----+ | a | b | +---+-----+ | A | B|C | +---+-----+ > show create table default.alanj_test_delim; +----------------------------------------------------------------------------------------------------------------------+ | result | +----------------------------------------------------------------------------------------------------------------------+ | CREATE EXTERNAL TABLE default.alanj_test_delim ( | | a STRING, | | b STRING | | ) | | ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' | | WITH SERDEPROPERTIES ('field.delim'='|', 'serialization.format'='|') | | STORED AS TEXTFILE | | LOCATION 'hdfs://namenode:8020/user/alanj/test_delim' | | TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'numFiles'='0', 'numRows'='-1', 'rawDataSize'='-1', 'totalSize'='0') | +----------------------------------------------------------------------------------------------------------------------+
So it shows the right serdeproperties, but impala doesn't actually use them to read the data.
If you then insert data (as the docs suggest), it writes that data with the new delimiter:
> insert into default.alanj_test_delim values('D', 'E,F'); > select * from alanj_test_delim; +-----+-----+ | a | b | +-----+-----+ | A,B | C | | D | E,F | +-----+-----+ # hadoop fs -cat /user/alanj/test_delim/a54bb0ec14646492-a738811400000000_1498283208_data.0. D|E,F