Description
Here are the source data files of the external table on HDFS:
Permission  Owner  Group       Size   Last Modified          Replication  Block Size  Name
-rw-r--r--  root   supergroup  0 B    8/6/2017, 11:43:03 PM  3            256 MB      income_band_001.dat
-rw-r--r--  root   supergroup  0 B    8/6/2017, 11:39:31 PM  3            256 MB      income_band_002.dat
...
-rw-r--r--  root   supergroup  327 B  8/6/2017, 11:44:47 PM  3            256 MB      income_band_530.dat
After the SparkSQL load, every input file produces an output file, even when the input file is 0 B. When the same load is run on Hive, the output files are merged according to the data size of the original files.
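The difference described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not Spark's or Hive's actual implementation, the function names and the 256 MB target are hypothetical, and the file sizes are made up to loosely mirror the listing (many empty files plus one 327 B file).

```python
# Toy model only: NOT Spark's or Hive's real logic. All names here are hypothetical.

def one_file_per_input(sizes):
    """Model the reported SparkSQL behaviour: each input file yields one output file."""
    return [[s] for s in sizes]

def merge_by_size(sizes, target_bytes):
    """Model a size-based merge: pack non-empty inputs into groups of up to target_bytes."""
    groups, current, current_size = [], [], 0
    for s in sizes:
        if s == 0:
            continue  # in this model, empty inputs contribute no output
        if current and current_size + s > target_bytes:
            groups.append(current)  # close the group before it exceeds the target
            current, current_size = [], 0
        current.append(s)
        current_size += s
    if current:
        groups.append(current)
    return groups

# Assumed sizes in bytes: 529 empty files and one 327 B file.
sizes = [0] * 529 + [327]
spark_like = one_file_per_input(sizes)
hive_like = merge_by_size(sizes, target_bytes=256 * 1024 * 1024)
print(len(spark_like), len(hive_like))  # prints "530 1"
```

In this model the per-input strategy writes 530 output files, 529 of them empty, while the size-based strategy writes a single file.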
Reproduce:
CREATE EXTERNAL TABLE t1 (a int, b string) STORED AS TEXTFILE LOCATION "hdfs://xxx:9000/data/t1";
CREATE TABLE t2 STORED AS PARQUET AS SELECT * FROM t1;
The resulting table t2 has many small files that contain no data.