Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
In the case below, TajoWriteSupport currently just takes the schema of the source table orders. In other words, each column is qualified with default.orders instead of default.parquet_test. This is a bug. In such a case, we can hit the following error when we read the Parquet files.
default> create table parquet_test using parquet as select * from orders;
Progress: 0%, response time: 1.119 sec
Progress: 0%, response time: 2.121 sec
Progress: 0%, response time: 3.123 sec
Progress: 83%, response time: 4.126 sec
Progress: 100%, response time: 4.709 sec
(1500000 rows, 4.709 sec, 109.9 MiB inserted)
default> select * from parquet_test;
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.NullPointerException
    at parquet.hadoop.InternalParquetRecordReader.close(InternalParquetRecordReader.java:118)
    at parquet.hadoop.ParquetReader.close(ParquetReader.java:144)
    at org.apache.tajo.storage.parquet.ParquetScanner.close(ParquetScanner.java:87)
    at org.apache.tajo.storage.MergeScanner.close(MergeScanner.java:137)
    at org.apache.tajo.jdbc.TajoResultSet.close(TajoResultSet.java:153)
    at org.apache.tajo.cli.TajoCli.localQueryCompleted(TajoCli.java:387)
    at org.apache.tajo.cli.TajoCli.executeQuery(TajoCli.java:365)
    at org.apache.tajo.cli.TajoCli.executeParsedResults(TajoCli.java:322)
    at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:311)
    at org.apache.tajo.cli.TajoCli.main(TajoCli.java:490)
Apr 30, 2014 11:04:01 AM INFO: parquet.hadoop.ParquetFileReader: reading another 1 footers
The patch fixes the bug where CreateTableNode takes the wrong schema.
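To illustrate the qualifier mismatch described above, here is a minimal sketch in plain Java. It does not use Tajo's actual classes: the class QualifierSketch, the method requalify, and the column names o_orderkey and o_custkey are hypothetical and chosen only for illustration. It shows the intended outcome, where each column carries the target table's qualifier rather than the source table's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Tajo's real schema API.
public class QualifierSketch {
    // Replace the table qualifier of each column with the target table's name.
    static List<String> requalify(List<String> columns, String targetTable) {
        List<String> out = new ArrayList<>();
        for (String col : columns) {
            // The simple name is everything after the last dot
            // (the whole string if there is no dot).
            String simple = col.substring(col.lastIndexOf('.') + 1);
            out.add(targetTable + "." + simple);
        }
        return out;
    }

    public static void main(String[] args) {
        // Columns as they would look if the schema of "orders" were reused as-is.
        List<String> source = Arrays.asList(
                "default.orders.o_orderkey",
                "default.orders.o_custkey");
        // Columns as they should look for the newly created table.
        System.out.println(requalify(source, "default.parquet_test"));
    }
}
```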
In addition, I found a potential problem: ParquetFile stores the Tajo schema in its extra metadata. I think this will cause problems when users rename a database or a table, because the stored schema would still carry the old qualified names. So, I removed the code that inserts the Tajo schema into the extra metadata, and I changed Parquet reading so that it does not use the extra metadata.
Tajo mainly uses its catalog system to manage schemas, and reading Parquet files in Tajo depends on the Tajo catalog, so this will work well. Also, other systems can access the Parquet files by directly reading Parquet's native schema.
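The reasoning about renames can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: it assumes the file's native schema holds unqualified field names, and the class CatalogResolveSketch, its methods, and the table name default.renamed are hypothetical. The point is that if the reader matches catalog columns to file fields by simple name, nothing stored in the file depends on the database or table name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Tajo's real reader code.
public class CatalogResolveSketch {
    // Strip any database/table qualifier from a column name.
    static String simpleName(String qualified) {
        return qualified.substring(qualified.lastIndexOf('.') + 1);
    }

    // Map each catalog column to its position in the file's native schema,
    // matching by simple name. A missing field maps to -1.
    static Map<String, Integer> resolve(List<String> catalogColumns,
                                        List<String> fileFields) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String col : catalogColumns) {
            out.put(col, fileFields.indexOf(simpleName(col)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed native schema of the file: unqualified field names.
        List<String> fileFields = Arrays.asList("o_orderkey", "o_custkey");

        // The catalog schema before and after a hypothetical table rename.
        List<String> before = Arrays.asList(
                "default.parquet_test.o_orderkey",
                "default.parquet_test.o_custkey");
        List<String> after = Arrays.asList(
                "default.renamed.o_orderkey",
                "default.renamed.o_custkey");

        // Both resolve to the same field positions in the file.
        System.out.println(resolve(before, fileFields));
        System.out.println(resolve(after, fileFields));
    }
}
```

By contrast, a qualified schema embedded in the file at write time would still name default.parquet_test after the rename and would no longer match the catalog.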