[TAJO-806] CreateTableNode in CTAS uses a wrong schema as output schema and table schema. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9.0, 0.8.1
Component/s: Planner/Optimizer, Storage
Labels: None

Description

In the case below, TajoWriteSupport currently just takes the schema of the source table orders. In other words, each column is qualified with default.orders instead of default.parquet_test. This is a bug. As a result, we hit the following error when we read the Parquet files.

default> create table parquet_test using parquet as select * from orders;
Progress: 0%, response time: 1.119 sec
Progress: 0%, response time: 2.121 sec
Progress: 0%, response time: 3.123 sec
Progress: 83%, response time: 4.126 sec
Progress: 100%, response time: 4.709 sec
(1500000 rows, 4.709 sec, 109.9 MiB inserted)

default> select * from parquet_test;
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.NullPointerException
	at parquet.hadoop.InternalParquetRecordReader.close(InternalParquetRecordReader.java:118)
	at parquet.hadoop.ParquetReader.close(ParquetReader.java:144)
	at org.apache.tajo.storage.parquet.ParquetScanner.close(ParquetScanner.java:87)
	at org.apache.tajo.storage.MergeScanner.close(MergeScanner.java:137)
	at org.apache.tajo.jdbc.TajoResultSet.close(TajoResultSet.java:153)
	at org.apache.tajo.cli.TajoCli.localQueryCompleted(TajoCli.java:387)
	at org.apache.tajo.cli.TajoCli.executeQuery(TajoCli.java:365)
	at org.apache.tajo.cli.TajoCli.executeParsedResults(TajoCli.java:322)
	at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:311)
	at org.apache.tajo.cli.TajoCli.main(TajoCli.java:490)
Apr 30, 2014 11:04:01 AM INFO: parquet.hadoop.ParquetFileReader: reading another 1 footers

The patch fixes the bug where CreateTableNode takes the wrong schema.
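
To illustrate the kind of correction involved, here is a minimal sketch in plain Java. It is not the code from the patch. It assumes column names are plain strings of the form database.table.column; the class RequalifySketch, the method requalify, and the column names o_orderkey and o_custkey are hypothetical and only stand in for Tajo's real schema classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RequalifySketch {

    // Hypothetical helper: keeps each column's simple name but replaces
    // its qualifier with the name of the table being created.
    static List<String> requalify(List<String> columns, String newQualifier) {
        List<String> result = new ArrayList<>();
        for (String name : columns) {
            int lastDot = name.lastIndexOf('.');
            // An unqualified name has no dot; use it as the simple name.
            String simpleName = lastDot < 0 ? name : name.substring(lastDot + 1);
            result.add(newQualifier + "." + simpleName);
        }
        return result;
    }

    public static void main(String[] args) {
        // Columns as they would look if the source table's schema were reused as-is.
        List<String> fromOrders = Arrays.asList(
                "default.orders.o_orderkey",
                "default.orders.o_custkey");

        // prints [default.parquet_test.o_orderkey, default.parquet_test.o_custkey]
        System.out.println(requalify(fromOrders, "default.parquet_test"));
    }
}
```

The point of the sketch is only that the output schema of a CTAS statement should carry the qualifier of the new table, not the qualifier of the table named in the select.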

In addition, I found a potential problem: Parquet files store the Tajo schema in their extra metadata. I think this will cause problems when users rename a database or a table. So I removed the code that inserts the Tajo schema into the extra metadata, and I changed Parquet reading so that it no longer uses the extra metadata.

Tajo mainly uses its catalog system to manage schemas, and reading Parquet files in Tajo depends on the Tajo catalog, so this will work well. Other systems can still access the Parquet files by directly reading Parquet's native schema.
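
The reasoning about renames can be sketched as follows. This is a toy model under stated assumptions, not Tajo's catalog API: CatalogLookupSketch and its methods are hypothetical, the catalog is an in-memory map, and o_orderkey is an illustrative column name. It shows that a schema resolved through the catalog follows a rename, whereas a qualified schema written into a file at creation time would keep the old name.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogLookupSketch {

    // Hypothetical stand-in for a catalog: table name -> simple column names.
    private final Map<String, String[]> catalog = new HashMap<>();

    void createTable(String name, String... columns) {
        catalog.put(name, columns);
    }

    void renameTable(String oldName, String newName) {
        // Only the catalog entry moves; the data files are not rewritten.
        catalog.put(newName, catalog.remove(oldName));
    }

    // Qualifies the columns with the name the table has at read time.
    String[] resolveSchema(String tableName) {
        String[] columns = catalog.get(tableName);
        if (columns == null) {
            throw new IllegalArgumentException("unknown table: " + tableName);
        }
        String[] qualified = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            qualified[i] = tableName + "." + columns[i];
        }
        return qualified;
    }

    public static void main(String[] args) {
        CatalogLookupSketch catalog = new CatalogLookupSketch();
        catalog.createTable("default.parquet_test", "o_orderkey");

        // What a file would have frozen into its metadata at write time.
        String embedded = catalog.resolveSchema("default.parquet_test")[0];

        catalog.renameTable("default.parquet_test", "default.renamed");

        System.out.println(embedded);                                    // prints default.parquet_test.o_orderkey
        System.out.println(catalog.resolveSchema("default.renamed")[0]); // prints default.renamed.o_orderkey
    }
}
```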

Attachments

TAJO-806.patch
30/Apr/14 03:55
28 kB
Hyunsik Choi

People

Assignee: Hyunsik Choi

Reporter: Hyunsik Choi

Votes: 0

Watchers: 2

Dates

Created: 30/Apr/14 03:49

Updated: 30/Apr/14 16:22

Resolved: 30/Apr/14 09:37