Tajo / TAJO-806

CreateTableNode in CTAS uses the wrong schema as its output schema and table schema.



      In the case below, TajoWriteSupport currently just takes the schema of the source table orders. In other words, every column is qualified as default.orders instead of default.parquet_test. This is a bug. In such a case, the following error occurs when the Parquet files are read:

      default> create table parquet_test using parquet as select * from orders;
      Progress: 0%, response time: 1.119 sec
      Progress: 0%, response time: 2.121 sec
      Progress: 0%, response time: 3.123 sec
      Progress: 83%, response time: 4.126 sec
      Progress: 100%, response time: 4.709 sec
      (1500000 rows, 4.709 sec, 109.9 MiB inserted)
      default> select * from parquet_test;
      SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
      SLF4J: Defaulting to no-operation (NOP) logger implementation
      SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
      Exception in thread "main" java.lang.NullPointerException
      	at parquet.hadoop.InternalParquetRecordReader.close(InternalParquetRecordReader.java:118)
      	at parquet.hadoop.ParquetReader.close(ParquetReader.java:144)
      	at org.apache.tajo.storage.parquet.ParquetScanner.close(ParquetScanner.java:87)
      	at org.apache.tajo.storage.MergeScanner.close(MergeScanner.java:137)
      	at org.apache.tajo.jdbc.TajoResultSet.close(TajoResultSet.java:153)
      	at org.apache.tajo.cli.TajoCli.localQueryCompleted(TajoCli.java:387)
      	at org.apache.tajo.cli.TajoCli.executeQuery(TajoCli.java:365)
      	at org.apache.tajo.cli.TajoCli.executeParsedResults(TajoCli.java:322)
      	at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:311)
      	at org.apache.tajo.cli.TajoCli.main(TajoCli.java:490)
      Apr 30, 2014 11:04:01 AM INFO: parquet.hadoop.ParquetFileReader: reading another 1 footers

      The patch fixes the bug where CreateTableNode takes the wrong schema.
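The core of the fix is that the columns produced by a CTAS should be qualified with the target table (default.parquet_test), not the source table (default.orders). The following is a minimal sketch of that re-qualification step. It assumes a simplified, hypothetical `Column` record holding a qualifier and a simple name; it is not Tajo's actual `Schema`/`Column` API, and the column names are only illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class CtasSchemaSketch {
    // Hypothetical, simplified column: a qualifier such as "default.orders" plus a simple name.
    record Column(String qualifier, String name) {
        String qualifiedName() {
            return qualifier + "." + name;
        }
    }

    // Returns a copy of the source schema with every column re-qualified by the CTAS target table.
    static List<Column> requalify(List<Column> sourceSchema, String targetTable) {
        List<Column> result = new ArrayList<>(sourceSchema.size());
        for (Column c : sourceSchema) {
            result.add(new Column(targetTable, c.name()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Column> orders = List.of(
                new Column("default.orders", "o_orderkey"),
                new Column("default.orders", "o_custkey"));
        // Models: create table parquet_test using parquet as select * from orders;
        for (Column c : requalify(orders, "default.parquet_test")) {
            System.out.println(c.qualifiedName());
        }
    }
}
```

Running `main` prints `default.parquet_test.o_orderkey` and `default.parquet_test.o_custkey`, which is the qualification the table schema of parquet_test is expected to carry.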

      In addition, I found a potential problem: ParquetFile stores the Tajo schema in its extra metadata. I think this will cause problems when users rename the database or table, since the embedded schema would still carry the old qualified column names. So, I removed the code that inserts the Tajo schema into the extra metadata, and I changed Parquet reading so that it no longer uses the extra metadata.
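To see why an embedded schema is fragile, consider a file whose footer key-value metadata carries qualified column names. In the sketch below, a plain `Map` stands in for Parquet's extra metadata, and both the key `tajo.schema` and the rename to default.orders_pq are hypothetical. Once the table is renamed in the catalog, the names stored in files that were already written no longer match.

```java
import java.util.Arrays;
import java.util.Map;

public class EmbeddedSchemaSketch {
    // Hypothetical metadata key; stands in for wherever a writer might embed the Tajo schema.
    static final String SCHEMA_KEY = "tajo.schema";

    // Returns true if every qualified column name embedded in the file footer
    // still belongs to the table the catalog currently knows about.
    static boolean embeddedSchemaMatches(Map<String, String> extraMetadata, String catalogTable) {
        String embedded = extraMetadata.get(SCHEMA_KEY);
        if (embedded == null) {
            return true; // nothing embedded, so nothing can go stale
        }
        return Arrays.stream(embedded.split(","))
                .allMatch(c -> c.startsWith(catalogTable + "."));
    }

    public static void main(String[] args) {
        // Footer written at CTAS time, with columns qualified by default.parquet_test.
        Map<String, String> footer = Map.of(SCHEMA_KEY,
                "default.parquet_test.o_orderkey,default.parquet_test.o_custkey");
        System.out.println(embeddedSchemaMatches(footer, "default.parquet_test")); // true
        // After a hypothetical rename of the table in the catalog:
        System.out.println(embeddedSchemaMatches(footer, "default.orders_pq"));    // false
        // A file written without an embedded schema cannot disagree with the catalog.
        System.out.println(embeddedSchemaMatches(Map.of(), "default.orders_pq"));  // true
    }
}
```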

      Tajo mainly uses its catalog system to manage schemas, and reading Parquet files in Tajo already depends on the Tajo catalog, so this change should work well. Other systems can still access the Parquet files by reading Parquet's native schema directly.
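The read path described above can be sketched as resolving columns by their simple names: the reader takes the schema from the catalog and matches it against the field names in Parquet's native schema, so database and table qualifiers never need to agree with anything stored in the file. This sketch assumes the Parquet field names are the unqualified column names and models both schemas as plain string lists; the class and method names are illustrative, not the actual `ParquetScanner` code.

```java
import java.util.ArrayList;
import java.util.List;

public class CatalogDrivenReadSketch {
    // Strips "db.table." from a qualified column name, e.g. "default.orders_pq.o_custkey" -> "o_custkey".
    static String simpleName(String qualifiedName) {
        int dot = qualifiedName.lastIndexOf('.');
        return dot < 0 ? qualifiedName : qualifiedName.substring(dot + 1);
    }

    // For each catalog column, returns the index of the matching field in the
    // Parquet native schema, or -1 if the file has no such field.
    static List<Integer> project(List<String> catalogColumns, List<String> parquetFields) {
        List<Integer> indexes = new ArrayList<>(catalogColumns.size());
        for (String column : catalogColumns) {
            indexes.add(parquetFields.indexOf(simpleName(column)));
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Assumed: the native Parquet schema stores unqualified field names.
        List<String> parquetFields = List.of("o_orderkey", "o_custkey");
        // Catalog schema after a hypothetical rename; the qualifier does not affect matching.
        List<String> catalog = List.of("default.orders_pq.o_custkey", "default.orders_pq.o_orderkey");
        System.out.println(project(catalog, parquetFields)); // [1, 0]
    }
}
```

Because matching ignores the qualifier, renaming the database or table changes only the catalog entry and never requires rewriting existing files.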


        Attachment: TAJO-806.patch (28 kB, Hyunsik Choi)



            • Assignee: Hyunsik Choi


              • Created: