  1. Parquet
  2. PARQUET-2026

Allow empty row in parquet file

Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.12.0
    • Fix Version/s: 1.14.0
    • Component/s: parquet-mr

    Description

      Since PARQUET-1851, parquet-mr no longer writes Parquet files that have a schema (meta information) but 0 rows, i.e. empty files.
      As a result, Drill can no longer store empty tables as Parquet files, for example:

      CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0
      CREATE TABLE dfs.tmp.%s AS select * from dfs.`parquet/alltypes_required.parquet` where `col_int` = 0
      create table dfs.tmp.%s as select * from dfs.`parquet/empty/complex/empty_complex.parquet`

      So PARQUET-1851 breaks the following test cases:

      TestUntypedNull.testParquetTableCreation
      TestParquetWriterEmptyFiles.testComplexEmptyFileSchema
      TestParquetWriterEmptyFiles.testWriteEmptyFile
      TestParquetWriterEmptyFiles.testWriteEmptyFileWithSchema
      TestParquetWriterEmptyFiles.testWriteEmptySchemaChange
      TestMetastoreCommands.testAnalyzeEmptyRequiredParquetTable
      TestMetastoreCommands.testSelectEmptyRequiredParquetTable

      I suggest either logging a warning when an empty Parquet file is created, or providing an alternative endBlock for backward compatibility with other tools.
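
      A minimal sketch of how the two suggestions could be combined, written as a standalone model rather than against the real parquet-mr classes. The class SketchFileWriter, its keepEmptyBlocks switch, and the startBlock/endBlock/blocks signatures are hypothetical names chosen for illustration; only the idea of an endBlock that warns on 0 rows and can optionally keep the empty block comes from the proposal above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical model, NOT parquet-mr code: it only tracks the row count of each block.
class SketchFileWriter {
    private static final Logger LOG = Logger.getLogger(SketchFileWriter.class.getName());

    // Assumed opt-in switch: true keeps 0-row blocks for backward compatibility.
    private final boolean keepEmptyBlocks;
    private final List<Long> blockRowCounts = new ArrayList<>();
    private long currentRowCount;

    SketchFileWriter(boolean keepEmptyBlocks) {
        this.keepEmptyBlocks = keepEmptyBlocks;
    }

    // Begins a block that is declared to hold rowCount rows.
    void startBlock(long rowCount) {
        currentRowCount = rowCount;
    }

    // Ends the current block. Returns true if the block was recorded, false if it was dropped.
    boolean endBlock() {
        if (currentRowCount == 0) {
            // Suggestion 1: warn instead of silently changing behavior.
            LOG.warning("Ending a block with 0 rows");
            if (!keepEmptyBlocks) {
                return false; // the empty block is skipped
            }
            // Suggestion 2: the alternative path falls through and keeps the empty block.
        }
        blockRowCounts.add(currentRowCount);
        return true;
    }

    // Row counts of the blocks recorded so far.
    List<Long> blocks() {
        return blockRowCounts;
    }
}
```

      In this model, a caller such as Drill would construct the writer with keepEmptyBlocks set to true so that a CREATE TABLE ... WHERE 1=0 still records a block with 0 rows, while other callers keep the stricter behavior and only see a warning.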

            People

              Assignee: Unassigned
              Reporter: Vitalii Diravka (vitalii)
              Votes: 0
              Watchers: 3
