  1. Parquet
  2. PARQUET-2026

Allow empty row in parquet file

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.12.0
    • Fix Version/s: 1.13.0
    • Component/s: parquet-mr
    • Labels:

      Description

      PARQUET-1851 stopped allowing Parquet files to be written with a schema (meta information) but 0 rows, i.e. empty files.
      As a result, empty tables can no longer be stored in Drill as Parquet files, for example:

      CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0
      CREATE TABLE dfs.tmp.%s AS select * from dfs.`parquet/alltypes_required.parquet` where `col_int` = 0
      create table dfs.tmp.%s as select * from dfs.`parquet/empty/complex/empty_complex.parquet`

      So PARQUET-1851 breaks the following test cases:

      TestUntypedNull.testParquetTableCreation
      TestParquetWriterEmptyFiles.testComplexEmptyFileSchema
      TestParquetWriterEmptyFiles.testWriteEmptyFile
      TestParquetWriterEmptyFiles.testWriteEmptyFileWithSchema
      TestParquetWriterEmptyFiles.testWriteEmptySchemaChange
      TestMetastoreCommands.testAnalyzeEmptyRequiredParquetTable
      TestMetastoreCommands.testSelectEmptyRequiredParquetTable

      I suggest logging a warning when an empty Parquet file is created, or adding an alternative endBlock, for backward compatibility with other tools.
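
The suggestion above could be sketched roughly as follows. This is a toy model only, not parquet-mr code: `EmptyBlockSketch`, `ToyWriter`, and `EmptyBlockPolicy` are hypothetical names invented for illustration, and the strict `REJECT` behaviour is just an assumption about what a writer that refuses zero-row blocks might do. The point is only to show a warning-based alternative sitting next to a strict one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Toy model only: none of these types are parquet-mr classes.
final class EmptyBlockSketch {
    private static final Logger LOG = Logger.getLogger("EmptyBlockSketch");

    /** Hypothetical policy for a block (row group) that ends with zero rows. */
    enum EmptyBlockPolicy { REJECT, WARN }

    /** Hypothetical stand-in for a file writer that tracks row counts per block. */
    static final class ToyWriter {
        private final EmptyBlockPolicy policy;
        private final List<Long> blockRowCounts = new ArrayList<>();
        private long currentRows;
        private int warnings;

        ToyWriter(EmptyBlockPolicy policy) {
            this.policy = policy;
        }

        void startBlock() {
            currentRows = 0;
        }

        void writeRow() {
            currentRows++;
        }

        /** Closes the current block; a zero-row block is handled according to the policy. */
        void endBlock() {
            if (currentRows == 0) {
                if (policy == EmptyBlockPolicy.REJECT) {
                    throw new IllegalStateException("block ended with zero rows");
                }
                // Lenient path: count and log the warning, then record the block anyway.
                warnings++;
                LOG.warning("block ended with zero rows; recording it anyway");
            }
            blockRowCounts.add(currentRows);
        }

        int blockCount() {
            return blockRowCounts.size();
        }

        int warningCount() {
            return warnings;
        }
    }

    public static void main(String[] args) {
        ToyWriter lenient = new ToyWriter(EmptyBlockPolicy.WARN);
        lenient.startBlock();
        lenient.endBlock(); // warns, but the empty block is still recorded
        System.out.println("blocks=" + lenient.blockCount() + " warnings=" + lenient.warningCount());
    }
}
```

With a policy switch like this, tools that rely on empty files (such as the CTAS statements above) could opt into the lenient path, while callers that prefer the strict behaviour keep it.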

              People

              • Assignee:
                Unassigned
                Reporter:
                Vitalii Diravka
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated: