PARQUET-2122: Adding a Bloom filter to a small Parquet file bloats its size by about 1700x

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.13.0
    • Fix Version/s: None
    • Component/s: parquet-cli, parquet-mr
    • Labels: None

    Description

      Converting a small CSV file (14 rows, 1 string column) to Parquet without a Bloom filter yields a 600 B file. Adding '.withBloomFilterEnabled(true)' to the ParquetWriter builder yields a 1049197 B file.

      It isn't clear what the extra space is used for.

      The CSV file and the bloated Parquet file are attached.
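
      As a quick arithmetic check on the two sizes reported above, the sketch below computes the growth factor and the number of extra bytes, then compares the extra bytes with 1 MiB. The 1 MiB reference value and the class and method names are assumptions added for illustration; the report does not say where the space goes, and the 600 B figure may be rounded.

      ```java
      // Hypothetical helper, not part of parquet-mr: plain arithmetic on the reported sizes.
      class BloomBloatCheck {
          // Sizes reported in the issue description, in bytes.
          static final long SIZE_WITHOUT_BLOOM = 600L;
          static final long SIZE_WITH_BLOOM = 1_049_197L;

          // Assumption: 1 MiB is used only as a reference point for comparison.
          static final long ONE_MIB = 1024L * 1024L;

          // Integer growth factor (rounded down).
          static long growthFactor(long before, long after) {
              return after / before;
          }

          // Bytes added by enabling the Bloom filter.
          static long extraBytes(long before, long after) {
              return after - before;
          }

          public static void main(String[] args) {
              long extra = extraBytes(SIZE_WITHOUT_BLOOM, SIZE_WITH_BLOOM);
              System.out.println("growth factor: ~" + growthFactor(SIZE_WITHOUT_BLOOM, SIZE_WITH_BLOOM) + "x");
              System.out.println("extra bytes: " + extra);
              System.out.println("extra bytes minus 1 MiB: " + (extra - ONE_MIB));
          }
      }
      ```

      The extra bytes land within a few dozen bytes of 1 MiB. That would be consistent with a Bloom filter allocated at a fixed size rather than sized to the 14 values, but this is a hypothesis and the report does not confirm it.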

      Attachments

        1. data_index_bloom.parquet (1.00 MB, Ze'ev Maor)
        2. data.csv (0.1 kB, Ze'ev Maor)

          People

            Assignee: Unassigned
            Reporter: Ze'ev Maor (WiredWolf)
            Votes: 0
            Watchers: 5
