Parquet / PARQUET-423

Make writing Avro to Parquet less noisy

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: 1.8.0
    • Fix Version/s: 1.10.0, 1.8.2
    • Component/s: parquet-avro
    • Labels: None

    Description

      When writing Avro records to disk with AvroParquetWriter, statistics for every column in the file are written to the logging system.
      For files based on a large Avro schema, this output is often no longer useful and becomes a hassle.

      Because the logging level is hardcoded (why?) into the Parquet library, I would like to introduce a switch that enables or disables this type of logging.

      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [IPAddress] BINARY: 60 values, 26B raw, 47B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 77B raw, 7B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 102B for [country] BINARY: 60 values, 26B raw, 47B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 119B raw, 7B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 152B for [windowid] BINARY: 60 values, 33B raw, 51B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 12 entries, 480B raw, 12B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 77B for [customerId] BINARY: 58 values, 22B raw, 42B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 49B raw, 7B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 86B for [sessionId] BINARY: 58 values, 28B raw, 43B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 10 entries, 110B raw, 10B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 93B for [sessionEventNr] INT64: 58 values, 34B raw, 48B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 14 entries, 112B raw, 14B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 114B for [visitId] BINARY: 58 values, 28B raw, 43B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 10 entries, 250B raw, 10B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [visitEventNr] INT64: 58 values, 34B raw, 45B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 11 entries, 88B raw, 11B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 112B for [timestamp] INT64: 58 values, 50B raw, 66B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 46 entries, 368B raw, 46B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 85B for [IPAddress] BINARY: 58 values, 22B raw, 42B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 77B raw, 7B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 97B for [country] BINARY: 58 values, 22B raw, 42B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 7 entries, 119B raw, 7B comp}
      Jan 12, 2016 1:43:00 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 144B for [windowid] BINARY: 58 values, 28B raw, 43B comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 10 entries, 400B raw, 10B comp}
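
      The kind of switch requested above can be sketched with plain java.util.logging, which is the format the sample output appears to use. This is only an illustration of the idea, not the change that resolved this issue: the class ColumnStatsLogging, the method setVerbose and the CollectingHandler helper are hypothetical names, and the logger name is simply copied from the sample output. Whether a given Parquet release honours a level set this way depends on how the library configures its own logging, which the description says is hardcoded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical helper, not part of Parquet: toggles INFO output for one named logger.
final class ColumnStatsLogging {
    // Logger name copied from the sample output; that Parquet publishes
    // through a java.util.logging logger of this name is an assumption.
    static final String NOISY_LOGGER =
            "org.apache.parquet.hadoop.ColumnChunkPageWriteStore";

    // verbose == false raises the threshold to WARNING, so INFO records are dropped.
    // The logger is returned so the caller can keep a strong reference to it;
    // java.util.logging may otherwise garbage-collect the logger and its level.
    static Logger setVerbose(boolean verbose) {
        Logger logger = Logger.getLogger(NOISY_LOGGER);
        logger.setLevel(verbose ? Level.INFO : Level.WARNING);
        return logger;
    }

    // Minimal handler that records messages in memory, used only to show the effect.
    static final class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    public static void main(String[] args) {
        Logger logger = setVerbose(false);
        CollectingHandler handler = new CollectingHandler();
        logger.addHandler(handler);
        logger.setUseParentHandlers(false); // keep the demo off the console

        logger.info("written 90B for [IPAddress] BINARY: 60 values"); // suppressed
        logger.warning("a warning still gets through");               // recorded

        System.out.println(handler.messages.size() + " message(s) recorded");
    }
}
```

      With the switch off, only the warning reaches the handler; calling setVerbose(true) would let the per-column INFO lines through again.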
      

          People

            Assignee: Niels Basjes (nielsbasjes)
            Reporter: Niels Basjes (nielsbasjes)
            Votes: 0
            Watchers: 2
