  1. Parquet
  2. PARQUET-2072

Do Not Determine Both Min/Max for Binary Stats

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed

    Description

      I'm looking at some benchmarking code comparing Apache ORC vs. Apache Parquet and see that Parquet is quite a bit slower for writes (reads TBD). Based on my investigation, a significant amount of time is spent determining min/max for binary types.

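      One quick improvement is to skip the "max" comparison when a value has already been determined to be a new "min". Since min <= max always holds once the statistics are initialized, a value below the current minimum cannot also be above the current maximum.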
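
The idea above can be sketched roughly as follows. This is a minimal illustration, not the actual parquet-mr statistics code: the class name BinaryMinMaxSketch, its methods, and the comparison counter are hypothetical, and it assumes binary values are ordered by unsigned lexicographic byte comparison (Arrays.compareUnsigned, Java 9+).

```java
import java.util.Arrays;

/** Hypothetical sketch of min/max tracking for binary values; not parquet-mr code. */
final class BinaryMinMaxSketch {
    private byte[] min;
    private byte[] max;
    private int comparisons; // only here to make the saving visible

    /** Assumed ordering: unsigned lexicographic comparison of the raw bytes. */
    private int compare(byte[] a, byte[] b) {
        comparisons++;
        return Arrays.compareUnsigned(a, b);
    }

    void update(byte[] value) {
        if (min == null) {
            // The first value initializes both bounds, so min <= max holds from here on.
            min = value.clone();
            max = value.clone();
            return;
        }
        if (compare(value, min) < 0) {
            // A new minimum cannot also be a new maximum: skip the second comparison.
            min = value.clone();
        } else if (compare(value, max) > 0) {
            max = value.clone();
        }
    }

    byte[] getMin() { return min; }
    byte[] getMax() { return max; }
    int getComparisons() { return comparisons; }

    public static void main(String[] args) {
        BinaryMinMaxSketch stats = new BinaryMinMaxSketch();
        stats.update(new byte[] {5});
        stats.update(new byte[] {3}); // new min: one comparison
        stats.update(new byte[] {9}); // not a min, so max is checked: two comparisons
        System.out.println("min=" + Arrays.toString(stats.getMin())
            + " max=" + Arrays.toString(stats.getMax())
            + " comparisons=" + stats.getComparisons());
    }
}
```

The saving applies only to values that turn out to be a new minimum; every other value still pays for both comparisons, so the benefit depends on the data distribution.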

      While I'm at it, remove calls to deprecated functions.

          People

            Assignee: David Mollitor (belugabehr)
            Reporter: David Mollitor (belugabehr)