  1. Parquet
  2. PARQUET-2220

Parquet filter predicates eagerly store nested toString values, causing OOMs

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-format
    • Labels: None

    Description

      Each instance of ColumnFilterPredicate eagerly builds its string representation in the constructor and stores it in the toString field, even if toString() is never called.

      static abstract class ColumnFilterPredicate<T extends Comparable<T>> implements FilterPredicate, Serializable {
        private final Column<T> column;
        private final T value;
        private final String toString;

        protected ColumnFilterPredicate(Column<T> column, T value) {
          this.column = Objects.requireNonNull(column, "column cannot be null");

          // Eq and NotEq allow value to be null, Lt, Gt, LtEq, GtEq however do not, so they guard against
          // null in their own constructors.
          this.value = value;

          // The string form is built here, at construction time, and kept for the object's lifetime.
          String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
          this.toString = name + "(" + column.getColumnPath().toDotString() + ", " + value + ")";
        }

        // ... remaining members omitted
      }

      If the filter predicate is long or deeply nested, building the filter can consume a lot of memory.
      In our production environment we have seen this reach up to 4 GB when opening multiple Parquet readers.

      The same pattern is repeated in BinaryLogicalFilterPredicate, where toString is also computed eagerly and stored. Because each And/Or node stores a string that embeds the strings of both children, the text of every nested predicate is duplicated at each level above it.
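
      The duplication can be illustrated with a small stand-alone sketch. The Node class below is hypothetical and only mimics the eager-caching pattern described above; it is not Parquet code. It builds a left-deep chain of "and" nodes and compares the length of the root's string with the total number of characters cached across all nodes.

```java
public class EagerToStringDemo {
    // Hypothetical stand-in for a predicate node that caches its string form eagerly.
    static final class Node {
        final String cached;
        final Node left;
        final Node right;

        // Leaf node, comparable to eq(column, value).
        Node(String column, String value) {
            this.left = null;
            this.right = null;
            this.cached = "eq(" + column + ", " + value + ")";
        }

        // Binary "and" node: its cached string embeds both children's strings.
        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
            this.cached = "and(" + left.cached + ", " + right.cached + ")";
        }
    }

    // Sum of cached string lengths over every node in the tree.
    static long totalCachedChars(Node n) {
        if (n == null) {
            return 0;
        }
        return n.cached.length() + totalCachedChars(n.left) + totalCachedChars(n.right);
    }

    // Builds a left-deep chain: and(and(and(eq, eq), eq), eq) ...
    static Node chain(int leaves) {
        Node root = new Node("a.b", "v0");
        for (int i = 1; i < leaves; i++) {
            root = new Node(root, new Node("a.b", "v" + i));
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = chain(1000);
        System.out.println("root string chars:  " + root.cached.length());
        System.out.println("total cached chars: " + totalCachedChars(root));
    }
}
```

      In a chain like this the root string grows linearly with the number of leaves, while the total cached text grows roughly quadratically, since every intermediate node keeps its own copy of everything beneath it.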

      I could not find a use case that requires the string to be computed and stored eagerly.
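
      One possible direction, shown only as a minimal sketch and not as the project's actual fix, is to build the string on demand inside toString() instead of in the constructor. The LazyPredicate class below is hypothetical and simplified: it takes the column path as a plain String rather than a Column<T>.

```java
import java.util.Locale;
import java.util.Objects;

public class LazyToStringSketch {
    // Hypothetical, simplified predicate; the column is a plain dotted path here.
    static final class LazyPredicate<T extends Comparable<T>> {
        private final String columnPath;
        private final T value;

        LazyPredicate(String columnPath, T value) {
            this.columnPath = Objects.requireNonNull(columnPath, "columnPath cannot be null");
            this.value = value; // may be null, as for Eq and NotEq in the original
        }

        @Override
        public String toString() {
            // Built only when requested; no string is retained on the instance.
            String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
            return name + "(" + columnPath + ", " + value + ")";
        }
    }

    public static void main(String[] args) {
        LazyPredicate<Integer> p = new LazyPredicate<>("a.b", 42);
        System.out.println(p);
    }
}
```

      The trade-off is that each call to toString() rebuilds the string. If callers invoke it repeatedly, caching the result on first use would be an alternative, at the cost of retaining the string from then on.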

          People

            Assignee: Unassigned
            Reporter: Abhishek Jain (abhiSumo304)
            Votes: 0
            Watchers: 2
