ORC-323 (project: ORC)

Predicate push down for nested fields


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: Java
    • Labels: None

    Description

      1. Predicate Pushdown for Nested Fields

      1.1 Objective

      In ORC (Optimized Row Columnar), every primitive-type column has an index. A predicate is a condition on a column named in the WHERE clause, and pushing it down means skipping row groups, stripes, and blocks during the read by comparing the predicate against the metadata stored for each stripe. This metadata holds the minimum, maximum, and sum of the values in the column.
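As a rough illustration of the skipping idea, the sketch below models each stripe (or row group) only by the min and max of one integer column and keeps the stripes whose value range can overlap a query range. `ColumnStats` and `StripeSkipping.selectStripes` are hypothetical names for this sketch, not ORC's reader API; ORC's real statistics carry more than min and max.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of per-stripe column statistics (not ORC's API).
final class ColumnStats {
    final long min;
    final long max;

    ColumnStats(long min, long max) {
        this.min = min;
        this.max = max;
    }
}

final class StripeSkipping {
    // Returns the indexes of stripes that may contain a value v with lo <= v < hi.
    // A stripe is skipped only when its [min, max] range cannot overlap [lo, hi).
    static List<Integer> selectStripes(List<ColumnStats> stripes, long lo, long hi) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < stripes.size(); i++) {
            ColumnStats s = stripes.get(i);
            boolean disjoint = s.max < lo || s.min >= hi;
            if (!disjoint) {
                selected.add(i);
            }
        }
        return selected;
    }
}
```

For stripes with ranges [0, 100000], [250000, 400000] and [700000, 900000] and the range [300000, 600000), only stripe 1 has to be read; the other two are skipped without decoding any rows.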

      Currently, predicate pushdown works only for top-level columns of the schema.

      The goal is to extend predicate pushdown to nested structures in Hive.

      1.2 Current State

      1.2.1 Schema
      struct<int1:int, complex:struct<int2:int,String1:string>>
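ORC numbers the columns of a schema in pre-order, so for this schema the root struct is column 0, int1 is 1, complex is 2, int2 is 3 and String1 is 4; these ids index the boolean[] built by populatePpdSafeConversion() (sized getMaximumId() + 1). The sketch below reproduces that numbering with a hypothetical `TypeNode` class standing in for TypeDescription; it is an illustration, not ORC's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for ORC's TypeDescription, for illustration only.
final class TypeNode {
    final String name;              // field name within the parent struct ("" for the root)
    final List<TypeNode> children;  // empty for primitive types
    int id = -1;
    int maximumId = -1;

    TypeNode(String name, TypeNode... children) {
        this.name = name;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    // Assigns ids in pre-order (parent before children), starting at 'next'.
    // Returns the next free id; maximumId is the largest id in this subtree.
    int assignIds(int next) {
        id = next++;
        for (TypeNode child : children) {
            next = child.assignIds(next);
        }
        maximumId = next - 1;
        return next;
    }
}
```

Building struct<int1:int, complex:struct<int2:int,String1:string>> this way and calling assignIds(0) on the root gives int2 the id 3 and the root a maximumId of 4, so a per-column array needs 5 entries.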

      1.2.2 Search Argument
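      import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf;
      import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
      import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;

      // NOT(int2 < 300000) AND int2 < 600000, i.e. 300000 <= int2 < 600000
      SearchArgument sarg = SearchArgumentFactory.newBuilder()
          .startAnd()
            .startNot()
              .lessThan("int2", PredicateLeaf.Type.LONG, 300000L)
            .end()
            .lessThan("int2", PredicateLeaf.Type.LONG, 600000L)
          .end()
          .build();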
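The search argument above is the expression NOT(int2 < 300000) AND int2 < 600000. Tested against a column's min/max statistics, each leaf can only answer YES, NO, or MAYBE, and a stripe or row group can be skipped when the whole expression is NO. The sketch below evaluates this tree with a simplified three-valued logic; `Truth` and `SargEval` are hypothetical, and ORC's own TruthValue has additional states for nulls.

```java
// Simplified three-valued result of testing a predicate against min/max statistics.
// Hypothetical; ORC's real TruthValue also distinguishes null-related states.
enum Truth {
    YES, NO, MAYBE;

    Truth not() {
        switch (this) {
            case YES: return NO;
            case NO: return YES;
            default: return MAYBE;
        }
    }

    Truth and(Truth other) {
        if (this == NO || other == NO) return NO;
        if (this == YES && other == YES) return YES;
        return MAYBE;
    }
}

final class SargEval {
    // lessThan(column, literal) evaluated against the column's [min, max] range.
    static Truth lessThan(long min, long max, long literal) {
        if (max < literal) return Truth.YES;   // every value is below the literal
        if (min >= literal) return Truth.NO;   // no value is below the literal
        return Truth.MAYBE;
    }

    // NOT(int2 < 300000) AND int2 < 600000, as built in section 1.2.2.
    static Truth evaluate(long min, long max) {
        return lessThan(min, max, 300000L).not().and(lessThan(min, max, 600000L));
    }
}
```

A row group whose int2 values span [0, 100000] evaluates to NO and can be skipped, while one spanning [250000, 400000] evaluates to MAYBE and must be read. This only helps once the reader can map "int2" to a column and trusts its statistics, which is exactly what fails for nested fields below.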

      1.2.3 Predicate pushdown is not supported for nested fields in ORC

      private boolean[] populatePpdSafeConversion() {
        if (fileSchema == null || readerSchema == null || readerFileTypes == null) {
          return null;
        }
        boolean[] result = new boolean[readerSchema.getMaximumId() + 1];
        boolean safePpd = validatePPDConversion(fileSchema, readerSchema);
        result[readerSchema.getId()] = safePpd;
        List<TypeDescription> children = readerSchema.getChildren();
        if (children != null) {
          // Only the direct children of the root (top-level fields) are visited.
          for (TypeDescription child : children) {
            TypeDescription fileType = getFileType(child.getId());
            safePpd = validatePPDConversion(fileType, child);
            result[child.getId()] = safePpd;
          }
        }
        return result;
      }

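      populatePpdSafeConversion() validates the type conversion only for the root type and its direct children, i.e. the top-level fields. The entries for nested columns are never set and stay false, so PPD is treated as unsafe for every nested field and its search argument cannot be used.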
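One way to cover nested fields is to recurse over every descendant of the reader schema instead of only the root's direct children. In the sketch below, `PpdNode` is a hypothetical stand-in for TypeDescription, the file type is looked up by id in a map (a simplification of getFileType()), and `isSafe` is a placeholder for validatePPDConversion(); it illustrates the idea and is not necessarily what the merged change does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical stand-in for TypeDescription: a type category, an id, and children.
final class PpdNode {
    final String category;   // e.g. "struct", "int", "string"
    final int id;
    final List<PpdNode> children;

    PpdNode(String category, int id, PpdNode... children) {
        this.category = category;
        this.id = id;
        this.children = new ArrayList<>(Arrays.asList(children));
    }
}

final class PpdSafety {
    // Fills result[id] for readerType and every descendant, not just the top level.
    // fileTypes maps a reader column id to its file type (absent = no file column).
    // isSafe plays the role of validatePPDConversion(fileType, readerType).
    static void populate(PpdNode readerType, Map<Integer, PpdNode> fileTypes,
                         BiPredicate<PpdNode, PpdNode> isSafe, boolean[] result) {
        PpdNode fileType = fileTypes.get(readerType.id);
        result[readerType.id] = fileType != null && isSafe.test(fileType, readerType);
        for (PpdNode child : readerType.children) {
            populate(child, fileTypes, isSafe, result);
        }
    }
}
```

With the recursion, the entry for int2 (id 3) is set from its own conversion check rather than left at its default of false.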

      static int findColumns(SchemaEvolution evolution,
                             String columnName) {
        TypeDescription readerSchema = evolution.getReaderBaseSchema();
        List<String> fieldNames = readerSchema.getFieldNames();
        List<TypeDescription> children = readerSchema.getChildren();
        // Only the top-level field names of the reader schema are searched.
        for (int i = 0; i < fieldNames.size(); ++i) {
          if (columnName.equals(fieldNames.get(i))) {
            TypeDescription result = evolution.getFileType(children.get(i));
            return result == null ? -1 : result.getId();
          }
        }
        return -1;
      }

      findColumns() searches only the top-level field names. Because "int2" is a nested column, it is never matched and -1 is returned instead of the column id of "int2".

      1.2.4 Result

      PPD does not work for the int2 field in the search argument.

      1.3 Expected State

      1.3.1 Schema
      struct<int1:int, complex:struct<int2:int,String1:string>>

      1.3.2 Query
      The column name in each PredicateLeaf is replaced with the fully qualified column path, e.g. complex.int2.

      // NOT(complex.int2 < 300000) AND complex.int2 < 600000
      SearchArgument sarg = SearchArgumentFactory.newBuilder()
          .startAnd()
            .startNot()
              .lessThan("complex.int2", PredicateLeaf.Type.LONG, 300000L)
            .end()
            .lessThan("complex.int2", PredicateLeaf.Type.LONG, 600000L)
          .end()
          .build();
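With fully qualified names, column lookup has to walk the path one struct level at a time instead of scanning only the top-level field names. A minimal sketch, assuming a plain split on "." (field names that themselves contain dots or need quoting are not handled) and a hypothetical `FieldNode` type in place of TypeDescription; like findColumns(), it returns -1 when any path segment is missing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical struct/primitive node: parallel lists of field names and child types.
final class FieldNode {
    final int id;
    final List<String> fieldNames = new ArrayList<>();
    final List<FieldNode> children = new ArrayList<>();

    FieldNode(int id) {
        this.id = id;
    }

    FieldNode addField(String name, FieldNode child) {
        fieldNames.add(name);
        children.add(child);
        return this;
    }
}

final class NestedLookup {
    // Resolves a dotted path such as "complex.int2" against the root struct.
    // Returns the column id, or -1 if any segment does not name a field.
    static int findColumn(FieldNode root, String path) {
        FieldNode current = root;
        for (String part : path.split("\\.")) {
            int index = current.fieldNames.indexOf(part);
            if (index < 0) {
                return -1;
            }
            current = current.children.get(index);
        }
        return current.id;
    }
}
```

For the schema above, "complex.int2" resolves to column 3, while the bare name "int2" still returns -1 because no top-level field has that name.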

      1.3.3 Predicate pushdown support for nested fields

      https://github.com/apache/orc/pull/232

      1.3.4 Result

      PPD works for the complex.int2 field in the search argument.


    People

      Assignee: Ashish Sharma (ashish-kumar-sharma)
      Reporter: Ashish Sharma (ashish-kumar-sharma)
      Votes: 0
      Watchers: 4
