Beam / BEAM-8896

WITH query AS + SELECT query JOIN other throws invalid type

    Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.16.0
    • Fix Version/s: None
    • Component/s: dsl-sql
    • Labels: None

      Description

      The first of the following three queries fails, even though all three are equivalent:

      Pipeline p = Pipeline.create();
      
      Schema schemaA =
          Schema.of(
              Schema.Field.of("id", Schema.FieldType.BYTES),
              Schema.Field.of("fA1", Schema.FieldType.STRING));
      
      Schema schemaB =
          Schema.of(
              Schema.Field.of("id", Schema.FieldType.STRING),
              Schema.Field.of("fB1", Schema.FieldType.STRING));
      
      PCollection<Row> inputA =
          p.apply(Create.of(ImmutableList.<Row>of()).withCoder(SchemaCoder.of(schemaA)));
      
      PCollection<Row> inputB =
          p.apply(Create.of(ImmutableList.<Row>of()).withCoder(SchemaCoder.of(schemaB)));
      
      // Fails
      String query1 =
          "WITH query AS "
              + "( "
              + " SELECT id, fA1, fA1 AS fA1_2 "
              + " FROM tblA"
              + ") "
              + "SELECT fA1, fB1, fA1_2 "
              + "FROM query "
              + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";
      
      // Ok
      String query2 =
          "WITH query AS "
              + "( "
              + " SELECT fA1, fB1, fA1 AS fA1_2 "
              + " FROM tblA "
              + " JOIN tblB "
              + " ON (TO_HEX(tblA.id) = tblB.id) "
              + ")"
              + "SELECT fA1, fB1, fA1_2 "
              + "FROM query ";
      
      // Ok
      String query3 =
          "WITH query AS "
          + "( "
          + " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 "
          + " FROM tblA"
          + ") "
          + "SELECT fA1, fB1, fA1_2 "
          + "FROM query "
          + "JOIN tblB ON (query.id = tblB.id)";
      
      Schema transform3 =
          PCollectionTuple.of("tblA", inputA)
              .and("tblB", inputB)
              .apply(SqlTransform.query(query3))
              .getSchema();
      System.out.println(transform3);
      
      Schema transform2 =
          PCollectionTuple.of("tblA", inputA)
              .and("tblB", inputB)
              .apply(SqlTransform.query(query2))
              .getSchema();
      System.out.println(transform2);
      
      Schema transform1 =
          PCollectionTuple.of("tblA", inputA)
              .and("tblB", inputB)
              .apply(SqlTransform.query(query1))
              .getSchema();
      System.out.println(transform1);
      

       

      The error is:

      Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is invalid for  type 'RecordType(VARBINARY id, VARCHAR fA1)'
          at org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197)

       

      If I change `schemaB.id` to `BYTES` (and stop using `TO_HEX` accordingly), all queries work fine.
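
      For reference, a minimal sketch of what the `BYTES` variant of the failing query might look like. It assumes `schemaB.id` is declared as `Schema.FieldType.BYTES` and that the only change to `query1` is dropping `TO_HEX` from the join condition; the class and method names are hypothetical and only there to make the snippet self-contained.

```java
// Hypothetical holder class; not part of the original reproduction.
class Query1BytesVariant {

    // Assumption: schemaB is declared with
    //   Schema.Field.of("id", Schema.FieldType.BYTES)
    // so both id columns have the same type and can be compared directly.
    static String query() {
        return "WITH query AS "
            + "( "
            + " SELECT id, fA1, fA1 AS fA1_2 "
            + " FROM tblA"
            + ") "
            + "SELECT fA1, fB1, fA1_2 "
            + "FROM query "
            // Same shape as query1, but without TO_HEX on query.id.
            + "JOIN tblB ON (query.id = tblB.id)";
    }
}
```

      The resulting string would be passed to `SqlTransform.query(...)` in the same way as `query1` above.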

              People

              • Assignee: Unassigned
              • Reporter: fdiazgon
              • Votes: 0
              • Watchers: 5
