Apache Storm / STORM-2115

[Storm SQL] 'IN' with a subquery produces implicit aggregate calls that have 'null' as their name


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0, 1.1.0
    • Component/s: storm-sql
    • Labels: None

    Description

      The query "SELECT ID FROM FOO WHERE ID NOT IN (SELECT 1 AS ID FROM FOO)" fails with a duplicated field 'null' error.

      Here is the logical plan from Calcite:

      LogicalFilter(condition=[NOT(CASE(=($3, 0), false, IS NOT NULL($7), true, IS NULL($5), null, <($4, $3), null, false))]): rowcount = 1.0, cumulative cost = {10.375 rows, 16.0 cpu, 0.0 io}, id = 24
        LogicalJoin(condition=[=($5, $6)], joinType=[left]): rowcount = 1.0, cumulative cost = {9.375 rows, 15.0 cpu, 0.0 io}, id = 23
          LogicalProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$0]): rowcount = 1.0, cumulative cost = {5.25 rows, 11.0 cpu, 0.0 io}, id = 18
            LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1.0, cumulative cost = {4.25 rows, 5.0 cpu, 0.0 io}, id = 17
              EnumerableTableScan(table=[[FOO]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 io}, id = 12
              LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)]): rowcount = 1.0, cumulative cost = {3.25 rows, 4.0 cpu, 0.0 io}, id = 16
                LogicalProject($f0=[$0], $f1=[true]): rowcount = 1.0, cumulative cost = {2.0 rows, 4.0 cpu, 0.0 io}, id = 15
                  LogicalProject(ID=[1]): rowcount = 1.0, cumulative cost = {1.0 rows, 2.0 cpu, 0.0 io}, id = 14
                    EnumerableTableScan(table=[[FOO]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 io}, id = 13
          LogicalAggregate(group=[{0}], agg#0=[MIN($1)]): rowcount = 1.0, cumulative cost = {3.125 rows, 4.0 cpu, 0.0 io}, id = 22
            LogicalProject($f0=[$0], $f1=[true]): rowcount = 1.0, cumulative cost = {2.0 rows, 4.0 cpu, 0.0 io}, id = 21
              LogicalProject(ID=[1]): rowcount = 1.0, cumulative cost = {1.0 rows, 2.0 cpu, 0.0 io}, id = 20
                EnumerableTableScan(table=[[FOO]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 io}, id = 19
      

      In this case AggregateCall.name can be null, so the Trident tuple can end up with several fields that are all named 'null'.
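To make the failure mode concrete, here is a minimal sketch in plain Java with no Calcite or Storm dependency. The class and method names are hypothetical. It assumes that the two unnamed aggregate calls in the plan above (agg#0=[COUNT()] and agg#1=[COUNT($0)]) each report a null name, and that a null name ends up as the field name 'null'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NullAggregateNames {
    /** Returns the field names that occur more than once, in first-seen order. */
    static List<String> duplicates(List<String> fieldNames) {
        Set<String> seen = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (String name : fieldNames) {
            // Assumption: a null name is rendered as the string "null".
            String key = String.valueOf(name);
            if (!seen.add(key) && !dups.contains(key)) {
                dups.add(key);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Two aggregate calls, neither of which carries an explicit name.
        List<String> aggNames = Arrays.asList((String) null, (String) null);
        System.out.println(duplicates(aggNames)); // prints [null]
    }
}
```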

      We should use the field names from the RowType of the LogicalAggregate instead, but those names can coincide with the upstream operator's output field names, which introduces a different duplication.

      One way to resolve this is to assign temporary field names while aggregating and then replace them with the field names from the RowType of the LogicalAggregate.
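The two-step naming idea could be sketched roughly as follows. This is plain Java rather than the actual storm-sql code. The class name, the `__agg_tmp_` prefix, and both helper methods are hypothetical, and the final names `$f0` and `$f1` are only assumed stand-ins for whatever the LogicalAggregate's RowType reports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TemporaryAggregateNames {
    // Hypothetical prefix, chosen so it cannot clash with upstream field names.
    static final String TMP_PREFIX = "__agg_tmp_";

    /** Step 1: generate one unique temporary name per aggregate output field. */
    static List<String> temporaryNames(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(TMP_PREFIX + i);
        }
        return names;
    }

    /** Step 2: re-key the aggregated values by position, using the final RowType names. */
    static Map<String, Object> renameToRowType(Map<String, Object> aggregated,
                                               List<String> tmpNames,
                                               List<String> rowTypeNames) {
        if (tmpNames.size() != rowTypeNames.size()) {
            throw new IllegalArgumentException("name lists must have the same length");
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < tmpNames.size(); i++) {
            out.put(rowTypeNames.get(i), aggregated.get(tmpNames.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tmp = temporaryNames(2);
        Map<String, Object> aggregated = new LinkedHashMap<>();
        aggregated.put(tmp.get(0), 1);     // group key
        aggregated.put(tmp.get(1), true);  // e.g. result of MIN($1)
        // Assumed RowType field names, for illustration only.
        System.out.println(renameToRowType(aggregated, tmp, Arrays.asList("$f0", "$f1")));
        // prints {$f0=1, $f1=true}
    }
}
```

Because the temporary names exist only while the aggregation runs, the upstream fields and the aggregate results never share a name in the same tuple, even when the final RowType reuses an upstream name.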

      This will be easier to do once STORM-2072 is merged.

          People

            Assignee: Jungtaek Lim (kabhwan)
            Reporter: Jungtaek Lim (kabhwan)