  1. Flink
  2. FLINK-25592

Improvement of parser, optimizer and execution for Flink Batch SQL


Details

    Description

      This is a parent JIRA to track improvements to Flink Batch SQL, covering the parser, optimizer, and execution.
      For example:
      1. With the Hive dialect and the default dialect, some SQL queries are translated into different plans.
      2. Specify the hash/sort aggregate strategy and the hash/sort-merge join strategy in a SQL hint.
      3. Take Parquet metadata into consideration during optimization.
      4. And so on.
      Note that some improvements are not limited to batch SQL; streaming SQL jobs may also benefit from some of the improvements in this JIRA.
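
To make item 2 concrete, the sketch below builds queries that carry a join-strategy hint in the `/*+ ... */` comment form that Flink SQL uses for hints. The helper `with_join_hint`, the tables `t1` and `t2`, and the columns are hypothetical; the hint names `SHUFFLE_HASH` and `SHUFFLE_MERGE` are assumed to be supported by the Flink version in use, which this issue does not confirm.

```python
def with_join_hint(strategy: str, table: str, select_body: str) -> str:
    """Prefix a SELECT body with a `/*+ STRATEGY(table) */` hint comment.

    `strategy` is assumed to be a join hint name understood by the planner;
    this helper only formats text and does not validate the hint.
    """
    return f"SELECT /*+ {strategy}({table}) */ {select_body}"


# Hypothetical query body joining two illustrative tables.
BODY = "t1.id, t2.v FROM t1 JOIN t2 ON t1.id = t2.id"

# Ask for a hash join with t1 as the hinted side.
hash_join_sql = with_join_hint("SHUFFLE_HASH", "t1", BODY)

# Ask for a sort-merge join instead.
sort_merge_join_sql = with_join_hint("SHUFFLE_MERGE", "t1", BODY)

print(hash_join_sql)
print(sort_merge_join_sql)
```

The resulting strings would be submitted through whatever SQL entry point the job already uses; whether the planner honors the hint depends on the Flink version and on the sub-tasks tracked here.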

      Attachments

      Sub-Tasks

        No. | Summary | Type | Status | Assignee
        1. | Case when would be translated into different expression in Hive dialect and default dialect | Sub-task | Resolved | Unassigned
        2. | A redundant scan could be skipped if it is an input of join and the other input is empty after partition prune | Sub-task | Closed | Yunhong Zheng
        3. | Take parquet metadata into consideration when source is parquet files | Sub-task | Open | luoyuxia
        4. | Specify hash/sort aggregate strategy in SQL hint | Sub-task | Closed | ZhuoYu Chen
        5. | Specify hash/sortmerge join in SQL hint | Sub-task | Open | luoyuxia
        6. | Remove useless aggregate function | Sub-task | Open | godfrey he
        7. | Batch get statistics of multiple partitions instead of get one by one | Sub-task | Resolved | tartarus
        8. | Cannot join hive tables with different column types | Sub-task | Closed | Unassigned
        9. | Unexpected aggregate plan after load hive module | Sub-task | Resolved | luoyuxia
        10. | UnsupportedOperationException would thrown out when hash shuffle by a field with array type | Sub-task | Closed | dalongliu
        11. | CalcOperator CodeGenException: Boolean expression type expected | Sub-task | Resolved | Unassigned
        12. | Flink doesn't support Hive primitive type void yet | Sub-task | Closed | luoyuxia
        13. | Hive Dialect support implicit conversion | Sub-task | Resolved | luoyuxia
        14. | Unexpected rexnode : org.apache.calcite.rex.RexFieldAccess | Sub-task | Resolved | Unassigned
        15. | CodeGenException: Unable to find common type of | Sub-task | Resolved | Unassigned
        16. | Failed to get Hive result type from org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox | Sub-task | Resolved | Unassigned
        17. | Field #1: values VARCHAR(2147483647) ARRAY does not exist for expression index($0, 0) | Sub-task | Closed | Unassigned
        18. | throw NPE if multi MAPJOIN hint union all | Sub-task | Closed | luoyuxia
        19. | Support Insert Multi-Table | Sub-task | Closed | luoyuxia
        20. | Support Hive bucket table | Sub-task | In Progress | luoyuxia
        21. | Flink supports all modes of Hive UDAF (PARTIAL1, PARTIAL2, FINAL, COMPLETE) | Sub-task | Resolved | luoyuxia
        22. | Min aggregate function support type: ''ARRAY''. | Sub-task | Resolved | Unassigned
        23. | Flink batch support for Hive StorageHandlers | Sub-task | Open | luoyuxia
        24. | Hive dialect fails using union map type | Sub-task | Open | luoyuxia
        25. | Add Hive partition when flink has no data to write | Sub-task | Closed | tartarus
        26. | Fix Hive sink not write a success file after finish writing in batch mode | Sub-task | Closed | tartarus
        27. | Allow user to configure whether to enable sort or not when it's for dynamic parition writing for HiveSource | Sub-task | Closed | luoyuxia
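
The idea behind sub-task 2 (skipping a scan whose join partner is empty after partition pruning) can be sketched on a toy plan tree. This is a minimal illustration under stated assumptions, not Flink's planner code: the classes `Scan`, `Join`, and `Empty` and the function `prune_empty` are hypothetical, every scan is assumed to read a partitioned table, and only inner joins are collapsed.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scan:
    table: str
    # Partitions left after pruning; an empty list means pruning removed all of them.
    partitions: List[str]


@dataclass
class Empty:
    """A relation known to produce no rows."""


@dataclass
class Join:
    left: "Plan"
    right: "Plan"
    kind: str = "inner"


Plan = Union[Scan, Join, Empty]


def prune_empty(plan: Plan) -> Plan:
    """Rewrite the plan bottom-up, replacing provably empty subtrees with Empty."""
    if isinstance(plan, Scan):
        # A partitioned scan with no remaining partitions reads nothing.
        return plan if plan.partitions else Empty()
    if isinstance(plan, Join):
        left = prune_empty(plan.left)
        right = prune_empty(plan.right)
        # An inner join with an empty input is empty, so the other scan is redundant.
        if plan.kind == "inner" and (isinstance(left, Empty) or isinstance(right, Empty)):
            return Empty()
        # Other join kinds are kept, because rows from the non-empty side may survive.
        return Join(left, right, plan.kind)
    return plan


# 'returns' lost every partition to pruning, so the 'orders' scan can be skipped.
plan = Join(Scan("orders", ["dt=2022-01-01"]), Scan("returns", []))
print(prune_empty(plan))
```

The sketch is deliberately conservative for outer joins: a left join with an empty right input still has to scan the left side, so only the empty side is replaced.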

          People

            Assignee: Unassigned
            Reporter: Jing Zhang (jingzhang)
            Votes: 0
            Watchers: 21

            Dates

              Created:
              Updated: