Apache Drill / DRILL-982

Parquet reader should return NULL value for nonexistent column in execution phase, instead of raising ExecutionSetupException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Storage - Parquet
    • Labels: None

      Description

      Querying a nonexistent column in a Parquet file currently causes Drill to raise an ExecutionSetupException:

      SELECT NON_EXIT_COLUMN from cp.`tpch/nation.parquet`;

      This causes a problem when a query reads multiple Parquet files and the first file does not have the column while the rest of them do.

      It would be better to return a NULL expression during the execution stage if the column does not exist in the Parquet file.
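
The requested behavior can be illustrated with a minimal sketch. This is not Drill's reader code: each Parquet file is modeled here as a plain dictionary of column name to values, and the names `ParquetLike`, `row_count`, and `read_column` are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical in-memory stand-in for a Parquet file: column name -> values.
ParquetLike = Dict[str, List[object]]


def row_count(table: ParquetLike) -> int:
    """Number of rows in the table (0 if it has no columns)."""
    return len(next(iter(table.values()))) if table else 0


def read_column(files: Sequence[ParquetLike], column: str) -> List[Optional[object]]:
    """Read one column across all files, yielding None where a file lacks it."""
    out: List[Optional[object]] = []
    for table in files:
        if column in table:
            out.extend(table[column])
        else:
            # Missing column: emit one NULL per row instead of raising an error.
            out.extend([None] * row_count(table))
    return out
```

With this approach, a scan whose first file lacks the column still succeeds, and rows from later files that do contain the column keep their real values.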

      Later on, if Drill adds an option to verify column existence before executing a query (as a schema-based system would), it could throw a PlanException in the planning phase, once the Parquet footer information is available to the planner.
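
The optional plan-time check described above might look roughly like the following sketch. It assumes the footer schemas are already available as sets of column names and that a column is rejected only when no file contains it; the issue does not specify either point. The `PlanException` class, the `verify_columns` function, and the `strict` flag are hypothetical stand-ins, not Drill APIs.

```python
from typing import Iterable, Sequence, Set


class PlanException(Exception):
    """Hypothetical stand-in for a planning-phase error."""


def verify_columns(
    footer_schemas: Sequence[Set[str]],
    requested: Iterable[str],
    strict: bool,
) -> None:
    """Raise PlanException if strict and a requested column is in no file."""
    if not strict:
        # Default behavior: defer to execution, which returns NULLs.
        return
    known: Set[str] = set().union(*footer_schemas)
    missing = sorted(c for c in requested if c not in known)
    if missing:
        raise PlanException("unknown column(s): " + ", ".join(missing))
```

Keeping the check behind a flag preserves the NULL-returning behavior as the default while letting schema-strict users fail fast before execution starts.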

        Activity

        Transition: Open to Resolved
        Time In Source Status: 76d 20h 28m
        Execution Times: 1
        Last Executer: Jacques Nadeau
        Last Execution Date: 29/Aug/14 18:55
        Tony Stevenson made changes -
        Workflow no-reopen-closed, patch-avail, testing [ 12869315 ] Drill workflow [ 12934007 ]
        Jacques Nadeau made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Parth Chandra made changes -
        Assignee Parth Chandra [ parthc ] DrillCommitter [ drillcommitter ]
        Parth Chandra added a comment - Reviewed
        Jason Altekruse made changes -
        Assignee Jason Altekruse [ jaltekruse ] Parth Chandra [ parthc ]
        Jason Altekruse added a comment - https://reviews.apache.org/r/24759/
        Jason Altekruse made changes -
        Attachment 0001-DRILL-982-Return-nulls-for-non-existent-columns-in-p.patch [ 12662201 ]
        Jason Altekruse made changes -
        Attachment 0001-DRILL-982-Return-nulls-for-non-existent-columns-in-p.patch [ 12662201 ]
        Sudheesh Katkam made changes -
        Due Date 15/Aug/14
        Jacques Nadeau made changes -
        Fix Version/s 0.5.0 [ 12324880 ]
        Fix Version/s 0.4.0 [ 12324963 ]
        Jacques Nadeau made changes -
        Fix Version/s 0.4.0 [ 12324963 ]
        Jacques Nadeau made changes -
        Component/s Storage - Parquet [ 12322683 ]
        Jinfeng Ni made changes -
        Field: Summary
          Original Value: Parquet reader should return NULL value in execution phase, in stead of raising ExecutionSetupException
          New Value: Parquet reader should return NULL value for non-exist column in execution phase, in stead of raising ExecutionSetupException
        Field: Description
          Original Value: If query a non-exist column against a parquet file, DRILL currently raise ExecutionSetupException: [...]
          New Value: If query a non-exist column against a parquet file, DRILL currently raises ExecutionSetupException: [...]
          The rest of the description text is identical in both values.


        Jinfeng Ni created issue -

          People

          • Assignee: DrillCommitter
          • Reporter: Jinfeng Ni
          • Votes: 0
          • Watchers: 4
