  1. Apache Drill
  2. DRILL-8182

File scan nodes not differentiated by format config

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.20.0
    • Fix Version/s: 1.20.2
    • Component/s: Storage - Other
    • Labels: None

    Description

      Two file scans that differ only in the format config overridden through table functions may return genuinely different data. Format config options can affect the behaviour of the format parser (date strings, delimiters, etc.) and may direct the format plugin to entirely different data within the file. The query planner should not consider such scans the same. This is illustrated by the following example based on the Excel format plugin.

      When a query includes multiple SELECTs against a workbook, using TABLE functions to access different sheets, and those sheets contain a column with the same name, the values for that column come from a single sheet for both SELECTs. To reproduce, run the following query against the attachment and note that the `Name` values returned from the Products sheet are the `Name` values from the Customers sheet.

       

      with
      prod as (
          select Id, Name from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Products'))
      )
      , cust as (
          select Id, Name from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Customers'))
      )
      select * from cust join prod on cust.Id = prod.Id; 
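
      The planner-level problem behind this query can be sketched in a few lines of Java. This is only an illustrative model, not Drill's actual code: the class `ScanIdentitySketch`, the type `FileScan` and the helper methods are hypothetical names, and the assumption is that the planner's notion of scan identity effectively ignored the format options. When only the file path is compared, the two scans from the query above collapse into one; when the format options take part in `equals` and `hashCode`, they stay distinct.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical model for illustration only; these are not Drill's classes.
public class ScanIdentitySketch {

    /** A simplified file scan: a path plus the format options given in the table function. */
    public static final class FileScan {
        final String path;
        final Map<String, String> formatOptions;

        public FileScan(String path, Map<String, String> formatOptions) {
            this.path = path;
            this.formatOptions = Map.copyOf(formatOptions);
        }

        // Identity includes the format options, so scans of different sheets differ.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FileScan)) return false;
            FileScan other = (FileScan) o;
            return path.equals(other.path) && formatOptions.equals(other.formatOptions);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, formatOptions);
        }
    }

    /** Counts distinct scans when only the path is compared (assumed model of the bug). */
    public static int distinctByPathOnly(List<FileScan> scans) {
        Set<String> seen = new HashSet<>();
        for (FileScan scan : scans) {
            seen.add(scan.path);
        }
        return seen.size();
    }

    /** Counts distinct scans when path and format options are both compared. */
    public static int distinctByPathAndFormat(List<FileScan> scans) {
        return new HashSet<>(scans).size();
    }

    /** The two scans from the reproduction query: same workbook, different sheets. */
    public static List<FileScan> exampleScans() {
        String path = "/Products_Customers_Orders.xlsx";
        return List.of(
            new FileScan(path, Map.of("type", "excel", "sheetName", "Products")),
            new FileScan(path, Map.of("type", "excel", "sheetName", "Customers")));
    }

    public static void main(String[] args) {
        List<FileScan> scans = exampleScans();
        System.out.println("distinct by path only: " + distinctByPathOnly(scans));
        System.out.println("distinct by path and format: " + distinctByPathAndFormat(scans));
    }
}
```

      In this model the path-only comparison reports one distinct scan and the comparison that includes format options reports two, which mirrors the symptom of both SELECTs returning `Name` values from a single sheet.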

      Attachments

        1. Products_Customers_Orders.xlsx
          7 kB
          James Turton

          People

            cgivre Charles Givre
            dzamo James Turton
            Votes: 0
            Watchers: 2
