[DRILL-7083] Wrong data type for explicit partition column beyond file depth - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.15.0
Fix Version/s: 1.17.0
Component/s: None
Labels:
None

Description

Consider the simple case in ~~DRILL-7082~~. That ticket talks about implicit partition columns created by the wildcard. Consider a very similar case:

SELECT a, b, c, dir0, dir1 FROM `myTable`

Where myTable is a directory of CSV files, each with schema (a, b, c):

myTable
|- file1.csv
|- nested
   |- file2.csv

If the query is run in "stock" Drill, the planner will place both files within a single scan operator as described in ~~DRILL-7082~~. The result schema will be:

(a VARCHAR, b VARCHAR, c VARCHAR, dir0 VARCHAR, dir1 INT)

Notice that last column: why is "dir1" a (nullable) INT? The partition mechanism only recognizes partitions that actually exist, leaving the Project operator to fill in (with a Nullable INT) any partitions that don't exist (any directory levels not actually seen by the scan operator.)

Now, using the same trick as in ~~DRILL-7082~~, try the query

SELECT a, b, c, dir0 FROM `myTable`

Again, the trick causes Drill to read each file in a separate scan operator (simulating what happens when queries run at scale.)

The scan operator for file1.csv will see no partitions, so it will omit "dir0" and the Project operator will helpfully fill in a Nullable INT. The scan operator for file2.csv sees one level of partition, so sets dir0 to nested as a Nullable VARCHAR.

What does the client see? Two records: one with "dir0" as a Nullable INT, the other as a Nullable VARCHAR. Client such as JDBC and ODBC see a hard schema change between the two records.

The two cases described above are really two versions of the same issue. Clients expect that, if they use the "dir0", "dir1", ... columns, that the type is always Nullable Varchar so that the schema stays consistent across batches.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Paul Rogers

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 07/Mar/19 03:59

Updated:: 25/Jun/19 13:32

Resolved:: 25/Jun/19 13:32