The final step in the ongoing "result set loader" saga is to merge the revised JSON reader into master. This reader does three key things:
- Demonstrates the prototypical "late schema" style of data reading (discover schema while reading).
- Implements many tricks and hacks to handle schema changes while loading.
- Shows that, even with all these tricks, the only true solution is to actually have a schema.
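To make the "late schema" idea concrete, here is a minimal sketch (not Drill's actual code; all names are invented for illustration) of inferring a column's type while reading, deferred until the first non-null value appears:

```java
import java.util.List;

// Hypothetical sketch of "late schema" discovery: the reader does not know a
// column's type up front; it infers the type from the first non-null value.
public class LateSchemaSketch {

  enum InferredType { UNKNOWN, BIGINT, FLOAT8, VARCHAR }

  // Infer a column type from a stream of parsed JSON values. Nulls carry no
  // type information, so the type stays UNKNOWN until a real value arrives.
  static InferredType inferType(List<?> values) {
    for (Object v : values) {
      if (v == null) continue;            // defer: no type information yet
      if (v instanceof Long)   return InferredType.BIGINT;
      if (v instanceof Double) return InferredType.FLOAT8;
      if (v instanceof String) return InferredType.VARCHAR;
    }
    return InferredType.UNKNOWN;          // all-null column: type never discovered
  }
}
```

The all-null case at the end is exactly the hole that no amount of inference can fill, which is the point of the third bullet above.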
The new JSON reader:
- Uses an expanded state machine when parsing rather than the complex set of if-statements in the current version.
- Handles reading a run of nulls before seeing the first data value (as long as the data value shows up in the first record batch).
- Uses the result-set loader to generate fixed-size batches regardless of the complexity, depth of structure, or width of variable-length fields.
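The null-run case deserves a sketch. The idea is to count leading nulls until the first real value reveals the type, then back-fill those positions as missing values of that type. This is an illustrative stand-alone version, not Drill's implementation (which works through its column writers):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of handling a run of nulls before the first data value:
// buffer the null count, then back-fill once the first value fixes the type.
public class NullRunSketch {

  // Materializes a Double column from untyped JSON values. Fails if the whole
  // batch is null, which is the case only a real schema can solve.
  static List<Double> readDoubleColumn(List<?> jsonValues) {
    List<Double> out = new ArrayList<>();
    int pendingNulls = 0;
    boolean typeKnown = false;
    for (Object v : jsonValues) {
      if (v == null && !typeKnown) { pendingNulls++; continue; }
      if (!typeKnown) {
        typeKnown = true;
        for (int i = 0; i < pendingNulls; i++) out.add(null);  // back-fill
      }
      out.add(v == null ? null : ((Number) v).doubleValue());
    }
    if (!typeKnown) {
      throw new IllegalStateException("all-null column: type never discovered");
    }
    return out;
  }
}
```

Note the limitation stated in the bullet above: this trick only works if the first data value shows up within the first record batch, since earlier batches have already been sent downstream with whatever type was guessed.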
While the JSON reader itself is helpful, the key contribution is that it shows how to use the entire kit of parts: result set loader, projection framework, and so on. Since the projection framework can handle an external schema, it is also a handy foundation for the ongoing schema project.
Key work to complete after this merge will be to reconcile the actual data with the external schema. For example, if we know a column is supposed to be a VarChar, then read the column as a VarChar regardless of the type JSON itself would pick. Or, if a column is supposed to be a Double, then convert Int and String JSON values into Doubles.
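The Double case can be sketched as a simple coercion rule. This is a hedged illustration of the intended behavior; the class and method names are assumptions, not Drill's API:

```java
// Hypothetical sketch of reconciling JSON data with an external schema: when
// the schema declares a column as Double, coerce Int and String JSON values.
public class SchemaCoercionSketch {

  // Coerce a parsed JSON value to the schema-declared Double type.
  static double toDouble(Object jsonValue) {
    if (jsonValue instanceof Number) {        // JSON Int or Float
      return ((Number) jsonValue).doubleValue();
    }
    if (jsonValue instanceof String) {        // JSON String, e.g. "12.5"
      return Double.parseDouble((String) jsonValue);
    }
    throw new IllegalArgumentException("Cannot convert to Double: " + jsonValue);
  }
}
```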
The Row Set framework was designed to allow inserting custom column writers; this merge is a good opportunity to do the work needed to create them. The new JSON framework could then use those writers to parse a JSON field as a specified Drill type.
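The shape of such a custom writer might look like the following. To be clear, this is an invented sketch: the interface and class names are hypothetical stand-ins, and Drill's actual column-writer interfaces differ.

```java
// Hypothetical sketch of a custom column writer: a wrapper around a plain
// value sink that coerces any incoming JSON value to the declared Drill type
// before writing. All names here are invented for illustration.
public class CustomWriterSketch {

  // Minimal stand-in for a scalar column writer's value sink.
  interface ScalarSink { void setDouble(double v); }

  // A custom writer that accepts untyped JSON values and coerces to Double.
  static class CoercingDoubleWriter {
    private final ScalarSink sink;

    CoercingDoubleWriter(ScalarSink sink) { this.sink = sink; }

    void write(Object jsonValue) {
      if (jsonValue instanceof Number) {
        sink.setDouble(((Number) jsonValue).doubleValue());
      } else if (jsonValue instanceof String) {
        sink.setDouble(Double.parseDouble((String) jsonValue));
      } else {
        throw new IllegalArgumentException("Not convertible: " + jsonValue);
      }
    }
  }
}
```

The design point is that the JSON parser stays type-agnostic: the declared type lives entirely in the writer, so swapping the writer changes how the same JSON field materializes.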