    Details

    • Type: Sub-task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: 1.19.0
    • Component/s: None
    • Labels:

      Description

      The final step in the ongoing "result set loader" saga is to merge the revised JSON reader into master. This reader does three key things:

      • Demonstrates the prototypical "late schema" style of data reading (discover schema while reading).
      • Implements many tricks and hacks to handle schema changes while loading.
      • Shows that, even with all these tricks, the only true solution is to actually have a schema.
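
      As a rough illustration of the "late schema" problem described above (not Drill's actual code), the sketch below discovers column types while reading and fails as soon as a later record contradicts what was already discovered. The type labels, `SchemaChangeError`, and `discover_schema` are hypothetical names chosen for this example, and only scalar JSON values are handled.

```python
from typing import Any, Dict, Iterable


class SchemaChangeError(Exception):
    """Raised when a column's type changes after it was first discovered."""


def json_type(value: Any) -> str:
    # Illustrative type labels only; they are not taken from the issue.
    if value is None:
        return "NULL"
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    raise TypeError(f"nested values are outside this sketch: {value!r}")


def discover_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Build a column -> type map from the data itself, record by record."""
    schema: Dict[str, str] = {}
    for row_num, record in enumerate(records):
        for name, value in record.items():
            observed = json_type(value)
            if observed == "NULL":
                continue  # a null carries no type information
            known = schema.setdefault(name, observed)
            if known != observed:
                raise SchemaChangeError(
                    f"row {row_num}: column {name!r} was {known}, now {observed}"
                )
    return schema
```

      With no schema supplied up front, a file whose column `a` holds `1` in one record and `"one"` in the next cannot be read consistently; the reader can only guess or fail, which is the point of the third bullet.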

      The new JSON reader:

      • Uses an expanded state machine when parsing rather than the complex set of if-statements in the current version.
      • Handles reading a run of nulls before seeing the first data value (as long as the data value shows up in the first record batch).
      • Uses the result-set loader to generate fixed-size batches regardless of the complexity, depth of structure, or width of variable-length fields.
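
      The handling of a run of leading nulls can be pictured with a minimal sketch, assuming a column whose type stays unknown until the first non-null value arrives in the batch. `DeferredColumn` and its methods are hypothetical names for this example, and the fallback type used when a batch contains only nulls is an assumption, not behavior stated in this issue.

```python
from typing import Any, List, Optional


class DeferredColumn:
    """Column whose type is unknown until the first non-null value is seen."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.col_type: Optional[type] = None  # unresolved until a real value arrives
        self.values: List[Any] = []

    def write(self, value: Any) -> None:
        if value is not None and self.col_type is None:
            # First real value fixes the type. The earlier nulls are already
            # stored as None, so no back-filling is needed in this sketch.
            self.col_type = type(value)
        self.values.append(value)

    def end_batch(self, default_type: type = str) -> None:
        # Assumption: if the whole batch was null, some type must still be
        # chosen before the batch is sent downstream.
        if self.col_type is None:
            self.col_type = default_type
```

      The `end_batch` fallback is where the "first record batch" limitation shows up: once a type has been guessed for an all-null batch, a different type in a later batch becomes a schema change.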

      While the JSON reader itself is helpful, the key contribution is that it shows how to use the entire kit of parts: result set loader, projection framework, and so on. Since the projection framework can handle an external schema, it is also a handy foundation for the ongoing schema project.

      Key work to complete after this merge will be to reconcile actual data with the external schema. For example, if we know a column is supposed to be a VarChar, then read the column as a VarChar regardless of the type JSON itself picks. Likewise, if a column is supposed to be a Double, then convert Int and String JSON values into Doubles.
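
      A minimal sketch of that reconciliation, assuming the external schema is a simple map from column name to declared type. The converter functions, the `CONVERTERS` table, and `apply_schema` are hypothetical and do not reflect Drill's API; columns absent from the schema pass through unchanged.

```python
from typing import Any, Callable, Dict, Optional


def to_varchar(value: Any) -> Optional[str]:
    """Read any scalar JSON value as text."""
    if value is None:
        return None
    if isinstance(value, bool):
        return "true" if value else "false"  # keep JSON spelling, not Python's
    return str(value)


def to_double(value: Any) -> Optional[float]:
    """Convert Int and String JSON values into a double."""
    if value is None:
        return None
    if isinstance(value, bool):
        raise TypeError("boolean cannot be read as DOUBLE in this sketch")
    return float(value)  # accepts int, float, and numeric strings


# Declared type -> converter; only the two types named in the text are covered.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "VARCHAR": to_varchar,
    "DOUBLE": to_double,
}


def apply_schema(record: Dict[str, Any], schema: Dict[str, str]) -> Dict[str, Any]:
    """Coerce each value to its declared type, ignoring the type JSON picked."""
    out: Dict[str, Any] = {}
    for name, value in record.items():
        declared = schema.get(name)
        out[name] = CONVERTERS[declared](value) if declared else value
    return out
```

      For instance, with a schema declaring `a` as VARCHAR and `b` as DOUBLE, the record `{"a": 10, "b": "1.5"}` would be read as the string `"10"` and the double `1.5`.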

      The Row Set framework was designed to allow inserting custom column writers. This would be a great opportunity to do the work needed to create such writers, and then to use them with the new JSON framework so that a JSON field can be parsed as a specified Drill type.

              People

              • Assignee: Paul Rogers
              • Reporter: Paul Rogers
              • Reviewer: Vova Vysotskyi
