• Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.15.0
    • Fix Version/s: 1.19.0
    • Labels: None


      The final step in the ongoing "result set loader" saga is to merge the revised JSON reader into master. This reader does three key things:

      • Demonstrates the prototypical "late schema" style of data reading (discover schema while reading).
      • Implements many tricks and hacks to handle schema changes while loading.
      • Shows that, even with all these tricks, the only true solution is to actually have a schema.
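
      The "late schema" problem can be illustrated with a toy type inferrer. This is not Drill code: the `InferredType` enum and the `LateSchemaInferer` class are hypothetical, and each record is assumed to arrive already parsed as a `Map<String, Object>`. The sketch infers each column's type from the first non-null value it sees and flags a schema change when a later value disagrees, which is the situation that only an up-front schema truly resolves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified type set; not Drill's actual type enum.
enum InferredType { BIGINT, DOUBLE, VARCHAR, BOOLEAN }

// Hypothetical sketch of "discover the schema while reading".
class LateSchemaInferer {
  // Column name -> inferred type, in the order columns were first seen.
  private final Map<String, InferredType> schema = new LinkedHashMap<>();
  private int schemaChanges = 0;

  static InferredType typeOf(Object value) {
    if (value instanceof Long || value instanceof Integer) return InferredType.BIGINT;
    if (value instanceof Double) return InferredType.DOUBLE;
    if (value instanceof Boolean) return InferredType.BOOLEAN;
    return InferredType.VARCHAR;
  }

  // Observes one record; returns true if a type conflict forced a schema change.
  boolean observe(Map<String, Object> record) {
    boolean changed = false;
    for (Map.Entry<String, Object> e : record.entrySet()) {
      if (e.getValue() == null) continue;        // a null says nothing about the type
      InferredType seen = typeOf(e.getValue());
      InferredType prior = schema.putIfAbsent(e.getKey(), seen);
      if (prior != null && prior != seen) {
        schema.put(e.getKey(), seen);            // the column's type changed mid-stream
        changed = true;
      }
    }
    if (changed) schemaChanges++;
    return changed;
  }

  Map<String, InferredType> schema() { return schema; }
  int schemaChanges() { return schemaChanges; }
}
```

      For example, a column that holds `10` in one record and `"ten"` in a later one is first inferred as BIGINT and then forced to VARCHAR, so data already written under the first type no longer matches the current schema.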

      The new JSON reader:

      • Uses an expanded state machine when parsing rather than the complex set of if-statements in the current version.
      • Handles reading a run of nulls before seeing the first data value (as long as the data value shows up in the first record batch).
      • Uses the result-set loader to generate fixed-size batches regardless of the complexity, depth of structure, or width of variable-length fields.
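
      Handling a run of leading nulls amounts to deferring the type decision. A minimal sketch, assuming a hypothetical `DeferredColumn` class that buffers values in a plain list rather than in Drill value vectors: the column only counts nulls until the first non-null value fixes its type, then back-fills the counted nulls. If a batch ends while the type is still unknown, the reader has to guess; the fallback type passed to `endBatch` is an assumption of this sketch, not the type Drill actually chooses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a column whose type is not yet known.
class DeferredColumn {
  private final String name;
  private int pendingNulls = 0;      // nulls seen before the first typed value
  private Class<?> type = null;      // stays null until the first non-null value
  private final List<Object> values = new ArrayList<>();

  DeferredColumn(String name) { this.name = name; }

  void write(Object value) {
    if (type == null) {
      if (value == null) { pendingNulls++; return; }           // still unknown: just count
      type = value.getClass();                                  // first real value fixes the type
      for (int i = 0; i < pendingNulls; i++) values.add(null);  // back-fill the null run
      pendingNulls = 0;
    } else if (value != null && !type.isInstance(value)) {
      throw new IllegalStateException("Schema change in column " + name);
    }
    values.add(value);
  }

  // Called when the batch is full. If the type is still unknown, fall back to a guess.
  void endBatch(Class<?> fallbackType) {
    if (type == null) {
      type = fallbackType;
      for (int i = 0; i < pendingNulls; i++) values.add(null);
      pendingNulls = 0;
    }
  }

  Class<?> type() { return type; }
  List<Object> values() { return values; }
}
```

      Presumably this is why the guarantee is limited to the first record batch: once a batch has gone downstream with a guessed type, a later value of a different type is a schema change.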

      While the JSON reader itself is helpful, the key contribution is that it shows how to use the entire kit of parts: result set loader, projection framework, and so on. Since the projection framework can handle an external schema, it is also a handy foundation for the ongoing schema project.

      Key work remaining after this merge is to reconcile actual data with the external schema. For example, if we know a column is supposed to be a VarChar, the reader should read it as a VarChar regardless of the type JSON itself picks. Or, if a column is supposed to be a Double, the reader should convert Int and String JSON values into Doubles.
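
      The conversions described above can be sketched as a single coercion function. `TargetType` and `SchemaReconciler` are hypothetical names, and the JSON value is assumed to arrive already parsed as a Java `Long`, `Double`, `Boolean`, or `String`.

```java
// Hypothetical subset of the Drill types named in the text.
enum TargetType { VARCHAR, DOUBLE }

// Hypothetical sketch: coerce a parsed JSON value to the type an external schema declares.
final class SchemaReconciler {
  private SchemaReconciler() {}

  static Object reconcile(Object jsonValue, TargetType target) {
    if (jsonValue == null) return null;               // nulls pass through unchanged
    switch (target) {
      case VARCHAR:
        // Read as VarChar regardless of the type JSON picked.
        return jsonValue.toString();
      case DOUBLE:
        // Int (and Double) values widen; String values are parsed.
        if (jsonValue instanceof Number) return ((Number) jsonValue).doubleValue();
        if (jsonValue instanceof String) return Double.parseDouble(((String) jsonValue).trim());
        throw new IllegalArgumentException(
            "Cannot convert " + jsonValue.getClass().getSimpleName() + " to DOUBLE");
      default:
        throw new AssertionError("Unhandled target type: " + target);
    }
  }
}
```

      One detail this glosses over: turning a parsed number back into text loses its original spelling (`1.50` becomes `"1.5"`), so a reader that wants VarChar output would probably need to capture the raw token text instead.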

      The Row Set framework was designed to allow inserting custom column writers. This would be a great opportunity to do the work needed to create them, and then to use the new JSON reader to parse a JSON field as a specified Drill type.
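
      One way to picture a custom column writer is as a converting shim placed in front of the column's base writer. In this sketch `ColumnWriter`, `DoubleColumnWriter`, and `ToDoubleConverter` are hypothetical and far simpler than the Row Set framework's actual writer classes: the parser calls whichever setter matches the JSON token it saw, and the shim turns each call into the setter the declared Drill type needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal writer interface; not the Row Set framework's actual API.
interface ColumnWriter {
  void setLong(long value);
  void setDouble(double value);
  void setString(String value);
}

// Base writer for a Double column: accepts only doubles, buffering them in a list.
class DoubleColumnWriter implements ColumnWriter {
  final List<Double> values = new ArrayList<>();
  public void setDouble(double value) { values.add(value); }
  public void setLong(long value) { throw new UnsupportedOperationException("not a BIGINT column"); }
  public void setString(String value) { throw new UnsupportedOperationException("not a VARCHAR column"); }
}

// Custom writer inserted in front of the base writer: converts, then delegates.
class ToDoubleConverter implements ColumnWriter {
  private final ColumnWriter base;
  ToDoubleConverter(ColumnWriter base) { this.base = base; }
  public void setDouble(double value) { base.setDouble(value); }
  public void setLong(long value) { base.setDouble((double) value); }
  public void setString(String value) { base.setDouble(Double.parseDouble(value.trim())); }
}
```

      Because the parser only talks to the `ColumnWriter` interface, swapping in a converter changes how a column stores its values without touching the parsing logic.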


    • Assignee: Paul Rogers
    • Reporter: Paul Rogers
    • Reviewer: Vova Vysotskyi