Details
- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
One of Drill's key benefits is the ability to query JSON-formatted data. Much great work has been done, but unless you happen to be a Drill developer, the details of exactly how Drill handles various JSON formats can be hard to find.
We should document how Drill handles various JSON scenarios, covering these query forms:
- SELECT * (schema inferred)
- SELECT a, b, c (schema implied by query)
And various JSON structures:
- Top-level structure (list of maps. Can we handle an array of maps? A list of scalars?)
- Changes of the top-level map structure across rows.
- New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b: "s", c: 10}.)
- Fields disappear later in the file
- Fields change type
- Start of file has many nulls for a field, later in file has non-null values.
- How Drill handles array fields
- Array field is null: { a: [10, 20] }, { a: null }
- Array contains nulls: { a: [10, null, 20] }
- Array contains single scalar type (number or string)
- Array contains multiple scalar types (number and string)
- Array contains structured types (array, map)
- How Drill handles nested maps
- Explicit select: a, b.c, b.d: {a: 1, b: {c: ..., d: ...}}
- Implicit select: *
- How data is delivered to Drill client
- How data is delivered to JDBC/ODBC clients
- Size issues
- Very large records (what is max size?)
- Very large strings
- Very large arrays
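To check such documentation against real behavior, the schema-change and array scenarios listed above could be turned into small sample files. The following is a minimal Python sketch, not part of Drill: the function names, the scenario names, and the one-object-per-line file layout are all assumptions made for illustration, and it says nothing about how Drill actually treats each file.

```python
import json
import tempfile
from pathlib import Path


def build_scenarios(leading_nulls=5):
    """Return a mapping of (hypothetical) scenario name -> list of records."""
    return {
        # Top-level map structure changes across rows
        "field_appears": [{"a": 1, "b": "s"}, {"a": 1, "b": "s", "c": 10}],
        "field_disappears": [{"a": 1, "b": "s", "c": 10}, {"a": 1, "b": "s"}],
        "field_changes_type": [{"a": 1}, {"a": "one"}],
        "leading_nulls": [{"a": None}] * leading_nulls + [{"a": 10}],
        # Array fields
        "array_is_null": [{"a": [10, 20]}, {"a": None}],
        "array_contains_nulls": [{"a": [10, None, 20]}],
        "array_mixed_scalars": [{"a": [10, "ten"]}],
        "array_of_structures": [{"a": [[1, 2], {"x": 1}]}],
    }


def write_scenarios(out_dir):
    """Write each scenario to <out_dir>/<name>.json, one JSON object per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, records in build_scenarios().items():
        path = out_dir / f"{name}.json"
        with path.open("w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    for p in write_scenarios(tempfile.mkdtemp(prefix="drill_json_")):
        print(p)
```

Each generated file could then be queried with both SELECT * and an explicit column list, and the observed results recorded in the documentation.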
Naming
- Support for case-sensitive names: { a: 1, A: "foo" }
The above is legal JSON, but it causes problems with Drill's case-insensitive naming rules.
The documentation should also cover any other detailed information not in the above list.
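The case-sensitivity concern under Naming can be illustrated with a short Python sketch that flags keys in a record differing only by case. The helper name is hypothetical, and folding with lower() is an assumption; Drill's actual name-matching rules may differ.

```python
import json
from collections import defaultdict


def find_case_collisions(record):
    """Return groups of keys in `record` that are equal when lower-cased."""
    groups = defaultdict(list)
    for key in record:
        groups[key.lower()].append(key)
    return [keys for keys in groups.values() if len(keys) > 1]


record = json.loads('{"a": 1, "A": "foo", "b": 2}')
print(find_case_collisions(record))  # [['a', 'A']]
```

A check like this could be run over sample data to find the records that the documentation needs to explain.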
Issue Links
- is related to DRILL-4824: Null maps / lists and non-provided state support for JSON fields. Numeric types promotion. (Open)