Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
The purpose of this Jira is to introduce a new data type in Phoenix, Binary JSON (BSON), to manage more complex document data structures.
BSON, or Binary JSON, is a binary-encoded serialization of JSON-like documents. The BSON data type lets users store, update, and query part or all of a BsonDocument efficiently, without having to serialize/deserialize the whole document to/from binary format. BSON allows deserializing only part of a nested document, so querying or indexing attributes within the nested structure is more efficient: deserialization happens at runtime and only for the fields that are accessed. Other document formats would require deserializing the entire binary into a document before the query can be performed.
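The partial deserialization described above can be illustrated with a minimal sketch. It assumes the layout given in the BSON specification (a length-prefixed document containing type-tagged, named elements) and handles only four element types: double, string, embedded document, and int32. The helper names `encode_document` and `find_field` are hypothetical and are not part of Phoenix or of any BSON library.

```python
import struct


def encode_document(doc: dict) -> bytes:
    """Encode a dict of int32 / float / str / nested dict values as BSON bytes."""
    body = b""
    for name, value in doc.items():
        key = name.encode("utf-8") + b"\x00"  # field name as a C string
        if isinstance(value, dict):
            body += b"\x03" + key + encode_document(value)
        elif isinstance(value, float):
            body += b"\x01" + key + struct.pack("<d", value)
        elif isinstance(value, int) and not isinstance(value, bool):
            body += b"\x10" + key + struct.pack("<i", value)
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + key + struct.pack("<i", len(data)) + data
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # The total length counts the 4-byte prefix and the trailing 0x00.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def find_field(buf: bytes, path: list, start: int = 0):
    """Return the value at `path`, decoding only the fields on that path."""
    end = start + struct.unpack_from("<i", buf, start)[0] - 1  # trailing 0x00
    pos = start + 4
    while pos < end:
        type_tag = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)
        name = buf[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1  # first byte of the value
        if type_tag == 0x01:    # double
            size = 8
        elif type_tag == 0x10:  # int32
            size = 4
        elif type_tag == 0x02:  # string: int32 length, bytes, 0x00
            size = 4 + struct.unpack_from("<i", buf, pos)[0]
        elif type_tag == 0x03:  # embedded document carries its own length
            size = struct.unpack_from("<i", buf, pos)[0]
        else:
            raise ValueError(f"unsupported BSON type tag: {type_tag:#x}")
        if name == path[0]:
            if len(path) > 1:
                if type_tag != 0x03:
                    return None  # cannot descend into a scalar
                return find_field(buf, path[1:], pos)
            if type_tag == 0x01:
                return struct.unpack_from("<d", buf, pos)[0]
            if type_tag == 0x10:
                return struct.unpack_from("<i", buf, pos)[0]
            if type_tag == 0x02:
                return buf[pos + 4:pos + size - 1].decode("utf-8")
            return buf[pos:pos + size]  # sub-document returned as raw bytes
        pos += size  # skip the value without decoding it
    return None


data = encode_document({
    "id": 7,
    "profile": {"name": "Ada", "address": {"city": "Paris", "zip": 75001}},
    "score": 9.5,
})
city = find_field(data, ["profile", "address", "city"])
```

Because every string and embedded document carries its own length, `find_field` can jump over fields that are not on the requested path instead of materializing them, which is the property the description relies on.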
BSONSpec: https://bsonspec.org/
JSON and BSON are closely related by design. BSON serves as a binary representation of JSON data, tailored with specialized extensions for wider application scenarios, and finely tuned for efficient data storage and traversal. Similar to JSON, BSON facilitates the embedding of objects and arrays.
One way in which BSON differs from JSON is its support for more advanced data types. For instance, JSON does not differentiate between integers (whole numbers) and floating-point numbers (with decimal precision). BSON distinguishes between the two and stores each in the corresponding BSON data type (e.g. BsonInt32 vs BsonDouble). Many server-side programming languages offer distinct numeric data types (commonly integer, single-precision floating point i.e. “float”, and double-precision floating point i.e. “double”) as well as boolean values, each with its own optimal usage for efficient operations.
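As a small illustration of the integer versus floating-point distinction, the sketch below encodes the same field name once as a BSON int32 element and once as a BSON double element, following the type tags in the BSON specification (0x10 for int32, 0x01 for double). The function names are hypothetical helpers written for this example only.

```python
import struct


def encode_int32(name: str, value: int) -> bytes:
    # Type tag 0x10, C-string field name, 4-byte little-endian signed integer.
    return b"\x10" + name.encode("utf-8") + b"\x00" + struct.pack("<i", value)


def encode_double(name: str, value: float) -> bytes:
    # Type tag 0x01, C-string field name, 8-byte little-endian IEEE 754 double.
    return b"\x01" + name.encode("utf-8") + b"\x00" + struct.pack("<d", value)


as_int = encode_int32("n", 1)
as_double = encode_double("n", 1.0)
```

The two elements differ in both their leading type tag and their payload width (4 bytes versus 8), so a reader can recover the original numeric type without guessing.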
Another key distinction between BSON and JSON is that BSON documents can include Date or Binary objects, which cannot be directly represented in pure JSON. BSON also provides the ability to store and retrieve user-defined Binary objects. Likewise, by integrating advanced data structures like Sets into BSON documents, we can significantly enhance the capabilities of Phoenix for storing, retrieving, and updating Binary, Sets, Lists, and Documents as nested or complex data types.
Moreover, the JSON format is both human-readable and machine-readable, whereas the BSON format is only machine-readable. Hence, as part of introducing the BSON data type, we also need to provide an interface through which users can supply human-readable JSON as input for the BSON data type.
This Jira also introduces access and update functions for BSON documents.
BSON_CONDITION_EXPRESSION evaluates a condition expression on document fields, similar to how a WHERE clause evaluates a condition expression on the columns of the given row(s) in relational tables.
BSON_UPDATE_EXPRESSION performs one or more document field updates, similar to how an UPSERT statement updates one or more columns of the given row(s) in relational tables.
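The analogy between these two functions and WHERE/UPSERT can be sketched conceptually as a condition check followed by field updates on a nested document. The sketch below is not the Phoenix expression syntax and does not call any Phoenix API; `get_path`, `set_path`, `conditional_update`, the dotted-path notation, and the sample document are all assumptions made for illustration.

```python
import copy


def get_path(doc: dict, path: str):
    """Follow a dotted path such as 'shipping.city'; return None if absent."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(doc: dict, path: str, value) -> None:
    """Set a nested field, creating intermediate documents as needed."""
    keys = path.split(".")
    node = doc
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def conditional_update(doc: dict, condition, updates: dict):
    """Apply `updates` only if `condition(doc)` holds; return (applied, doc)."""
    if not condition(doc):
        return False, doc
    new_doc = copy.deepcopy(doc)  # leave the caller's document untouched
    for path, value in updates.items():
        set_path(new_doc, path, value)
    return True, new_doc


order = {"status": "PENDING", "shipping": {"city": "Oslo", "attempts": 1}}


def is_retryable(d: dict) -> bool:
    # Plays the role of the condition expression on document fields.
    return get_path(d, "status") == "PENDING" and get_path(d, "shipping.attempts") < 3


applied, updated = conditional_update(
    order, is_retryable, {"status": "SHIPPED", "shipping.attempts": 2}
)
```

Here the condition plays the role of a WHERE clause over fields inside the document, and the update map plays the role of the column assignments in an UPSERT, with several fields changed in one step.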
Phoenix can introduce more complex data structures like sets of scalar types, in addition to the nested documents and nested arrays provided by BSON.
Overall, by combining BSON with functionality already available in Phoenix, such as secondary indexes, conditional updates, and high-throughput reads and writes, we can evolve Phoenix into a highly scalable document database.
Attachments
Issue Links
- relates to
  - PHOENIX-7396 BSON_VALUE function to retrieve BSON field value with given data type (Resolved)
  - PHOENIX-7463 New ANTLR grammar to evaluate BSON's SQL style expressions (Open)
- links to