  1. Apache Arrow
  2. ARROW-8952

[C++] Support for textual, JSON schema representation

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++

    Description

      Currently, Arrow has no textual representation for its schema that could serve the same purposes as JSON-Schema for JSON, the .proto files for Protobuf, etc. This issue is about adding such a text representation for an Arrow schema, to fill the same use cases that these textual representations fill for other data serialization formats.

      The requirements for a text schema representation:

      • Data, not code (can be used without being run directly, unlike e.g. calls to the Python API to create a Schema object)
      • Readable by people who are experts in their own field (e.g. data scientists) but not Arrow experts, without needing the documentation side by side
      • Small modifications possible with little or no reference to the documentation (e.g. changing a field from int32 to int64)
      • Writing new schemas from scratch possible for non-Arrow experts with the help of the documentation
      • Not tied to a particular version of Arrow & compatible between Arrow versions

      And from a software engineering point of view, it would be very desirable for the implementation to not add another library dependency for Arrow (which already has many).

      After discussion on the mailing list, the JSON representation of Flatbuffers data seemed the best candidate. It is a format supported by the Flatbuffers project for serializing Flatbuffers assets in human-readable form, suitable for keeping under source control. Arrow already has functionality to convert Schema objects to a Flatbuffers representation. This approach would meet all the requirements above while requiring only a small amount of new Arrow code.

      This issue will add functions to Arrow to load and save a textual, JSON representation of an Arrow schema, by first converting it to a Flatbuffers object and then using the Flatbuffers functionality to save and load such objects as JSON.
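
      To make the target format concrete, here is a rough, dependency-free sketch of the kind of JSON text such a function might emit. It does not call Arrow or Flatbuffers: `IntField`, `FieldToJson` and `SchemaToJson` are hypothetical names invented for this illustration. The key names (`fields`, `name`, `nullable`, `type_type`, `type`, `bitWidth`, `is_signed`) are assumed from Arrow's Schema.fbs and from the Flatbuffers convention of writing a union as a `<field>_type` key plus a value; the actual generator output may differ in layout and in which default values it prints.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <vector>

      // Hypothetical stand-in for an Arrow integer field; NOT an Arrow type.
      struct IntField {
        std::string name;  // assumed to need no JSON escaping in this sketch
        int bit_width;     // 8, 16, 32 or 64
        bool is_signed;
        bool nullable;
      };

      // JSON spelling of a boolean.
      const char* BoolText(bool b) { return b ? "true" : "false"; }

      // Render one field, using key names assumed from Schema.fbs.
      std::string FieldToJson(const IntField& f) {
        assert(f.bit_width == 8 || f.bit_width == 16 ||
               f.bit_width == 32 || f.bit_width == 64);
        std::string out = "{\"name\": \"" + f.name + "\"";
        out += ", \"nullable\": ";
        out += BoolText(f.nullable);
        out += ", \"type_type\": \"Int\"";  // union discriminator
        out += ", \"type\": {\"bitWidth\": " + std::to_string(f.bit_width);
        out += ", \"is_signed\": ";
        out += BoolText(f.is_signed);
        out += "}}";
        return out;
      }

      // Render a whole schema as a single-line JSON object.
      std::string SchemaToJson(const std::vector<IntField>& fields) {
        std::string out = "{\"fields\": [";
        for (std::size_t i = 0; i < fields.size(); ++i) {
          if (i > 0) out += ", ";
          out += FieldToJson(fields[i]);
        }
        out += "]}";
        return out;
      }
      ```

      Under these assumptions, widening a field from int32 to int64 is a one-token edit in the text (`"bitWidth": 32` becomes `"bitWidth": 64`), which is the kind of small modification the requirements ask for.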

            People

              Assignee: Unassigned
              Reporter: Christian Hudon (chrish42)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 50m