Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The existing document (https://github.com/apache/arrow/blob/master/format/Layout.md) only describes the physical layout of fixed-size, variable-size, and nested types (struct, union).
Meanwhile, we have begun drafting Flatbuffers IDL for Arrow metadata:
https://github.com/apache/arrow/blob/master/format/Message.fbs
I will add a document that will, to begin with:
- Explain how the logical types in the metadata map to the physical layouts, for example by defining the important data types: integers, floating point, boolean, string (UTF-8), and binary
- Where relevant, describe how each logical type's physical memory is converted to metadata for messaging purposes (e.g. the RecordBatch concept in the IDL)
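As a rough illustration of the kind of logical-to-physical mapping such a document would cover, here is a minimal Python sketch of how a nullable string (UTF-8) column could decompose into a validity bitmap, a 32-bit offsets buffer, and a data buffer. The function name `encode_utf8_column` is hypothetical, buffer padding and alignment are omitted, and this is not taken from the C++ prototype or the IDL.

```python
import struct
from typing import List, Optional, Tuple


def encode_utf8_column(
    values: List[Optional[str]],
) -> Tuple[bytes, bytes, bytes, int]:
    """Split a nullable string column into (validity, offsets, data, null_count).

    Hypothetical helper for illustration only; padding/alignment is ignored.
    """
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per slot, LSB first
    offsets = [0]                               # len(values) + 1 entries
    data = bytearray()
    null_count = 0
    for i, value in enumerate(values):
        if value is None:
            null_count += 1                     # bit stays 0, offset repeats
        else:
            bitmap[i // 8] |= 1 << (i % 8)      # mark slot i as valid
            data += value.encode("utf-8")
        offsets.append(len(data))
    # Offsets are packed as little-endian signed 32-bit integers.
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return bytes(bitmap), offsets_buf, bytes(data), null_count


validity, offsets, data, null_count = encode_utf8_column(["joe", None, "mark"])
```

In this sketch, the values that messaging metadata would need to carry are the column length, the null count, and the size of each of the three buffers. The bytes themselves live outside the metadata.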
We have already begun prototype implementations in the C++ codebase (https://github.com/apache/arrow/tree/master/cpp/src/arrow/ipc) so this will serve as implementation-agnostic documentation.
Subsequently, I will make a follow-up patch for discussion that will hopefully address the metadata gaps between the canonical Arrow metadata and the similar metadata used by the bespoke Feather format (https://github.com/wesm/feather/blob/master/cpp/src/feather/metadata.fbs).