Apache Arrow / ARROW-262

[Format] Add a new format document for metadata and logical types for messaging and IPC / on-wire/file representations

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: Format
    • Labels:
      None

      Description

      The existing document

      https://github.com/apache/arrow/blob/master/format/Layout.md

      describes only the physical layout of fixed-size, variable-size, and nested types (struct, union).

      Meanwhile, we have begun drafting Flatbuffers IDL for Arrow metadata:

      https://github.com/apache/arrow/blob/master/format/Message.fbs

      I will add a document that will, to begin with:

      • Explain the logical types in the metadata, for example the definitions of important data types: integers, floating point, boolean, string (UTF-8), and binary
      • Where relevant, describe how each logical type's physical memory is converted to metadata for messaging purposes (e.g. the RecordBatch concept in the IDL)
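
      To illustrate the kind of mapping the document would cover, here is a minimal Python sketch. It is not taken from Message.fbs or the C++ prototype: the names `Layout`, `LogicalType`, `LOGICAL_TYPES`, and `values_buffer_bytes` are hypothetical, and the sketch assumes bit-packed booleans and ignores buffer padding and alignment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layout(Enum):
    """Physical layout families (hypothetical names for this sketch)."""
    FIXED_SIZE = "fixed-size"
    VARIABLE_SIZE = "variable-size"


@dataclass(frozen=True)
class LogicalType:
    """A logical type paired with the physical layout that stores it."""
    name: str
    layout: Layout
    bit_width: Optional[int] = None  # None when the value size varies


# Illustrative registry of the data types named in the description.
LOGICAL_TYPES = {
    "int32": LogicalType("int32", Layout.FIXED_SIZE, 32),
    "float64": LogicalType("float64", Layout.FIXED_SIZE, 64),
    "bool": LogicalType("bool", Layout.FIXED_SIZE, 1),  # assumed bit-packed
    "utf8": LogicalType("utf8", Layout.VARIABLE_SIZE),
    "binary": LogicalType("binary", Layout.VARIABLE_SIZE),
}


def values_buffer_bytes(t: LogicalType, length: int) -> int:
    """Unpadded size in bytes of the values buffer for `length` elements.

    Only defined for fixed-size types; a variable-size type needs the
    actual data (its offsets) to determine the size.
    """
    if t.layout is not Layout.FIXED_SIZE or t.bit_width is None:
        raise ValueError(f"{t.name} has no fixed per-value size")
    return (length * t.bit_width + 7) // 8  # round up to whole bytes
```

      For example, `values_buffer_bytes(LOGICAL_TYPES["int32"], 10)` gives the unpadded size of ten 32-bit integers, while calling it with the `utf8` entry raises `ValueError`, since the size of variable-size data cannot be derived from the type alone.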

      We have already begun prototype implementations in the C++ codebase (https://github.com/apache/arrow/tree/master/cpp/src/arrow/ipc) so this will serve as implementation-agnostic documentation.

      Subsequently, I will make a follow-up patch for discussion, to hopefully address the metadata shortfall between the canonical Arrow metadata and the similar metadata used by the bespoke Feather format (https://github.com/wesm/feather/blob/master/cpp/src/feather/metadata.fbs).

            People

            • Assignee: Wes McKinney (wesmckinn)
            • Reporter: Wes McKinney (wesmckinn)
