Apache Arrow / ARROW-262

[Format] Add a new format document for metadata and logical types for messaging and IPC / on-wire/file representations

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: Format
    • Labels: None

    Description

      The existing document, https://github.com/apache/arrow/blob/master/format/Layout.md, describes only the physical layout of fixed-size, variable-size, and other nested types (struct, union).

      Meanwhile, we have begun drafting Flatbuffers IDL for Arrow metadata:

      https://github.com/apache/arrow/blob/master/format/Message.fbs

      I will add a document that will, to begin with:

      • Explain how logical types are mapped in the metadata, for example the definitions of important data types: integers, floating point, boolean, string (UTF-8), and binary
      • Where relevant, describe how each logical type's physical memory is converted to metadata for messaging purposes (e.g. the RecordBatch concept in the IDL)
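
      As a rough illustration of the two points above, the following Python sketch pairs a few logical types with a physical layout and then describes the buffers of one UTF-8 column by (offset, length), loosely in the spirit of the RecordBatch concept. It is only a sketch under stated assumptions: `LogicalType`, `LOGICAL_TYPES`, `encode_utf8_column`, and `describe_buffers` are hypothetical names, not part of Message.fbs or the C++ prototype, and the buffer padding and alignment a real implementation would apply are ignored.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalType:
    """Hypothetical descriptor tying a logical type to a physical layout."""
    name: str
    layout: str     # "fixed" (fixed-size values) or "variable" (offsets + data)
    bit_width: int  # width of one value (fixed) or of one offset (variable)


# Hypothetical registry; the widths follow the usual conventions
# (1-bit booleans, 32-bit offsets for variable-size data).
LOGICAL_TYPES = {
    "int32": LogicalType("int32", "fixed", 32),
    "float64": LogicalType("float64", "fixed", 64),
    "bool": LogicalType("bool", "fixed", 1),
    "utf8": LogicalType("utf8", "variable", 32),
    "binary": LogicalType("binary", "variable", 32),
}


def encode_utf8_column(values):
    """Split a list of str-or-None into (validity, offsets, data) buffers."""
    validity = bytearray((len(values) + 7) // 8)  # one bit per slot
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)  # least-significant bit first
            data += value.encode("utf-8")
        offsets.append(len(data))  # a null slot repeats the previous offset
    offsets_buf = struct.pack("<%di" % len(offsets), *offsets)
    return bytes(validity), offsets_buf, bytes(data)


def describe_buffers(buffers):
    """Return (offset, length) pairs as if the buffers were laid end to end."""
    described = []
    position = 0
    for buf in buffers:
        described.append((position, len(buf)))
        position += len(buf)
    return described


if __name__ == "__main__":
    bufs = encode_utf8_column(["a", None, "bc"])
    print(LOGICAL_TYPES["utf8"])
    print(describe_buffers(bufs))
```

      The point of the sketch is the separation of concerns: the logical type says how to interpret the bytes, while the per-buffer (offset, length) entries say where the bytes live, so a reader can locate the buffers without copying them.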

      We have already begun prototype implementations in the C++ codebase (https://github.com/apache/arrow/tree/master/cpp/src/arrow/ipc) so this will serve as implementation-agnostic documentation.

      Subsequently, I will make a follow-up patch for discussion that will hopefully address the metadata shortfall between the canonical Arrow metadata and the similar metadata used by the bespoke Feather format (https://github.com/wesm/feather/blob/master/cpp/src/feather/metadata.fbs).

          People

            wesm Wes McKinney