  1. Apache Arrow
  2. ARROW-9774

Document metadata

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Documentation
    • Labels:
      None
    • Environment:
      Linux

      Description

      I would like to write a dataframe to a Parquet file.

      The problem is that the output dataframe shows up as

      ```
      0 {'field0': 5, 'field1': 8}
      1 {'field0': 5, 'field1': 8}
      2 {'field0': 4, 'field1': 7}
      ```

      while what I want is

      ```
      0 {'A': 5, 'B': 8}
      1 {'A': 5, 'B': 8}
      2 {'A': 4, 'B': 7}
      ```

      As I understand it, the discrepancy arises because I did not pass the metadata when creating the table. That is, I did

      schema_metadata = ::arrow::key_value_metadata("pandas", metadata.data());

      schema = std::make_shared<arrow::Schema>(schema_vector, schema_metadata);

      arrow_table = arrow::Table::Make(schema, columns, row_group_size);

      status = parquet::arrow::WriteTable( *arrow_table, pool, out_stream, row_group_size, writer_properties, ...)

      I could not find any documentation on how the metadata should be built. Adding such documentation would be very helpful.
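
For reference, here is a minimal sketch of how the string stored under the "pandas" key might be assembled. It assumes the JSON layout described in the pandas developer documentation (top-level keys `index_columns`, `column_indexes`, `columns`, `pandas_version`; per-column keys `name`, `field_name`, `pandas_type`, `numpy_type`, `metadata`). The `PandasColumn` struct, the `Quote` and `BuildPandasMetadata` helpers, and the `"1.0.0"` version string are hypothetical names and placeholders introduced for illustration, not part of Arrow. Whether a given reader accepts this exact output has not been verified here, and it is not confirmed that this metadata is what controls the `field0`/`field1` names shown above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: one entry of the "columns" list.
struct PandasColumn {
  std::string name;         // column name as pandas should display it
  std::string pandas_type;  // e.g. "int64" or "object"
  std::string numpy_type;   // e.g. "int64" or "object"
};

// Minimal JSON string quoting. Simplification: only backslash and
// double quote are escaped; control characters are not handled.
std::string Quote(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  out += '"';
  return out;
}

// Builds the JSON text for the "pandas" metadata value.
// Assumptions: no index columns, no column-index description, and
// field_name equal to name for every column.
std::string BuildPandasMetadata(const std::vector<PandasColumn>& columns) {
  std::string json =
      "{\"index_columns\": [], \"column_indexes\": [], \"columns\": [";
  for (std::size_t i = 0; i < columns.size(); ++i) {
    if (i > 0) json += ", ";
    json += "{\"name\": " + Quote(columns[i].name) +
            ", \"field_name\": " + Quote(columns[i].name) +
            ", \"pandas_type\": " + Quote(columns[i].pandas_type) +
            ", \"numpy_type\": " + Quote(columns[i].numpy_type) +
            ", \"metadata\": null}";
  }
  // Placeholder version string; use the pandas version you target.
  json += "], \"pandas_version\": \"1.0.0\"}";
  return json;
}
```

The resulting string could then be attached to the schema through the overload of `arrow::key_value_metadata` that takes a vector of keys and a vector of values, with `"pandas"` as the single key and the JSON text as the single value.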

            People

            • Assignee:
              Unassigned
              Reporter:
              mathieuDS Mathieu Dutour Sikiric
            • Votes:
              0
              Watchers:
              2

                Time Tracking

                Estimated:
                24h
                Remaining:
                24h
                Logged:
                Not Specified