  1. Apache Arrow
  2. ARROW-9774

Document metadata

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Documentation
    • Labels: None
    • Environment: Linux

    Description

      I would like to write a dataframe to a Parquet file.

      The problem is that, when read back, the output dataframe shows up as

      ```
      0 {'field0': 5, 'field1': 8}
      1 {'field0': 5, 'field1': 8}
      2 {'field0': 4, 'field1': 7}
      ```

      while what I want is

      ```
      0 {'A': 5, 'B': 8}
      1 {'A': 5, 'B': 8}
      2 {'A': 4, 'B': 7}
      ```

      As I understand it, the discrepancy arises because I did not pass the metadata when creating the table. That is, I did:

      schema_metadata = ::arrow::key_value_metadata("pandas", metadata.data());

      schema = std::make_shared<arrow::Schema>(schema_vector, schema_metadata);

      arrow_table = arrow::Table::Make(schema, columns, row_group_size);

      status = parquet::arrow::WriteTable( *arrow_table, pool, out_stream, row_group_size, writer_properties, ...)

      The problem is that I could not find any documentation on how the metadata should be built. Adding such documentation would be very helpful.
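
For illustration, here is a minimal sketch of how the string stored under the "pandas" key might be assembled using only the C++ standard library. It assumes the JSON layout that pandas describes for Parquet files (a "columns" list whose entries carry "name", "field_name", "pandas_type", "numpy_type" and "metadata"). `ColumnInfo` and `BuildPandasMetadata` are hypothetical names, not part of Arrow, and the "pandas_version" value is a placeholder. This is not a confirmed fix for the issue above: the names inside a struct column (`field0`, `field1`) may come from the child field names of the struct type rather than from this metadata.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type (not part of Arrow): describes one dataframe column.
struct ColumnInfo {
  std::string name;         // column name as pandas should see it
  std::string pandas_type;  // logical type, e.g. "int64"
  std::string numpy_type;   // storage dtype, e.g. "int64"
};

// Builds a JSON string intended as the value of the "pandas" schema metadata key.
// Assumption: names and types contain no characters that need JSON escaping.
// Assumption: an empty "index_columns" list is acceptable to the reader.
std::string BuildPandasMetadata(const std::vector<ColumnInfo>& columns) {
  std::string json =
      "{\"index_columns\": [], \"column_indexes\": [], \"columns\": [";
  for (std::size_t i = 0; i < columns.size(); ++i) {
    const ColumnInfo& c = columns[i];
    if (i > 0) json += ", ";  // separate column entries with a comma
    json += "{\"name\": \"" + c.name + "\", \"field_name\": \"" + c.name +
            "\", \"pandas_type\": \"" + c.pandas_type +
            "\", \"numpy_type\": \"" + c.numpy_type +
            "\", \"metadata\": null}";
  }
  // "1.0.0" is a placeholder pandas version, not a value taken from this issue.
  json += "], \"pandas_version\": \"1.0.0\"}";
  return json;
}
```

The result of `BuildPandasMetadata({{"A", "int64", "int64"}, {"B", "int64", "int64"}})` could then be attached with something like `arrow::key_value_metadata({"pandas"}, {json})`, which takes a vector of keys and a vector of values. Whether pandas accepts metadata this minimal would need to be checked against the pandas documentation.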

          People

            Assignee: Unassigned
            Reporter: Mathieu Dutour Sikiric (mathieuDS)
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified