  1. Apache Arrow
  2. ARROW-8703

[R] schema$metadata should be properly typed

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 1.0.0
    • Component/s: R

    Description

      I am trying to export numeric data plus some metadata from Python to a Parquet file and read it back in R. However, the metadata is a dict in Python but arrives as a single string in R. I would have expected a named list (roughly the R counterpart of a Python dict). Am I missing something? Here is the code to demonstrate the issue:

      import sys
      import numpy as np
      import pyarrow as pa
      import pyarrow.parquet as pq
      print(sys.version)
      print(pa.__version__)
      x = np.random.randint(0, 10, (10, 3))
      arrays = [pa.array(x[:, i]) for i in range(x.shape[1])]
      table = pa.Table.from_arrays(arrays=arrays, names=['A', 'B', 'C'],
                                   metadata={'foo': '42'})
      pq.write_table(table, 'array.parquet', compression='snappy')
      table = pq.read_table('array.parquet')
      metadata = table.schema.metadata
      print(metadata)
      print(type(metadata))

       

      And in R:

       

      library(arrow)
      print(R.version)
      print(packageVersion("arrow"))
      table <- read_parquet("array.parquet", as_data_frame = FALSE)
      metadata <- table$schema$metadata
      print(metadata)
      print(is(metadata))
      print(metadata["foo"]) 

       

      Output Python:

      3.6.8 (default, Aug 7 2019, 17:28:10)
      [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
      0.13.0
      OrderedDict([(b'foo', b'42')])
      <class 'collections.OrderedDict'>
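
Note that the Python side returns a mapping whose keys and values are bytes (b'foo', b'42'), not str. A minimal sketch of turning such a mapping into a plain str dict follows; decode_metadata is a hypothetical helper written for illustration, not part of pyarrow, and it assumes the metadata is UTF-8 encoded text:

```python
from collections import OrderedDict


def decode_metadata(metadata, encoding="utf-8"):
    """Return a copy of a bytes-keyed metadata mapping with str keys and values.

    Hypothetical helper for illustration; assumes every key and value is
    text in the given encoding.
    """
    if metadata is None:
        # Treat a missing mapping as empty metadata.
        return {}
    return {k.decode(encoding): v.decode(encoding) for k, v in metadata.items()}


# Same shape as the mapping printed in the Python output above.
raw = OrderedDict([(b"foo", b"42")])
decoded = decode_metadata(raw)
print(decoded["foo"])
```

With a real table, the argument would be table.schema.metadata. The result is the kind of key-addressable structure the R side was expected to return.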

       

      Output R:

      [1] ‘0.17.0’
      [1] "\n-- metadata --\nfoo: 42"
      [1] "character" "vector" "data.frameRowLabels"
      [4] "SuperClassMethod"
      [1] NA

       

            People

              Assignee: Neal Richardson (npr)
              Reporter: René Rex (rrex)
              Votes: 0
              Watchers: 3


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h