Details
Improvement
Status: Resolved
Major
Resolution: Fixed
0.17.0
Description
I am trying to export numeric data plus some metadata from Python to a Parquet file and read it back in R. However, the metadata is a dict in Python but a single string in R. I would have expected a list in R (which is roughly the equivalent of a dict in Python). Am I missing something? Here is the code to demonstrate the issue:
import sys
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
print(sys.version)
print(pa.__version__)
x = np.random.randint(0, 10, (10, 3))
arrays = [pa.array(x[:, i]) for i in range(x.shape[1])]
table = pa.Table.from_arrays(arrays=arrays, names=['A', 'B', 'C'],
                             metadata={'foo': '42'})
pq.write_table(table, 'array.parquet', compression='snappy')
table = pq.read_table('array.parquet')
metadata = table.schema.metadata
print(metadata)
print(type(metadata))
And in R:
library(arrow)
print(R.version)
print(packageVersion("arrow"))
table <- read_parquet("array.parquet", as_data_frame = FALSE)
metadata <- table$schema$metadata
print(metadata)
print(is(metadata))
print(metadata["foo"])
Output Python:
3.6.8 (default, Aug 7 2019, 17:28:10)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
0.13.0
OrderedDict([(b'foo', b'42')])
<class 'collections.OrderedDict'>
Output R:
[1] ‘0.17.0’
[1] "\n-- metadata --\nfoo: 42"
[1] "character" "vector" "data.frameRowLabels"
[4] "SuperClassMethod"
[1] NA
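For comparison, this is roughly the key/value access I expected to get on the R side. The Python output shows the metadata as a mapping with byte-string keys and values, so a small helper can turn it into a plain dict of strings. This is only a sketch: `decode_metadata` is a hypothetical helper of my own, not part of pyarrow, and it assumes the metadata is UTF-8 encoded.

```python
from collections import OrderedDict


def decode_metadata(raw, encoding="utf-8"):
    """Return a plain dict with str keys and values.

    Hypothetical helper (not a pyarrow API). Assumes every bytes key
    and value can be decoded with the given encoding.
    """
    return {
        (key.decode(encoding) if isinstance(key, bytes) else key):
        (value.decode(encoding) if isinstance(value, bytes) else value)
        for key, value in raw.items()
    }


# Same shape as the metadata printed in the Python output of this report.
raw = OrderedDict([(b"foo", b"42")])
decoded = decode_metadata(raw)
print(decoded["foo"])
```

With a mapping like this, looking up `foo` returns the stored value instead of `NA`.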