  1. Apache Arrow
  2. ARROW-15233

pyarrow.dataset.dataset loses type information when reading parquet files


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 6.0.1
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None
    • Environment: Ubuntu 20.2, Python 3.8.10

    Description

      When reading a Parquet file containing time data:

       

      >>> import pyarrow.dataset
      >>> ds = pyarrow.dataset.dataset('foo.parquet', format='parquet')
      >>> ds.schema[1].type
      DataType(time32[ms])

      The type comes back as a generic DataType rather than Time32Type, so I can't query its time unit.

      I assume the same problem affects other parameterized (non-basic) types.

      I can write code to scrape the type's string representation. Is there a better way?
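
For illustration, the string-scraping workaround mentioned above might look like the sketch below. The helper name time_unit and its regular expression are hypothetical and not part of pyarrow; the sketch assumes the type's string form looks like time32[ms], as the repr above suggests, and it prefers a unit attribute whenever the object exposes one.

```python
import re

# Assumption: the string form of a time-like Arrow type starts with the type
# name followed by the unit in brackets, e.g. "time32[ms]" or
# "timestamp[us, tz=UTC]".
_UNIT_RE = re.compile(r"^(?:time32|time64|timestamp|duration)\[(s|ms|us|ns)\b")


def time_unit(arrow_type):
    """Return the time unit of a type as a string, or None if it has none.

    Hypothetical helper: uses the `unit` attribute when the object has one,
    and otherwise falls back to parsing the type's string representation.
    """
    unit = getattr(arrow_type, "unit", None)
    if unit is not None:
        return unit
    match = _UNIT_RE.match(str(arrow_type))
    return match.group(1) if match else None
```

Under those assumptions, time_unit(ds.schema[1].type) should return 'ms' for the column shown above, whether or not the type is wrapped as Time32Type.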

       

          People

            Assignee: Unassigned
            Reporter: Jim Fulton (j1m)