  1. Apache Arrow
  2. ARROW-18288

[Go] pqarrow (github.com/apache/arrow/go/v9/parquet/pqarrow) cannot handle Arrow DICTIONARY fields


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 9.0.0, 10.0.0
    • Fix Version/s: None
    • Component/s: Go, Parquet
    • Labels: None

    Description

      Hey, Arrow Go Dev:
       
      I am trying to write some Arrow tables out to Parquet files using the "github.com/apache/arrow/go/v9/parquet/pqarrow" package. By the way, Arrow is a great design overall, and the Go implementation is great too.

       
      However, one issue stands out: my original Arrow table has some DICTIONARY fields, which pqarrow does not currently support.
       
      I would assume that supporting them is quite straightforward: just "denormalize" each DICTIONARY column into its underlying values (string, Timestamp, etc.) and leave it to Parquet to do the right thing (using dictionary encoding, etc.).
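To make the "denormalize" idea concrete, here is a minimal sketch in plain Go. It deliberately does not use the Arrow or pqarrow APIs: `decodeDictionary`, the `indices` slice, and the `dict` slice are hypothetical stand-ins for a dictionary-encoded column, and null handling is left out. It only shows the lookup step, replacing each index with the value it points to.

```go
package main

import "fmt"

// decodeDictionary expands a dictionary-encoded column into plain values.
// indices[i] selects dict[indices[i]]; an out-of-range index is an error.
// Hypothetical helper for illustration only, not an Arrow or pqarrow function.
func decodeDictionary[T any](indices []int, dict []T) ([]T, error) {
	out := make([]T, len(indices))
	for i, idx := range indices {
		if idx < 0 || idx >= len(dict) {
			return nil, fmt.Errorf("index %d at position %d is outside dictionary of size %d",
				idx, i, len(dict))
		}
		out[i] = dict[idx]
	}
	return out, nil
}

func main() {
	// Assumed sample data: three distinct values referenced by four rows.
	dict := []string{"apple", "banana", "cherry"}
	indices := []int{2, 0, 0, 1}

	values, err := decodeDictionary(indices, dict)
	if err != nil {
		panic(err)
	}
	fmt.Println(values) // prints "[cherry apple apple banana]"
}
```

The decoded slice is what would then be handed to the Parquet writer, which is free to apply its own dictionary encoding on disk.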
       
      I could do this conversion on the fly myself by converting each DICTIONARY field into its underlying values. However, the Arrow table schema is dynamic and outside my control, so I would need to iterate through all fields (including ones nested in structs) to locate them. It would be much better if pqarrow supported this natively.
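For illustration, the field walk described above might look roughly like the sketch below. The `Field` type, its `Kind` strings, and `dictionaryPaths` are all hypothetical, simplified stand-ins rather than the real Arrow schema types. The point is only that nested structs force a recursive traversal.

```go
package main

import "fmt"

// Field is a hypothetical, simplified stand-in for a schema field.
type Field struct {
	Name     string
	Kind     string  // assumed labels, e.g. "int64", "dictionary", "struct"
	Children []Field // populated only when Kind == "struct"
}

// dictionaryPaths returns the dotted path of every dictionary field,
// descending into struct fields recursively.
func dictionaryPaths(fields []Field, prefix string) []string {
	var paths []string
	for _, f := range fields {
		path := f.Name
		if prefix != "" {
			path = prefix + "." + f.Name
		}
		switch f.Kind {
		case "dictionary":
			paths = append(paths, path)
		case "struct":
			paths = append(paths, dictionaryPaths(f.Children, path)...)
		}
	}
	return paths
}

func main() {
	// Assumed sample schema with one top-level and one nested dictionary field.
	schema := []Field{
		{Name: "id", Kind: "int64"},
		{Name: "country", Kind: "dictionary"},
		{Name: "device", Kind: "struct", Children: []Field{
			{Name: "os", Kind: "dictionary"},
			{Name: "ram_gb", Kind: "int32"},
		}},
	}
	fmt.Println(dictionaryPaths(schema, "")) // prints "[country device.os]"
}
```

A real version would also have to cover lists, maps, and other nested types, which is part of why native support in pqarrow would be preferable.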
       
      Can anyone help? Thanks!


          People

            Assignee: Matthew Topol (zeroshade)
            Reporter: Kevin Yang (yangzh)
            Votes: 0
            Watchers: 3
