Apache Arrow / ARROW-10426

[C++] Arrow type large_string cannot be written to Parquet type column descriptor

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 3.0.0
    • Component/s: C++, R
    • Environment: R 4.0.3 on OSX 10.15.7

    Description

      Writing a dataset in Parquet format with arrow::write_dataset() fails with the error: "Arrow type large_string cannot be written to Parquet type column descriptor"

      # Same call with named arguments; the third positional argument is `format`.
      arrow::write_dataset(
        dataframe,
        path = "/directory/",
        format = "parquet",
        partitioning = c("col1", "col2")
      )
      

      The data frame in question is very large; one column contains the text of message board posts, encoded as HTML.
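
One plausible reading of the report, stated here as an assumption rather than a confirmed diagnosis, is that the text column was large enough to exceed what Arrow's regular string type can address (it uses 32-bit offsets, so roughly 2 GB of character data per array), which would explain why the column arrived as large_string. The base R sketch below checks whether a character column crosses that threshold. The vector `posts` is a hypothetical stand-in for the reporter's column, and the check ignores any chunking the library may apply.

```r
# Hypothetical stand-in for the HTML text column described in the report.
posts <- c("<p>first post</p>", "<p>second post</p>", "<p>third</p>")

# Total payload in bytes. as.numeric() keeps the sum from overflowing
# R's 32-bit integers once the total passes 2^31 - 1.
total_bytes <- sum(as.numeric(nchar(posts, type = "bytes")))

# Largest total that 32-bit offsets can address in a single string array.
string_limit <- 2^31 - 1

# TRUE would suggest the column cannot fit in a single regular string array.
needs_large_string <- total_bytes > string_limit
print(needs_large_string)
```

Running the same check on the real column would show whether its size alone accounts for the large_string type. Since the issue is marked as fixed, upgrading arrow is likely a simpler remedy than splitting or truncating the column.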

            People

              Assignee: Antoine Pitrou (apitrou)
              Reporter: Gabriel Bassett (GabeTheEngineer)
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h