Apache Arrow: ARROW-5085

[Python/C++] Conversion of dict encoded null column fails in parquet writing when using RowGroups


Details

    Description

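      Writing a table to Parquet fails when it contains a dictionary-encoded (pandas categorical) column whose values are all null and the table is split into several row groups (here via chunk_size=10, i.e. 10 rows per row group). Minimal reproduction: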

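      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq

      # 100 rows: an all-null column and a float column
      df = pd.DataFrame({"col": [None] * 100, "int": [1.0] * 100})
      # Cast the all-null column to categorical -> dictionary-encoded in Arrow
      df = df.astype({"col": "category"})
      table = pa.Table.from_pandas(df)

      # Write to an in-memory buffer with 10 rows per row group
      buf = pa.BufferOutputStream()
      pq.write_table(
          table,
          buf,
          version="2.0",
          chunk_size=10,
      )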

      The write fails with:

      pyarrow.lib.ArrowIOError: Column 2 had 100 while previous column had 10

            People

              wesm Wes McKinney
              fjetter Florian Jetter

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m