Apache Arrow / ARROW-10226

[Rust] [Parquet] Parquet reader reading wrong columns in some batches within a Parquet file


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Rust, Rust - DataFusion
    • Labels: None

    Description

      I reinstalled my desktop a few days ago (now using Ubuntu 20.04 LTS), and when I try to run the TPC-H benchmark, it never completes and eventually uses up all 64 GB of RAM.

      I can run Spark against the same dataset and the query completes in 24 seconds, which IIRC is how long it took before.

      It is possible that something is odd in my environment, but it is also likely that this is a real bug.

      I am investigating this and will update the Jira once I know more.

      I also went back to old commits that previously worked for me, and they show the same issue, so I don't think this is related to a recent code change.


          People

            Assignee: Andy Grove (andygrove)
            Reporter: Andy Grove (andygrove)
            Votes: 0
            Watchers: 5
