Apache Arrow / ARROW-10999

[Rust] TPC-H parquet files cannot be read by Apache Spark

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Rust

    Description

      The TPC-H Parquet files generated by the benchmark crate cannot be read by Apache Spark because they use unsigned integer types, which Spark cannot read (I am guessing because Java only has signed ints).

      I would like to use the same data sets for benchmarking DataFusion, Apache Spark, and other tools.
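
One way to picture the incompatibility and a possible workaround is sketched below. This is only an illustration using the Rust standard library, not the Arrow or Parquet crate APIs, and the ticket does not state how the fix was actually implemented. `IntType`, `to_signed`, and `cast_u64_column` are hypothetical names introduced for this example. The idea is to replace each unsigned column type with a signed type wide enough to hold its values, and to fail loudly when a 64-bit value does not fit.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Simplified stand-in for the integer types of a columnar schema.
/// (Hypothetical; not the Arrow `DataType` enum.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum IntType {
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    Int8,
    Int16,
    Int32,
    Int64,
}

/// Map an unsigned type to a signed type that a signed-only reader can consume.
/// Each unsigned type is widened so that every value still fits, except
/// UInt64, which has no wider signed counterpart here and maps to Int64.
fn to_signed(t: IntType) -> IntType {
    match t {
        IntType::UInt8 => IntType::Int16,
        IntType::UInt16 => IntType::Int32,
        IntType::UInt32 | IntType::UInt64 => IntType::Int64,
        signed => signed, // already signed: leave unchanged
    }
}

/// Convert a column of u64 values to i64, returning an error instead of
/// silently wrapping when a value exceeds i64::MAX.
fn cast_u64_column(values: &[u64]) -> Result<Vec<i64>, TryFromIntError> {
    values.iter().map(|&v| i64::try_from(v)).collect()
}
```

For example, `to_signed(IntType::UInt32)` yields `IntType::Int64`, and `cast_u64_column(&[u64::MAX])` returns an error rather than a negative number. For TPC-H data, whose keys and quantities are small non-negative numbers, declaring the columns as signed in the first place would presumably avoid the conversion entirely.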

            People

              Assignee: Andy Grove (andygrove)
              Reporter: Andy Grove (andygrove)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h