Apache Arrow / ARROW-10134

[C++][Dataset] Add ParquetFileFragment::num_row_groups property

Details

    Description

      From https://github.com/dask/dask/pull/6534#issuecomment-699512602, comment by rjzamora:

      It would be great to have access to the total row-group count for the fragment from a num_row_groups attribute (which pyarrow should be able to get without parsing all row-group metadata/statistics - I think?).

      One question is: does this attribute correspond to the row groups in the Parquet file, or to the (subset of) row groups represented by the fragment?
      I expect the second (so if you do SplitByRowGroup, you would get fragments with num_row_groups == 1), but this might be a potentially confusing aspect of the attribute.
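
The two readings of num_row_groups raised above can be made concrete with a small model. The sketch below assumes the second interpretation (count only the row groups the fragment represents). `FragmentSketch`, `file_row_group_count` and `split_by_row_group` are hypothetical names used for illustration only; this is not the pyarrow or Arrow C++ API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FragmentSketch:
    """Hypothetical stand-in for a Parquet file fragment (not the pyarrow API)."""

    path: str
    # Total row groups in the file; assumed to come from the footer metadata.
    file_row_group_count: int
    # None means "all row groups in the file"; otherwise the selected ids.
    row_groups: Optional[Tuple[int, ...]] = None

    @property
    def num_row_groups(self) -> int:
        # Second interpretation: count only the row groups this fragment
        # represents, falling back to the whole file when unrestricted.
        if self.row_groups is None:
            return self.file_row_group_count
        return len(self.row_groups)

    def split_by_row_group(self) -> Tuple["FragmentSketch", ...]:
        # Mirrors the SplitByRowGroup behaviour described in the issue:
        # one new fragment per selected row group.
        ids = (
            range(self.file_row_group_count)
            if self.row_groups is None
            else self.row_groups
        )
        return tuple(
            FragmentSketch(self.path, self.file_row_group_count, (i,)) for i in ids
        )
```

Under this model a whole-file fragment over a 4-row-group file reports 4, each piece from `split_by_row_group()` reports 1, and a fragment restricted to row groups `(0, 2)` reports 2. The file-level total stays available separately as `file_row_group_count`, which is one way to keep both meanings accessible without overloading a single attribute.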

            People

              Ben Kietzman (bkietz)
              Joris Van den Bossche (jorisvandenbossche)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 20m