Apache Drill / DRILL-4861

Revisit the 'entries' stored as part of ParquetGroupScan

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0
    • Fix Version/s: None
    • Component/s: Storage - Parquet
    • Labels: None

    Description

      The ParquetGroupScan stores a list of ReadEntryWithPath in the 'entries' field as well as a hash set of file names in the 'fileSet' field.
      Both hold essentially the same set of file names, so we should try to consolidate them into a single entity. This is not just a code simplification: the duplication has a real performance cost. When a ParquetGroupScan is serialized and sent as part of a JSON plan fragment, the overhead is quite high if the number of files is large (tens of thousands or more).
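
One possible shape for the consolidation, as a minimal sketch only. The classes below (ReadEntry, GroupScan) are hypothetical stand-ins, not Drill's actual ParquetGroupScan or ReadEntryWithPath, and no Drill or Jackson APIs are used. The assumption is that 'entries' stays the single stored (and serialized) list, while the file-name set is derived from it on demand instead of being kept as a second copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ConsolidatedScanSketch {

    // Hypothetical stand-in for ReadEntryWithPath: wraps a single file path.
    static final class ReadEntry {
        private final String path;
        ReadEntry(String path) { this.path = path; }
        String getPath() { return path; }
    }

    // Hypothetical stand-in for the group scan. Only 'entries' is stored;
    // the file set is a cached view and would be excluded from serialization.
    static final class GroupScan {
        private final List<ReadEntry> entries;
        private transient Set<String> fileSet; // derived lazily, never a second source of truth

        GroupScan(List<ReadEntry> entries) {
            this.entries = Collections.unmodifiableList(new ArrayList<>(entries));
        }

        List<ReadEntry> getEntries() { return entries; }

        // Builds the set of file names from 'entries' on first use.
        Set<String> getFileSet() {
            if (fileSet == null) {
                Set<String> derived = new LinkedHashSet<>();
                for (ReadEntry e : entries) {
                    derived.add(e.getPath());
                }
                fileSet = Collections.unmodifiableSet(derived);
            }
            return fileSet;
        }
    }

    public static void main(String[] args) {
        // Illustrative paths only; the count mirrors the "tens of thousands" case.
        List<ReadEntry> entries = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            entries.add(new ReadEntry("/data/t/part-" + i + ".parquet"));
        }
        GroupScan scan = new GroupScan(entries);
        System.out.println("entries: " + scan.getEntries().size());
        System.out.println("fileSet: " + scan.getFileSet().size());
    }
}
```

With this layout each path would appear once in a serialized plan fragment rather than twice. Whether the list or the set should be the stored form is a design choice this sketch does not settle.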

          People

            Assignee: Unassigned
            Reporter: Aman Sinha (amansinha100)
            Votes: 0
            Watchers: 1
