IMPALA-5931

Don't synthesize block metadata in the catalog for S3/ADLS


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 2.13.0, Impala 3.1.0
    • Component/s: Catalog
    • Labels: None

    Description

      Today, the catalog synthesizes block metadata for S3/ADLS by breaking up splittable files into "blocks" of the FileSystem's default block size. Rather than carrying these blocks around in the catalog and distributing them to all impalads, we might as well generate the scan ranges on the fly during planning. That would save the catalog memory and the network bandwidth currently spent on these synthetic blocks.
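
      As a rough illustration of generating scan ranges on the fly, the sketch below cuts a file into block-sized byte ranges from nothing more than its length and a block size. It is not Impala's planner code: `ScanRange`, `generate_scan_ranges`, and the rule that empty files yield no ranges are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class ScanRange:
    """A byte range of one file (hypothetical type, not Impala's)."""
    path: str
    offset: int
    length: int


def generate_scan_ranges(path: str, file_length: int, block_size: int,
                         splittable: bool = True) -> Iterator[ScanRange]:
    """Yield block-sized ranges covering the file, computed at planning time."""
    if file_length < 0:
        raise ValueError("file_length must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_length == 0:
        return  # assumption: an empty file produces no scan ranges
    if not splittable:
        # A non-splittable file is scanned as a single range.
        yield ScanRange(path, 0, file_length)
        return
    offset = 0
    while offset < file_length:
        # The last range may be shorter than block_size.
        length = min(block_size, file_length - offset)
        yield ScanRange(path, offset, length)
        offset += length


ranges = list(generate_scan_ranges("s3a://bucket/tbl/part-0", 300, 128))
```

      Because the ranges are a pure function of the file length and the block size, nothing beyond the file length needs to be stored or shipped for them.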

      That does mean the planner will have to instantiate the FileSystem and call it to get the default block size, but for these FileSystems that is just a matter of reading the config.
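
      To show why reading the block size is cheap, here is a minimal sketch that resolves it from a plain key/value configuration. It assumes the size lives under a key such as Hadoop's `fs.s3a.block.size` and may carry a K/M/G suffix; the helper names and the 32 MB fallback are assumptions for the example, not taken from this issue, and no ADLS key is assumed.

```python
from typing import Mapping

# Binary size suffixes, in the spirit of values such as "32M".
_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Assumed fallback when the key is absent; not taken from the issue.
ASSUMED_DEFAULT_BLOCK_SIZE = 32 * (1 << 20)


def parse_size(value: str) -> int:
    """Parse a byte count such as "1024" or "64M" (hypothetical helper)."""
    text = value.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def default_block_size(conf: Mapping[str, str],
                       key: str = "fs.s3a.block.size") -> int:
    """Return the configured block size in bytes, or the assumed fallback."""
    raw = conf.get(key)
    size = ASSUMED_DEFAULT_BLOCK_SIZE if raw is None else parse_size(raw)
    if size <= 0:
        raise ValueError(f"{key} must be positive, got {size}")
    return size


block_size = default_block_size({"fs.s3a.block.size": "64M"})
```

      The lookup touches only configuration, so it makes no call to the remote store.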

      Perhaps the same can be done for HDFS erasure coding, though that depends on what a block location actually means in that context and whether the locations contain useful info.


            People

              Assignee: Vuk Ercegovac (vukercegovac)
              Reporter: Daniel Hecht (dhecht)