HIVE-17423

LLAP Parquet caching - support file ID in splits

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      To read data from the LLAP cache, one needs a file ID, which is either an HDFS inode ID or a composite of path, modification time, and size. For ORC, these IDs can be embedded into splits because, for the former in particular, the IDs can be obtained as part of the normal file enumeration that split generation performs anyway.
      If the IDs are missing from a split, they need to be obtained for every file on the fragment side.
      We should explore adding file IDs to Parquet splits when the cache is enabled.
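
      The idea can be illustrated with a minimal, self-contained sketch. It is not Hive code: `FileIdSketch`, `SyntheticId`, `Split`, `chooseFileId` and `resolveFileId` are hypothetical names invented for illustration, and the per-file lookup is modeled as a plain function rather than a real file system call. It only shows the two kinds of file ID described above and why a split that already carries an ID avoids an extra lookup on the fragment side.

```java
import java.util.Objects;
import java.util.OptionalLong;
import java.util.function.Function;

// Hypothetical sketch; none of these types exist in Hive under these names.
public class FileIdSketch {

    /** Composite file ID built from path, modification time and size. */
    public static final class SyntheticId {
        final String path;
        final long modTime;
        final long size;

        public SyntheticId(String path, long modTime, long size) {
            this.path = path;
            this.modTime = modTime;
            this.size = size;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SyntheticId)) {
                return false;
            }
            SyntheticId other = (SyntheticId) o;
            return path.equals(other.path) && modTime == other.modTime && size == other.size;
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, modTime, size);
        }
    }

    /** A split that may or may not carry a precomputed file ID (null if absent). */
    public static final class Split {
        final String path;
        final Object fileId;

        public Split(String path, Object fileId) {
            this.path = path;
            this.fileId = fileId;
        }
    }

    /** Prefer the inode ID when the file system exposes one; otherwise fall back to the composite. */
    public static Object chooseFileId(OptionalLong inodeId, String path, long modTime, long size) {
        if (inodeId.isPresent()) {
            return Long.valueOf(inodeId.getAsLong());
        }
        return new SyntheticId(path, modTime, size);
    }

    /** Fragment side: use the ID from the split if present, else pay for a per-file lookup. */
    public static Object resolveFileId(Split split, Function<String, Object> perFileLookup) {
        if (split.fileId != null) {
            return split.fileId;
        }
        return perFileLookup.apply(split.path);
    }

    public static void main(String[] args) {
        String path = "/warehouse/t/part-0.parquet"; // made-up path
        Object inode = chooseFileId(OptionalLong.of(16388L), path, 1500000000000L, 4096L);
        Object synthetic = chooseFileId(OptionalLong.empty(), path, 1500000000000L, 4096L);

        int[] lookups = {0};
        Function<String, Object> lookup = p -> {
            lookups[0]++; // stands in for a file status call made by the fragment
            return synthetic;
        };

        resolveFileId(new Split(path, inode), lookup); // ID embedded at split generation
        resolveFileId(new Split(path, null), lookup);  // ID missing, lookup required
        System.out.println("per-file lookups on the fragment side: " + lookups[0]);
    }
}
```

      In this sketch only the split without an embedded ID triggers the lookup, which is the cost the issue proposes to avoid for Parquet when the cache is enabled.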

            People

              Assignee: Unassigned
              Reporter: Sergey Shelukhin (sershe)
              Votes: 0
              Watchers: 3