Hive / HIVE-15398

change metadata-only queries to still read the original table (in some cases?)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      See HIVE-15397.
      There are multiple complementary ways to handle this properly:
      1) Enhance MetadataOnly to recognize when table emptiness matters, and apply the optimization only to safe query patterns (or fall back to one of the approaches below only in the unsafe cases).
      2) Create the original InputFormat (IF) during compilation, get a record reader from it, and check whether it is empty. This seems to be the only bulletproof method in terms of correctness, but it may break because setup and access differ between tasks and compilation. It may also have security implications, e.g. if compilation runs in HS2 and its permissions differ from those of the tasks.
      3) Instead of using NullIF for this case, inject a limit into the table scan (either by using a limit in the plan or by hacking it into TS itself specifically for this feature) and keep the original InputFormat. That way, instead of 0 or 1 null rows, the scan would return 0 or 1 rows from the original split while still avoiding large scans, which is the goal.
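The difference between the null-row substitute and option 3 can be sketched in a few lines of plain Java. This is only an illustration and not Hive code: `MetadataOnlySketch`, `limited`, `oneNullRow`, and `countRows` are hypothetical names, and a plain `Iterator` stands in for a record reader. It assumes the failure mode is that the substitute produces a row without consulting the table, so an empty table is indistinguishable from a non-empty one.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-ins for illustration; none of these are Hive classes.
final class MetadataOnlySketch {

    // Option 3: keep the original reader but stop after at most `limit` rows.
    static <T> Iterator<T> limited(final Iterator<T> source, final int limit) {
        return new Iterator<T>() {
            private int returned = 0;

            @Override
            public boolean hasNext() {
                // An empty source still reports no rows, so emptiness is preserved.
                return returned < limit && source.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                returned++;
                return source.next();
            }
        };
    }

    // Assumed model of the null-row substitute: one null row, whatever the table holds.
    static Iterator<String> oneNullRow() {
        return Collections.<String>singletonList(null).iterator();
    }

    // Drains a reader and reports how many rows it produced.
    static <T> int countRows(Iterator<T> rows) {
        int n = 0;
        while (rows.hasNext()) {
            rows.next();
            n++;
        }
        return n;
    }
}
```

With these definitions, `countRows(limited(reader, 1))` is 0 for an empty source and 1 for any non-empty source, no matter how large, while `countRows(oneNullRow())` is 1 in both cases. The limit keeps the scan cheap without hiding whether the table actually has data.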

            People

              Assignee: Unassigned
              Reporter: Sergey Shelukhin (sershe)
              Votes: 0
              Watchers: 1