Hive / HIVE-5454

HCatalog runs a partition listing with an empty filter

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: HCatalog
    • Labels:
      None
    • Release Note:
      Deprecated the chained HCatInputFormat#setFilter(…) API call in favor of a new HCatInputFormat#setInput(…) method that accepts the filter directly.

      Description

      This is a regression caused by HCATALOG-527: the way HCatLoader calls HCatInputFormat makes it perform two partition lookups, once without the filter and then again with the filter.

      For tables with a large number of partitions (100000, say), the unfiltered lookup proves fatal both to the client ("Read timed out" errors from ThriftMetaStoreClient because the server doesn't respond) and to the server (too much data loaded into the cache, an OOME, or a slowdown).

      The fix would be to use a single call that also passes the partition filter information, as was the case in the HCatalog 0.4 sources before HCATALOG-527.
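To make the cost difference concrete, here is a minimal, self-contained sketch of the two call patterns. It does not use the real HCatalog or metastore classes: `FakeMetastore`, `chainedLookup`, and `singleLookup` are hypothetical names invented for illustration, and the in-memory partition list merely stands in for a metastore listing call. It only shows why the chained style fetches every partition once before the filter is ever applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PartitionLookupSketch {

    /** Hypothetical stand-in for a metastore client that counts the work it does. */
    static class FakeMetastore {
        private final List<String> partitions = new ArrayList<>();
        int listingCalls = 0;        // number of partition listings requested
        long partitionsReturned = 0; // total partitions shipped back to the client

        FakeMetastore(int count) {
            for (int i = 0; i < count; i++) {
                partitions.add("dt=" + i);
            }
        }

        /** Lists partitions matching the filter; a null filter matches everything. */
        List<String> listPartitions(Predicate<String> filter) {
            listingCalls++;
            List<String> out = new ArrayList<>();
            for (String p : partitions) {
                if (filter == null || filter.test(p)) {
                    out.add(p);
                }
            }
            partitionsReturned += out.size();
            return out;
        }
    }

    /** Chained style: set the input first (unfiltered listing), then set the filter (second listing). */
    static List<String> chainedLookup(FakeMetastore ms, Predicate<String> filter) {
        ms.listPartitions(null);          // unfiltered listing; its result is thrown away
        return ms.listPartitions(filter); // filtered listing that is actually used
    }

    /** Single-call style: the filter travels with the first and only listing. */
    static List<String> singleLookup(FakeMetastore ms, Predicate<String> filter) {
        return ms.listPartitions(filter);
    }

    public static void main(String[] args) {
        Predicate<String> filter = p -> p.equals("dt=42");

        FakeMetastore chained = new FakeMetastore(100000);
        chainedLookup(chained, filter);
        System.out.println("chained: calls=" + chained.listingCalls
                + " partitionsReturned=" + chained.partitionsReturned);

        FakeMetastore single = new FakeMetastore(100000);
        singleLookup(single, filter);
        System.out.println("single:  calls=" + single.listingCalls
                + " partitionsReturned=" + single.partitionsReturned);
    }
}
```

In this toy model the chained style returns 100001 partitions over two listings while the single-call style returns 1 partition over one listing. The real cost is in the unfiltered listing, which is the part the report describes as timing out the client and overloading the server.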

      (HCatalog-release-wise, this affects all 0.5.x users)

        Attachments

        1. D13317.1.patch
          6 kB
          Phabricator
        2. D13317.2.patch
          29 kB
          Phabricator
        3. D13317.3.patch
          29 kB
          Phabricator

            People

            • Assignee:
              qwertymaniac Harsh J
              Reporter:
              qwertymaniac Harsh J