Hive / HIVE-16132

DataSize stats don't seem correct in semijoin opt branch

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: HiveServer2
    • Labels: None

    Description

      For the following operator tree snippet, the second Select is the start of a semijoin optimization branch. Its Data size is the same as the Data size of its parent Select, even though the second Select has only a single bigint column in its projection while the parent has two columns. I would expect the size to be 533328 (16 bytes * 33333 rows).
      Fixing this estimate may become important if we need to estimate the cost of generating the min/max/bloomfilter.
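
The expected figure above can be checked with a small calculation. This is only a sketch of the arithmetic in the description, not Hive's actual stats code: the function name `estimated_data_size` is hypothetical, and the 16-byte bigint width and the 33333 row count are taken directly from the description.

```python
from typing import List


def estimated_data_size(num_rows: int, column_widths: List[int]) -> int:
    """Return rows times the summed per-row width of the projected columns.

    Hypothetical helper for illustration; it is not part of Hive.
    """
    return num_rows * sum(column_widths)


NUM_ROWS = 33333   # row count quoted in the description
BIGINT_WIDTH = 16  # bytes per bigint value, as assumed in the description

# The semijoin branch projects a single bigint column.
expected = estimated_data_size(NUM_ROWS, [BIGINT_WIDTH])
print(expected)  # 533328
```

Under this model, a Select that projects fewer columns than its parent should always report a smaller data size than the parent, which is the property the reported plan appears to violate.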

      Attachments

        1. HIVE-16132.1.patch (110 kB, Deepak Jaiswal)
        2. HIVE-16132.2.patch (120 kB, Deepak Jaiswal)
        3. HIVE-16132.3.patch (120 kB, Deepak Jaiswal)
        4. HIVE-16132.4.patch (120 kB, Deepak Jaiswal)
        5. HIVE-16132.5.patch (120 kB, Deepak Jaiswal)
        6. HIVE-16132.6.patch (120 kB, Deepak Jaiswal)

          People

            Assignee: Deepak Jaiswal (djaiswal)
            Reporter: Deepak Jaiswal (djaiswal)