HIVE-14573: Vectorization: Implement StringExpr::find()


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None
    • Vectorization: Implement StringExpr::find() (Teddy Choi, reviewed by Gopal V)

    Description

      Currently, the LIKE expression is implemented as a naive loop that calls StringExpr::equals() at every offset of the input.
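      To illustrate the approach described above, here is a minimal sketch of a naive substring search that calls an equals() helper at every candidate offset. The class NaiveFind and its methods are hypothetical names chosen for this example; this is not Hive's actual StringExpr code.

```java
// Hypothetical sketch (not Hive's code): naive substring search that
// calls an equals() helper at every candidate offset of the input.
public final class NaiveFind {
    private NaiveFind() {}

    // Compares len bytes of a (starting at aStart) with b (starting at bStart).
    static boolean equals(byte[] a, int aStart, byte[] b, int bStart, int len) {
        for (int i = 0; i < len; i++) {
            if (a[aStart + i] != b[bStart + i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the first offset (relative to start) where pattern occurs in
    // input[start, start + len), or -1 if it does not occur.
    // Every offset is tried, so the worst case is about (len - m + 1) * m comparisons.
    public static int find(byte[] input, int start, int len, byte[] pattern) {
        int m = pattern.length;
        for (int offset = 0; offset + m <= len; offset++) {
            if (equals(input, start + offset, pattern, 0, m)) {
                return offset;
            }
        }
        return -1;
    }
}
```

      For example, searching "user agent: Opera/9.80" for "Opera" returns offset 12, while searching it for "fruit" checks every offset before returning -1.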

      For an input of N bytes and a pattern of M bytes, this has a worst-case complexity of O((N-M)*M). That is not an issue with small patterns or small inputs, but the cost grows with the product of the input and pattern lengths.
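      To make the cost concrete, here is a worked example with illustrative numbers (N = 1000, M = 5; these are not measurements from this issue). For comparison, it also gives the best case of a skip-based search such as Boyer-Moore-Horspool, which is an example technique and not necessarily what the patch uses. In that best case the byte under the end of each window does not occur in the pattern, so every shift is M bytes and each window costs one comparison.

```latex
\begin{aligned}
\text{naive, worst case:}\quad & (N - M)\cdot M = (1000 - 5)\cdot 5 = 4975 \text{ byte comparisons}\\
\text{Horspool, best case:}\quad & \frac{N - M}{M} + 1 = \frac{995}{5} + 1 = 200 \text{ windows}
\end{aligned}
```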

      The pattern matching is currently optimized for the case where rows match, whereas in clickstream data most rows generally do not match.
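      One well-known way to make non-matching input cheap is a Boyer-Moore-Horspool search. The sketch below is an assumption about how such a find() could look over byte arrays, not the code from the attached patches. The class name HorspoolFinder is hypothetical. When the byte under the end of the current window does not occur in the pattern, the window advances by the full pattern length, so mostly non-matching data is skipped quickly.

```java
import java.util.Arrays;

// Hypothetical sketch of a Boyer-Moore-Horspool substring search over bytes.
// Not taken from the HIVE-14573 patch; shown only to illustrate skip-based search.
public final class HorspoolFinder {
    private final byte[] pattern;
    private final int[] shift = new int[256]; // indexed by unsigned byte value

    public HorspoolFinder(byte[] pattern) {
        this.pattern = pattern.clone();
        int m = pattern.length;
        // Bytes that do not occur in pattern[0..m-2] allow a full shift of m.
        Arrays.fill(shift, m);
        // Otherwise shift by the distance from the byte's last occurrence to the end.
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xFF] = m - 1 - i;
        }
    }

    // Returns the first match offset relative to start, or -1 if none.
    public int find(byte[] input, int start, int len) {
        int m = pattern.length;
        if (m == 0) {
            return 0;
        }
        int offset = 0;
        while (offset + m <= len) {
            int base = start + offset;
            // Compare right to left; most non-matching windows fail immediately.
            int j = m - 1;
            while (j >= 0 && input[base + j] == pattern[j]) {
                j--;
            }
            if (j < 0) {
                return offset;
            }
            // Skip based on the input byte aligned with the last pattern position.
            offset += shift[input[base + m - 1] & 0xFF];
        }
        return -1;
    }
}
```

      The shift table is built once per pattern, which suits LIKE with a constant pattern: the setup cost is amortized across every row in a batch.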

      From the Common Crawl data, the following run will go through the same

      select count(1) from uservisits_orc_data where useragent like "%Opera%" and searchword LIKE "%fruit%";
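      Both predicates in this query have the form %literal%, which is equivalent to asking whether the literal occurs anywhere in the value, so each can be answered with a single substring search per row. The sketch below shows that reduction. The class ContainsLike, its tryCompile() rule, and the sample rows are hypothetical. The issue does not say that Hive detects patterns exactly this way.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a LIKE pattern of the form %literal% (with no other
// wildcards or escapes) reduces to a substring search on each value.
public final class ContainsLike {
    private final byte[] literal;

    private ContainsLike(byte[] literal) {
        this.literal = literal;
    }

    // Returns a matcher if likePattern is exactly %literal% with no '%', '_'
    // or '\' inside the literal; otherwise null (use a general LIKE matcher).
    public static ContainsLike tryCompile(String likePattern) {
        if (likePattern.length() < 2 || !likePattern.startsWith("%") || !likePattern.endsWith("%")) {
            return null;
        }
        String inner = likePattern.substring(1, likePattern.length() - 1);
        if (inner.indexOf('%') >= 0 || inner.indexOf('_') >= 0 || inner.indexOf('\\') >= 0) {
            return null;
        }
        return new ContainsLike(inner.getBytes(StandardCharsets.UTF_8));
    }

    // Naive byte search for brevity; a skip-based finder could be used instead.
    public boolean matches(byte[] value) {
        outer:
        for (int i = 0; i + literal.length <= value.length; i++) {
            for (int j = 0; j < literal.length; j++) {
                if (value[i + j] != literal[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}
```

      With AND, a row only needs the second search when the first one matches, so for clickstream data the cost is dominated by how quickly the first search rejects non-matching values.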
      

      Attachments

        1. HIVE-15743.1.patch
          11 kB
          Teddy Choi
        2. HIVE-15743.2.patch
          14 kB
          Teddy Choi
        3. HIVE-14573.2.patch
          14 kB
          Teddy Choi


            People

              Teddy Choi (teddy.choi)
              Gopal Vijayaraghavan (gopalv)
              Votes: 0
              Watchers: 1
