Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-5644

Add CONTAINS_SUBSTR function (enabled for BigQuery)

    XMLWordPrintableJSON

Details

    • Task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.36.0
    • None

    Description

      BigQuery offers the CONTAINS_SUBSTR function that returns TRUE if a substring is present in an expression and FALSE if it is not. A basic example of this may be seen in [1].

      The expression can take many forms (more info in the linked doc) which makes its implementation more complex than other string functions that only accept arguments of type STRING. For instance, the expression to be searched can be a column or table reference.

      The function also has an optional third argument called json_scope where the user can indicate the scope of JSON data (keys, values, or both) to be searched for the substring.

      I am curious if anyone has thoughts on how the search of rows or tables could be implemented. I have a basic implementation (that supports expressions of type STRING and nothing else) that I will open a draft PR for as a starting point. To me, the challenge is implementing the additional features like the JSON_SCOPE argument (seen in [2]) and performing a cross field search as seen in [3].

      [1] SELECT CONTAINS_SUBSTR("hello", "he"); would return TRUE.

      [2] SELECT CONTAINS_SUBSTR(JSON '("lunch":"soup")', "lunch",
      json_scope=>"JSON_VALUES") AS result;
      would return FALSE.

      [3]
      {{SELECT *
      FROM Recipes
      WHERE CONTAINS_SUBSTR(
      (SELECT AS STRUCT Recipes.* EXCEPT (Lunch, Dinner)),
      'potato'
      );}} would return:

      Breakfast Lunch Dinner
      Potato pancakes Toasted cheese sandwich Beef stroganoff

      Attachments

        Issue Links

          Activity

            People

              tanclary Tanner Clary
              tanclary Tanner Clary
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 10m
                  10m