Uploaded image for project: 'Apache Cassandra'
  1. Apache Cassandra
  2. CASSANDRA-5226

CQL3 refactor to allow conversion function

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 1.2.2
    • None
    • None

    Description

      In CASSANDRA-5198, we've fixed CQL3 type validation and talked about adding conversion functions to ease working with the different data types. However, the current CQL3 code makes it fairly hard to add such functions in a non-hacky way. In fact, we already support a few conversion functions (token, minTimeuuid, maxTimeuuid, now) but the way we support them is extremely ugly (the token function is competely special cased and min/maxTimeuuid are ugly hacks in TimeUUIDType that I'm really not proud of).

      So I'm attaching a refactor that cleans that up, making it easy to add new conversion functions. Now, said refactor is a big one. While the goal is to make it easy to add functions, I think it also improve the code in the following ways:

      • It much more clearly separate the phase of validating the query from executing it. And in particular, it moves more work in the preparing phase. Typically, the parsing of constants is now done in the preparing phase, not the execution one. It also groups validation code much more cleanly imo.
      • It simplify UpdateStatement. The Operation business was not very clean and in particular the same operations where not handled by the same code depending on whether they were prepared or not, which was error prone. This is no longer the case.
      • It somewhat simplify the parser. A few parsing rules were a bit too convoluted, trying to enforce invariants that are much more easily checked post parsing (and doing it post parsing often allow better error messages, the parser tends to send cryptic errors).

      The first attached part is the initial refactor. It also adds some relatively generic code for adding conversion functions (it would typically not be very hard to allow user defined functions, though that's not part of the patch at all) and uses that to handle the existing token, minTimeuuid and maxTimeuuid functions.

      It's also worth mentioning that this first patch introduces type casts. The main reason is that it allows multiple overloads of the same function. Typically, the minTimeuuid allows both strings arguments (for dates) or integer ones (for timestamp), but so when you have:

      SELECT * FROM foo WHERE t > minTimeuuid(?);
      

      then the code doesn't know which function to use. So it complains. But you can remove the ambiguity with

      SELECT * FROM foo WHERE t > minTimeuuid((bigint)?);
      

      The 2nd patch finishes what the first one started by extending this conversion functions support to select clauses. So after this 2nd patch you can do stuff like:

      SELECT token(k), k FROM foo;
      

      for instance.

      The 3rd patch builds on that to actually add new conversion functions. Namely, for every existing CQL3 <type> it adds a blobTo<type> and a <type>ToBlob function that convert from and to blobs. And so you can do (not that this example is particularly smart):

      SELECT varintToBlob(v) FROM foo WHERE v > blobToVarint(bigintToBlob(3));
      

      Honestly this last patch is more for demonstration purpose and we can discuss this separately. In particular, we may want better different names for those functions. But at least it should highlight that adding new function is easy (this could be used to add methods to work with dates for instance).

      Now, at least considering the 2 first patches, this is not a small amount of code but I would still suggest pushing this in 1.2 (the patches are against 1.2) for the following reasons:

      • It fixes a few existing bugs (CASSANDRA-5198 has broke prepared statement for instance, which this patch fixes) and add missing validation in a few places (we are allowing sets literals like { 1, 1 } for instance, which is kind of wrong as it suggests we support multisets). We could fix those separatly but honestly I'm not we won't miss some.
      • We do have a fair amount of CQL dtests and i've check all pass. The refactor also cleans up some part of the code quite a bit imo. So overall I think I'm almost more confident in the code post-refactor than the current one.
      • We're early in 1.2 and it's an improvement after all. It would be a bit sad to have to wait for 2.0 to get this.

      Attachments

        1. 0003-Add-bytes-conversion-functions.txt
          4 kB
          Sylvain Lebresne
        2. 0002-Allow-functions-in-selection.txt
          50 kB
          Sylvain Lebresne
        3. 0001-Refactor-to-support-CQL3-functions.txt
          257 kB
          Sylvain Lebresne

        Activity

          People

            slebresne Sylvain Lebresne
            slebresne Sylvain Lebresne
            Sylvain Lebresne
            Aleksey Yeschenko
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: