Cassandra
  1. Cassandra
  2. CASSANDRA-2530

Additional AbstractType data type definitions to enrich CQL

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Fix Version/s: 0.8.1
    • Component/s: Core
    • Labels:

      Description

      Provide 5 additional Datatypes: ByteType, DateType, BooleanType, FloatType, DoubleType.

        Activity

        Hide
        Hudson added a comment -

        Integrated in Cassandra-0.8 #178 (See https://builds.apache.org/job/Cassandra-0.8/178/)

        Show
        Hudson added a comment - Integrated in Cassandra-0.8 #178 (See https://builds.apache.org/job/Cassandra-0.8/178/ )
        Hide
        Jonathan Ellis added a comment -

        committed, thanks!

        Show
        Jonathan Ellis added a comment - committed, thanks!
        Hide
        Rick Shaw added a comment -

        My fault. I misunderstood the request. I removed extraneous patches. I added new patch that is rebased to the current trunk.

        Show
        Rick Shaw added a comment - My fault. I misunderstood the request. I removed extraneous patches. I added new patch that is rebased to the current trunk.
        Hide
        Jonathan Ellis added a comment -

        Still getting patch failures against 0.8:

        form:svn-0.8 jonathan$ patch -p0 < patch-to-add-4-new-AbstractTypes-and-CQL-support-v5.txt 
        patching file src/java/org/apache/cassandra/cql/Cql.g
        Hunk #1 succeeded at 296 with fuzz 2 (offset 5 lines).
        Hunk #2 FAILED at 397.
        1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/cassandra/cql/Cql.g.rej
        patching file src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java
        Hunk #1 FAILED at 71.
        1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java.rej
        patching file src/java/org/apache/cassandra/cql/Term.java
        patching file src/java/org/apache/cassandra/db/marshal/BooleanType.java
        patching file src/java/org/apache/cassandra/db/marshal/DateType.java
        patching file src/java/org/apache/cassandra/db/marshal/DoubleType.java
        patching file src/java/org/apache/cassandra/db/marshal/FloatType.java
        
        Show
        Jonathan Ellis added a comment - Still getting patch failures against 0.8: form:svn-0.8 jonathan$ patch -p0 < patch-to-add-4-new-AbstractTypes-and-CQL-support-v5.txt patching file src/java/org/apache/cassandra/cql/Cql.g Hunk #1 succeeded at 296 with fuzz 2 (offset 5 lines). Hunk #2 FAILED at 397. 1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/cassandra/cql/Cql.g.rej patching file src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java Hunk #1 FAILED at 71. 1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java.rej patching file src/java/org/apache/cassandra/cql/Term.java patching file src/java/org/apache/cassandra/db/marshal/BooleanType.java patching file src/java/org/apache/cassandra/db/marshal/DateType.java patching file src/java/org/apache/cassandra/db/marshal/DoubleType.java patching file src/java/org/apache/cassandra/db/marshal/FloatType.java
        Hide
        Rick Shaw added a comment -

        Will do. Sorry I am a bit new to the patching business.

        Show
        Rick Shaw added a comment - Will do. Sorry I am a bit new to the patching business.
        Hide
        Jonathan Ellis added a comment -

        can you rebase the Cql.g and CCFS.java conflicts?

        Show
        Jonathan Ellis added a comment - can you rebase the Cql.g and CCFS.java conflicts?
        Show
        Jonathan Ellis added a comment - (I meant Date.) I did some more reading and it looks like what you propose is mainstream behavior for JDBC drivers. ( http://stackoverflow.com/questions/4078426/to-which-java-data-types-can-i-map-timestamp-with-time-zone-or-timestamp-with-loc , http://postgresql.1045698.n5.nabble.com/Timestamp-confusion-td2174087.html , http://download.oracle.com/docs/cd/E13222_01/wls/docs81/jdbc_drivers/oracle.html ). I'll get this committed.
        Hide
        Jonathan Ellis added a comment -

        Timestamp is not less broken than it was then, but we could commit the others.

        Show
        Jonathan Ellis added a comment - Timestamp is not less broken than it was then, but we could commit the others.
        Hide
        Rick Shaw added a comment -

        This appeared to get the ok on 2011-06-01? I there more work required to get this in trunk for 0.8.2?

        Show
        Rick Shaw added a comment - This appeared to get the ok on 2011-06-01? I there more work required to get this in trunk for 0.8.2?
        Hide
        Rick Shaw added a comment -

        I see your point. What is being lost is the fact that the writer intends the date to be interpreted in EST for all subsequent readers. But that is not recorded by any binary data type: j.u.Date,j.s.Date,j.s.Time, j.s.Timestamp, or even j.u.UUID. Really it is only String/text that can convey that intent of the original writer.

        The SQL Timestamp does not really help in that area. It offers the addition of more (nanos) accuracy but not "timestamp with timezone" features unless I read it terribly wrong??

        Show
        Rick Shaw added a comment - I see your point. What is being lost is the fact that the writer intends the date to be interpreted in EST for all subsequent readers. But that is not recorded by any binary data type: j.u.Date,j.s.Date,j.s.Time, j.s.Timestamp, or even j.u.UUID. Really it is only String/text that can convey that intent of the original writer. The SQL Timestamp does not really help in that area. It offers the addition of more (nanos) accuracy but not "timestamp with timezone" features unless I read it terribly wrong??
        Hide
        Jonathan Ellis added a comment -

        I don't think any info is lost.

        If I store a date in EST I want to get it back in EST, not the default. Hence the SQL "timestamp with time zone" type.

        Show
        Jonathan Ellis added a comment - I don't think any info is lost. If I store a date in EST I want to get it back in EST, not the default. Hence the SQL "timestamp with time zone" type.
        Hide
        Rick Shaw added a comment - - edited

        Isn't date throwing away information when it encodes to long (the time zone)?

        I don't think any info is lost. The encoding adjusts the date to UTC based on the (default) locale. The formatting back out again will treat it as stored in UTC and adjust it back to the (default) locale. Arguably fragile. But using it in values (as opposed to names) is the JDBC assumption anyway. Saving 8 bytes as the volumes of rows and columns it is used in, goes into the stratosphere are very compelling.

        Also not a fan of copy/pasting the iso patterns, go ahead and reference the copy in TUUIDT, or move those somewhere more generic.

        Agreed. Patch shares the value in TUUIDT as a import static. Moving the declaration to say: AbstractType may be better in the long run but the newbie did not want to tamper any more than necessary.

        Using compareUnsigned in the boolean compare method seems like overkill.

        Agreed. A bit too much "sharing"... Patch uses o1.compareTo(o2).

        Show
        Rick Shaw added a comment - - edited Isn't date throwing away information when it encodes to long (the time zone)? I don't think any info is lost. The encoding adjusts the date to UTC based on the (default) locale. The formatting back out again will treat it as stored in UTC and adjust it back to the (default) locale. Arguably fragile. But using it in values (as opposed to names) is the JDBC assumption anyway. Saving 8 bytes as the volumes of rows and columns it is used in, goes into the stratosphere are very compelling. Also not a fan of copy/pasting the iso patterns, go ahead and reference the copy in TUUIDT, or move those somewhere more generic. Agreed. Patch shares the value in TUUIDT as a import static . Moving the declaration to say: AbstractType may be better in the long run but the newbie did not want to tamper any more than necessary. Using compareUnsigned in the boolean compare method seems like overkill. Agreed. A bit too much "sharing"... Patch uses o1.compareTo(o2) .
        Hide
        Pavel Yaskevich added a comment -

        +1 on changes in grammar.

        Show
        Pavel Yaskevich added a comment - +1 on changes in grammar.
        Hide
        Jonathan Ellis added a comment -

        Isn't date throwing away information when it encodes to long (the time zone)?

        Also not a fan of copy/pasting the iso patterns, go ahead and reference the copy in TUUIDT, or move those somewhere more generic.

        Using compareUnsigned in the boolean compare method seems like overkill.

        Otherwise looks fine to me.

        Show
        Jonathan Ellis added a comment - Isn't date throwing away information when it encodes to long (the time zone)? Also not a fan of copy/pasting the iso patterns, go ahead and reference the copy in TUUIDT, or move those somewhere more generic. Using compareUnsigned in the boolean compare method seems like overkill. Otherwise looks fine to me.
        Hide
        Rick Shaw added a comment -

        The CQL mods now support declaring the following datatypes in CQL:

        date DateType
        boolean BooleanType
        float FloatType
        double DoubleType

        So you can say:

        CREATE COLUMNFAMILY TestCF
         (KEY text PRIMARY KEY,
           stamp date,
           description text,
           hasAnAccount boolean,
           floating float,
           anumber varint) 
         WITH comparator = ascii 
          AND comment = 'Test CF' 
          AND default_validation = bigint
          AND memtable_flush_after_mins = 2
         ;
        
        
        UPDATE TestCF USING CONSISTENCY ONE 
          SET
              description = 'Some Random Text',
              anumber = 3,
              stamp = '2011-01-13 04:30:01',
              hasAnAccount = true,
              floating = 150.234
          WHERE KEY = 2000;
        
        
        Show
        Rick Shaw added a comment - The CQL mods now support declaring the following datatypes in CQL: date DateType boolean BooleanType float FloatType double DoubleType So you can say: CREATE COLUMNFAMILY TestCF (KEY text PRIMARY KEY, stamp date, description text, hasAnAccount boolean , floating float , anumber varint) WITH comparator = ascii AND comment = 'Test CF' AND default_validation = bigint AND memtable_flush_after_mins = 2 ; UPDATE TestCF USING CONSISTENCY ONE SET description = 'Some Random Text', anumber = 3, stamp = '2011-01-13 04:30:01', hasAnAccount = true , floating = 150.234 WHERE KEY = 2000;
        Hide
        Rick Shaw added a comment -

        Oops... meant ByteType.

        Show
        Rick Shaw added a comment - Oops... meant ByteType .
        Hide
        Rick Shaw added a comment -
        • rebased to RC1
        • Dropped BytesType
        • Added CQL modifications for new data types
        Show
        Rick Shaw added a comment - rebased to RC1 Dropped BytesType Added CQL modifications for new data types
        Hide
        Sylvain Lebresne added a comment -

        My reasoning was more about a guaranteed length that was short. We plan to put ENUMs into them in prepared statements. Again, selfishly, if we have a lot of Float usage we really have a lot of numbers between 1-10 to store! So all these volumes above are just more exaggerated in our need to store lots of short numbers in an optimal width. But I do understand the variable integer IntegerType could be used.

        It think what you want to use is a BytesType with only one byte each time. The only thing that you'll miss is that validate won't ensure you only have 1 byte long inputs. However, for that kind of thing, I would be in favor of using CASSANDRA-2355 and enriching BytesType so that you can write stuff like BytesType(max_length=1). That's more flexible and avoid adding tons of new classes.

        Show
        Sylvain Lebresne added a comment - My reasoning was more about a guaranteed length that was short. We plan to put ENUMs into them in prepared statements. Again, selfishly, if we have a lot of Float usage we really have a lot of numbers between 1-10 to store! So all these volumes above are just more exaggerated in our need to store lots of short numbers in an optimal width. But I do understand the variable integer IntegerType could be used. It think what you want to use is a BytesType with only one byte each time. The only thing that you'll miss is that validate won't ensure you only have 1 byte long inputs. However, for that kind of thing, I would be in favor of using CASSANDRA-2355 and enriching BytesType so that you can write stuff like BytesType(max_length=1). That's more flexible and avoid adding tons of new classes.
        Hide
        Rick Shaw added a comment -

        I'm slightly against having both Float and Double – I'd like to discourage premature optimization and using 4 bytes for floating point instead of 8 is almost always the wrong space:accuracy tradeoff today.

        I guess I added Float and Double together because they were so similar it seemed silly not to. It is also a JDBC type and seemed easy enough to provide and make implementing the next stages of JDBC easier.

        Selfishly, my implementations use a lot of small float numbers like: "1.2" or "66.667" so a row might have 50 columns of such data and 5-50 million rows. So a selfish trend on my part is to provide the smallest representation on bytes stored/in-memory.

        Similarly not sure there is a raison d'être for ByteType.

        My reasoning was more about a guaranteed length that was short. We plan to put ENUMs into them in prepared statements. Again, selfishly, if we have a lot of Float usage we really have a lot of numbers between 1-10 to store! So all these volumes above are just more exaggerated in our need to store lots of short numbers in an optimal width. But I do understand the variable integer IntegerType could be used.

        This looks wrong:

        Yep... Cut-n-paste in a hurry... will get you every time! I'll fix with v3 patch. (starting to see a trend)

        Yes. I'll look into the error of my "whitespace" ways.

        I will try to conjure some tests that dovetail into the existing testing framework.

        A separate ticket would probably be good for ANTLR work. I'll give it a shot but am about 30 years rusty on parsing grammars. But I'll gladly give it a try.

        Show
        Rick Shaw added a comment - I'm slightly against having both Float and Double – I'd like to discourage premature optimization and using 4 bytes for floating point instead of 8 is almost always the wrong space:accuracy tradeoff today. I guess I added Float and Double together because they were so similar it seemed silly not to. It is also a JDBC type and seemed easy enough to provide and make implementing the next stages of JDBC easier. Selfishly, my implementations use a lot of small float numbers like: "1.2" or "66.667" so a row might have 50 columns of such data and 5-50 million rows. So a selfish trend on my part is to provide the smallest representation on bytes stored/in-memory. Similarly not sure there is a raison d'être for ByteType. My reasoning was more about a guaranteed length that was short. We plan to put ENUMs into them in prepared statements. Again, selfishly, if we have a lot of Float usage we really have a lot of numbers between 1-10 to store! So all these volumes above are just more exaggerated in our need to store lots of short numbers in an optimal width. But I do understand the variable integer IntegerType could be used. This looks wrong: Yep... Cut-n-paste in a hurry... will get you every time! I'll fix with v3 patch. (starting to see a trend) Yes. I'll look into the error of my "whitespace" ways. I will try to conjure some tests that dovetail into the existing testing framework. A separate ticket would probably be good for ANTLR work. I'll give it a shot but am about 30 years rusty on parsing grammars. But I'll gladly give it a try.
        Hide
        Rick Shaw added a comment -

        Sorry... Things changed out from under me!

        Show
        Rick Shaw added a comment - Sorry... Things changed out from under me!
        Hide
        Jonathan Ellis added a comment -

        Thanks for the patch!

        I'm slightly against having both Float and Double – I'd like to discourage premature optimization and using 4 bytes for floating point instead of 8 is almost always the wrong space:accuracy tradeoff today.

        Similarly not sure there is a raison d'être for ByteType.

        This looks wrong:

        +        if (bytes.remaining() != 4)
        +        {
        +            throw new MarshalException("A double is exactly 8 bytes) : "+bytes.remaining());
        +        }
        

        Please check use of whitespace w/ http://wiki.apache.org/cassandra/CodeStyle.

        Would like to see new Types covered by RoundTripTest.

        Would like to see new Types integrated into CQL; this does involve digging into the Antlr code though (to allow it to recognize new literal types). So if you want we can open a separate ticket for that.

        Show
        Jonathan Ellis added a comment - Thanks for the patch! I'm slightly against having both Float and Double – I'd like to discourage premature optimization and using 4 bytes for floating point instead of 8 is almost always the wrong space:accuracy tradeoff today. Similarly not sure there is a raison d'être for ByteType. This looks wrong: + if (bytes.remaining() != 4) + { + throw new MarshalException("A double is exactly 8 bytes) : "+bytes.remaining()); + } Please check use of whitespace w/ http://wiki.apache.org/cassandra/CodeStyle . Would like to see new Types covered by RoundTripTest. Would like to see new Types integrated into CQL; this does involve digging into the Antlr code though (to allow it to recognize new literal types). So if you want we can open a separate ticket for that.
        Hide
        Jeremy Hanna added a comment -

        If you're working against the current 0.8 branch, you'll see that you need decompose on the AbstractType as well - just going the other direction. Thanks for putting a patch together!

        Show
        Jeremy Hanna added a comment - If you're working against the current 0.8 branch, you'll see that you need decompose on the AbstractType as well - just going the other direction. Thanks for putting a patch together!

          People

          • Assignee:
            Rick Shaw
            Reporter:
            Rick Shaw
            Reviewer:
            Jonathan Ellis
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development