Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 2.0.3
    • Component/s: Tools
    • Labels:
      None

      Description

      SSTableSimple[Un]SortedWriter requires defining raw comparators and inserting raw data cells. We should create a CQL-aware alternative.

      1. 5894.txt
        29 kB
        Sylvain Lebresne
      2. 5894-v2.txt
        35 kB
        Sylvain Lebresne
      3. 5894-v3.txt
        36 kB
        Sylvain Lebresne

        Activity

        Jeremiah Jordan added a comment -

        The biggest issue I see with doing this is how you get the CQL3-aware SSTableWriter to know what schema to use when creating an sstable.

        I would think the basic interface should be newRow(<PRIMARY KEY COLUMNS>), addColumn(name, value).

        Sylvain Lebresne added a comment -

        In terms of API, what about something like that:

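        String schema = "CREATE TABLE foo (c1 int, c2 text, c3 float, PRIMARY KEY (c1, c2))";
        String insert = "INSERT INTO foo(c1, c2, c3) VALUES (?, ?, ?)";
        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                                                  // proposed as for(schema); `for` is reserved in Java, and the javadoc example quoted further down uses forTable
                                                  .forTable(schema)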
                                                  .using(insert)
                                                  .build();
        
        writer.addRow(3, "foo", 2.3f);
        writer.addRow(1, "bar", 0.0f);
        ...
        
        Jonathan Ellis added a comment -

        I like it.

        Jeremiah Jordan added a comment -

        Looks good to me. So as long as you give the right "CREATE" statement, the CQLSSTableWriter can parse the schema out and not need to talk to the cluster.

        Sylvain Lebresne added a comment -

        Attaching patch with basically the API described above. Contains a simple unit test.

        Vadim Chekan added a comment - edited

        Is this code expected to work autonomously (without Cassandra installed)?
        I used the code from the CQLSSTableWriter javadoc (in Scala):

          def example1() = {
            val schema = "CREATE TABLE myKs.myTable ("+
                           " k int PRIMARY KEY,"+
                           " v1 text,"+
                           " v2 int"+
                          ")"
            val insert = "INSERT INTO myKs.myTable (k, v1, v2) VALUES (?, ?, ?)"
        
            val writer = CQLSSTableWriter.builder()
              .inDirectory("c:\\temp\\sstables_tmp")
              .forTable(schema)
              .using(insert).build()
        
            // Adds a number of rows to the resulting sstable
            writer.addRow(int2Integer(0), "test1", int2Integer(24))
            writer.addRow(int2Integer(1), "test2", null)
            writer.addRow(int2Integer(2), "test3", int2Integer(42))
        
            // Close the writer, finalizing the sstable
            writer.close()
          }
        

        I am trying to run it on a machine without a Cassandra node running, and I'm getting:

        Exception in thread "main" java.lang.IllegalArgumentException: Keyspace myks does not exist
        	at org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.getStatement(CQLSSTableWriter.java:397)
        	at org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.using(CQLSSTableWriter.java:328)
        	at vsw.odt.Main$.example1(Main.scala:25)
        	at vsw.odt.Main$.main(Main.scala:41)
        	at vsw.odt.Main.main(Main.scala)
        Caused by: org.apache.cassandra.db.KeyspaceNotDefinedException: Keyspace myks does not exist
        	at org.apache.cassandra.thrift.ThriftValidation.validateKeyspace(ThriftValidation.java:86)
        	at org.apache.cassandra.thrift.ThriftValidation.validateColumnFamily(ThriftValidation.java:110)
        	at org.apache.cassandra.cql3.statements.ModificationStatement$Parsed.prepare(ModificationStatement.java:575)
        	at org.apache.cassandra.cql3.statements.ModificationStatement$Parsed.prepare(ModificationStatement.java:569)
        	at org.apache.cassandra.cql3.QueryProcessor.getStatement(QueryProcessor.java:291)
        	at org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.getStatement(CQLSSTableWriter.java:386)
        	... 4 more
        
        Vadim Chekan added a comment -

        One more note while we are at it. If listen_address is empty in the yaml, I get an NPE. Here is the fix:

        diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
        index 12ba7dd..fd89df0 100644
        --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
        +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
        @@ -276,7 +276,7 @@ public class DatabaseDescriptor
                         throw new ConfigurationException("Unknown listen_address '" + conf.listen_address + "'");
                     }
                 }
        -        if (conf.listen_address.equals("0.0.0.0"))
        +        if (conf.listen_address != null && conf.listen_address.equals("0.0.0.0"))
                     throw new ConfigurationException("listen_address cannot be 0.0.0.0!");
         
                 try
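
The null guard in the diff above can also be written by calling equals on the constant. A minimal sketch, assuming a hypothetical ListenAddressCheck class that stands in for the check in DatabaseDescriptor (it is not Cassandra code, and IllegalArgumentException replaces ConfigurationException only to keep the example self-contained):

```java
// Hypothetical stand-in for the listen_address check; not Cassandra code.
final class ListenAddressCheck {
    static void validate(String listenAddress) {
        // Calling equals on the constant never dereferences listenAddress,
        // so a null (empty yaml value) is simply "not equal" instead of an NPE.
        if ("0.0.0.0".equals(listenAddress)) {
            throw new IllegalArgumentException("listen_address cannot be 0.0.0.0!");
        }
    }

    public static void main(String[] args) {
        validate(null);        // no exception: an unset address passes this check
        validate("127.0.0.1"); // no exception
        try {
            validate("0.0.0.0");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Both forms behave the same for this check; the constant-first form just makes the null case harder to forget.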
        
        Sylvain Lebresne added a comment -

        I had meant to test this autonomously just after having attached the initial patch but got distracted.

        Anyway, attaching v2, which fixes it so that it works outside of a unit test. I'll note that one of the problems there (not the one that was triggering the first exception, but a problem nonetheless) is that post-CASSANDRA-5515, SSTableWriter.closeAndOpenReader() requires reading the system tables, which is breaking the AbstractSSTableSimpleWriter (even the existing ones, that is). The attached v2 fixes this (by adding an SSTableWriter.close() that doesn't try to open the reader, and using that in AbstractSSTableSimpleWriter), but if we're not confident about committing this patch to 2.0.2 then we'd probably still need to extract that part of the patch.

        Vadim Chekan added a comment -

        Great! I successfully did the following: generated 10 million rows with Cassandra (current from GitHub plus the patch) on Win64 using Scala, copied the sstables to Linux/cassandra-2.0.1, and imported them with sstableloader. Generation performance was roughly 1.4x that of the bulk insert via CQL3. The import took under 1 minute, which is instant in comparison to generating.

        Not any kind of problem, just a side note: when generating, I tried to raise withBufferSizeInMB on a box with 16 GB of RAM and the -Xmx9g option, but could not go higher than a 750 MB buffer, because the GC shoots itself in the head.

        Petter von Dolwitz added a comment -

        I've been struggling with using the SSTableSimpleUnsortedWriter to write data against a CQL3-defined table. I'm looking forward to this fix.

        Jonathan Ellis added a comment -

        LGTM overall.

        Nits:

        Not a fan of the table of type correspondences; I see this getting out of date as we add types. Isn't there one for e.g. Java Driver docs we can put a link to instead?

        For CqlOutputFormat we use a Map<column name, value> which seems a little better than List<object> to me. Specifically, if the loader is adding in data streamed from another source, it won't have to buffer and re-order columns by name manually prior to addRow.

        (I suppose we could have both, with the Map version turning it into a List automagically.)
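
The Map-to-List conversion suggested above can be sketched as follows. This is only an illustration under stated assumptions: RowOrdering and toOrderedRow are hypothetical names rather than Cassandra classes, the column order is taken to be the order of the bind markers in the INSERT statement, and a column missing from the map is treated as a null value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Cassandra: reorders named values so they
// line up with the columns bound by the INSERT statement.
final class RowOrdering {
    static List<Object> toOrderedRow(List<String> boundColumns, Map<String, Object> values) {
        List<Object> row = new ArrayList<>(boundColumns.size());
        for (String column : boundColumns) {
            // Map.get returns null for an absent key; treating that as a
            // null column value is an assumption of this sketch.
            row.add(values.get(column));
        }
        return row;
    }

    public static void main(String[] args) {
        // Column order as it would appear in "INSERT INTO ... (k, v1, v2) VALUES (?, ?, ?)".
        List<String> boundColumns = Arrays.asList("k", "v1", "v2");

        // Values arriving in arbitrary order, e.g. streamed from another source.
        Map<String, Object> values = new HashMap<>();
        values.put("v2", 24);
        values.put("k", 0);
        values.put("v1", "test1");

        System.out.println(toOrderedRow(boundColumns, values)); // prints [0, test1, 24]
    }
}
```

With a helper of this shape, the caller never has to buffer and reorder columns by hand before handing a row to the List-based addRow.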

        Sylvain Lebresne added a comment -

        Isn't there one for e.g. Java Driver docs we can put a link to instead?

        There is: it's the one I copy-pasted from. But yeah, we can link to that; done in v3 (not that I'm entirely convinced that this is much protection against getting out of date).

        I suppose we could have both, with the Map version turning it into a List automagically

        Did that in v3.

        Jonathan Ellis added a comment -

        +1

        Sylvain Lebresne added a comment -

        Committed, thanks


          People

          • Assignee:
            Sylvain Lebresne
            Reporter:
            Jonathan Ellis
            Reviewer:
            Jonathan Ellis
          • Votes:
            4
            Watchers:
            6
