Apache Cassandra / CASSANDRA-19030

Vector Quickstart Documentation does not work

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 5.0-beta1, 5.0, 5.1
    • Component/s: Documentation
    • Labels: None

    Description

      The documentation at https://cassandra.apache.org/doc/latest/cassandra/getting-started/vector-search-quickstart.html does not work as written.

      For example, creating the comments_vs table produces these errors:

      instaclustr@cqlsh:cycling> CREATE TABLE IF NOT EXISTS cycling.comments_vs (
                 ...   record_id timeuuid,
                 ...   id uuid,
                 ...   commenter text,
                 ...   comment text,
                 ...   comment_vector VECTOR <FLOAT, 5>;
      SyntaxException: line 6:34 mismatched input ';' expecting ')' (...comment_vector VECTOR <FLOAT, 5>[;])
      instaclustr@cqlsh:cycling>   created_at timestamp,
                 ...   PRIMARY KEY (id, created_at)
                 ... )
                 ... WITH CLUSTERING ORDER BY (created_at DESC);
      SyntaxException: line 1:0 no viable alternative at input 'created_at' ([created_at]...)
      instaclustr@cqlsh:cycling> 

      This breaks all the subsequent commands. Some of the later INSERT and SELECT statements need work even after the table definition is repaired.

      There are a few errors in the CQL commands and table definitions. I got the quickstart working with the CQL below.

      CREATE KEYSPACE IF NOT EXISTS demo
         WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };
         
      CREATE TABLE IF NOT EXISTS demo.comments_vs (
        record_id timeuuid,
        id uuid,
        commenter text,
        comment text,
        comment_vector VECTOR <FLOAT, 5>,
        created_at timestamp,
        PRIMARY KEY (id, created_at)
      )
      WITH CLUSTERING ORDER BY (created_at DESC);

      CREATE INDEX IF NOT EXISTS ann_index
        ON demo.comments_vs(comment_vector) USING 'sai';
        
        
      INSERT INTO demo.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
         VALUES (
            now(),
            e7ae5cf3-d358-4d99-b900-85902fda9bb0,
            '2017-03-21 13:11:09.999-0800',
            'Second rest stop was out of water',
            'Alex',
            [0.99, 0.5, 0.99, 0.1, 0.34]
      );
      
      INSERT INTO demo.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
         VALUES (
            now(),
            e7ae5cf3-d358-4d99-b900-85902fda9bb0,
            '2017-04-01 06:33:02.16-0800',
            'LATE RIDERS SHOULD NOT DELAY THE START',
            'Alex',
            [0.9, 0.54, 0.12, 0.1, 0.95]
      );
      
      INSERT INTO demo.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
         VALUES (
            now(),
            c7fceba0-c141-4207-9494-a29f9809de6f,
            totimestamp(now()),
            'The gift certificate for winning was the best',
            'Amy',
            [0.13, 0.8, 0.35, 0.17, 0.03]
      );
      
      INSERT INTO demo.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
         VALUES (
            now(),
            c7fceba0-c141-4207-9494-a29f9809de6f,
            '2017-02-17 12:43:20.234+0400',
            'Glad you ran the race in the rain',
            'Amy',
            [0.3, 0.34, 0.2, 0.78, 0.25]
      );
      
      INSERT INTO demo.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
         VALUES (
            now(),
            c7fceba0-c141-4207-9494-a29f9809de6f,
            '2017-03-22 5:16:59.001+0400',
            'Great snacks at all reststops',
            'Amy',
            [0.1, 0.4, 0.1, 0.52, 0.09]
      );
      
      INSERT INTO demo.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
         VALUES (
            now(),
            c7fceba0-c141-4207-9494-a29f9809de6f,
            '2017-04-01 17:43:08.030+0400',
            'Last climb was a killer',
            'Amy',
            [0.3, 0.75, 0.2, 0.2, 0.5]
      );
      
      SELECT * FROM demo.comments_vs
          ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
          LIMIT 3;
          
      SELECT comment, similarity_cosine(comment_vector, [0.2, 0.15, 0.3, 0.2, 0.05])
          FROM demo.comments_vs
          ORDER BY comment_vector ANN OF [0.1, 0.15, 0.3, 0.12, 0.05]
          LIMIT 1; 
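
As a rough cross-check of the first ANN query above, the six sample vectors can be ranked by brute force outside Cassandra. The sketch below is not part of the quickstart; it assumes the index compares vectors by cosine similarity, and the helper names (`cosine`, `brute_force_top_k`) are made up for illustration. ANN results are approximate and Cassandra may scale its similarity scores differently, so only the set of nearest rows is expected to be comparable, not the exact scores.

```python
import math

# Sample rows copied from the INSERT statements above: (comment, comment_vector).
ROWS = [
    ("Second rest stop was out of water", [0.99, 0.5, 0.99, 0.1, 0.34]),
    ("LATE RIDERS SHOULD NOT DELAY THE START", [0.9, 0.54, 0.12, 0.1, 0.95]),
    ("The gift certificate for winning was the best", [0.13, 0.8, 0.35, 0.17, 0.03]),
    ("Glad you ran the race in the rain", [0.3, 0.34, 0.2, 0.78, 0.25]),
    ("Great snacks at all reststops", [0.1, 0.4, 0.1, 0.52, 0.09]),
    ("Last climb was a killer", [0.3, 0.75, 0.2, 0.2, 0.5]),
]


def cosine(a, b):
    """Plain cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, rows, k):
    """Score every row against the query and return the k best (score, comment) pairs."""
    scored = [(cosine(query, vector), comment) for comment, vector in rows]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    # Same query vector and LIMIT as the first SELECT above.
    for score, comment in brute_force_top_k([0.15, 0.1, 0.1, 0.35, 0.55], ROWS, 3):
        print(f"{score:.4f}  {comment}")
```

With only six rows an exhaustive scan is trivial, which makes it a convenient reference when checking that the repaired statements return sensible neighbours.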

      Raising this ticket so it can be linked from a website PR.

          Activity

            jfleming Jackson Fleming added a comment - Raised https://github.com/apache/cassandra/pull/2902/files with the corrected CQL (and I fixed the keyspace names to use cycling instead of demo)

            brandon.williams Brandon Williams added a comment - Tagged Lorina for review.

            smiklosovic Stefan Miklosovic added a comment - I did something similar a couple of days ago here: https://github.com/apache/cassandra/pull/2895
            polandll Lorina Poland added a comment - Should I review and approve smiklosovic's work or jfleming's work? And btw, I KNOW this needs to be merged; I've just been trying to avoid distractions and get the UCS docs written. Repeated interruptions like duplicate PRs and tickets don't help me, folks!

            mck Michael Semb Wever added a comment - Thanks for the report and PR, jfleming. Although these fixes have been submitted (or are in flight) elsewhere, I'm going to merge this PR for the sake of recognition. It's very much appreciated.
            mck Michael Semb Wever added a comment - Committed with https://github.com/apache/cassandra/commit/f347c58efae4faf0e294bc1d4a086cb174597068

            smiklosovic Stefan Miklosovic added a comment - Additional fixes committed here: https://github.com/apache/cassandra/commit/bee4b187e7b16f1760c68f5e47640d73ecd3dd47. The commit mostly fixes a lot of broken links.
            smiklosovic Stefan Miklosovic added a comment (edited) - BTW, I noticed one interesting pattern that might be handy for whoever writes documentation: if an AsciiDoc file name starts with an underscore (_somedoc.adoc), the file can be included but it cannot be xref-ed. There were (and maybe still are) links that xref a doc whose name starts with an underscore, but it seems to me that this does not work nicely with Antora / AsciiDoc. The workaround is to rename such files so that they do not start with an underscore.
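
One way to find the links described in the comment above is to scan the AsciiDoc sources for xref macros whose target file name starts with an underscore. The following is a minimal sketch under stated assumptions, not an existing tool: the function name `underscore_xref_targets` and the regular expression are illustrative, and the pattern only covers the common `xref:target[text]` form.

```python
import re

# Captures the target of an AsciiDoc xref macro, e.g. xref:dir/page.adoc#anchor[Text].
XREF_TARGET = re.compile(r"xref:([^\[\s]+)\[")


def underscore_xref_targets(text):
    """Return xref targets whose file name starts with an underscore."""
    hits = []
    for target in XREF_TARGET.findall(text):
        path = target.split("#", 1)[0]      # drop any fragment such as #anchor
        name = re.split(r"[/:]", path)[-1]  # last path or resource-ID segment
        if name.startswith("_"):
            hits.append(target)
    return hits


if __name__ == "__main__":
    sample = (
        "See xref:vector-search/_somedoc.adoc[the steps] and "
        "xref:vector-search/overview.adoc#intro[the overview].\n"
        "include::_somedoc.adoc[]\n"
    )
    print(underscore_xref_targets(sample))
```

The `include::` directive is deliberately ignored, since including an underscore-prefixed file is the case reported to work.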
            polandll Lorina Poland added a comment - smiklosovic Yes, the filenames that start with underscores are a deliberate choice that I made. I want to include the files as "partial" files in a file that is made up of many steps, but I don't want those files to appear in the leftnav. I prefer not to have all these partial pages in the partials directory, separated from the other vector search files. It works exactly as expected when I look at the C* docs: the Working with Vector Search page displays all the included files, and the leftnav doesn't have links to them. And I don't think I'm trying to xref anything in those files. If I'm missing something, please let me know!

            People

              Assignee: jfleming Jackson Fleming
              Reporter: jfleming Jackson Fleming
              Authors: Jackson Fleming
              Reviewers: Lorina Poland, Michael Semb Wever
              Votes: 0
              Watchers: 4

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 50m