Apache AsterixDB / ASTERIXDB-1208

ngram tokenizer failure with negative length


Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed

    Description

      Schemas

      drop dataverse test if exists;
      create dataverse test;
      use dataverse test;
      create type DBLPOpenType as open {
        id: int64,
        dblpid: string,
        authors: string,
        misc: string
      }
      create dataset DBLPOpen(DBLPOpenType) primary key id;
      insert into dataset DBLPOpen { "id": 93, "dblpid": "journals/iandc/IbarraJCR91", "authors": "Some Classes of Languages in NC¹", "misc": "2006-04-25 86-106 Inf. Comput. January 1991 90 1 db/journals/iandc/iandc90.html#IbarraJCR91" }
      

      Query

      use dataverse test;
      set import-private-functions 'true'
      for $d in dataset DBLPOpen
      where similarity-jaccard(gram-tokens("",3,false),gram-tokens($d.title,3,false)) >= 0.5
      return {"rec": $d}
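
      Illustration (not AsterixDB code)

      The query tokenizes an empty string with a gram length of 3 and no pre/post padding. The sketch below is a guess at how a negative length could arise in that case. It assumes the gram count is computed as length - gram_length + 1, which the report does not confirm. The function names naive_gram_count and gram_tokens_clamped are hypothetical and do not come from the AsterixDB source.

```python
def naive_gram_count(text: str, gram_length: int) -> int:
    """Number of n-grams computed without guarding against short input.

    Assumption: this mirrors the kind of arithmetic that could produce a
    negative length; it is not taken from the AsterixDB tokenizer.
    """
    return len(text) - gram_length + 1


def gram_tokens_clamped(text: str, gram_length: int) -> list[str]:
    """Return the n-grams of text (no pre/post padding).

    The count is clamped at zero, so input shorter than gram_length
    yields an empty token list instead of a negative size.
    """
    count = max(len(text) - gram_length + 1, 0)
    return [text[i:i + gram_length] for i in range(count)]


if __name__ == "__main__":
    # Empty input with gram length 3: the unguarded count is negative.
    print(naive_gram_count("", 3))
    # The clamped version returns an empty list for the same input.
    print(gram_tokens_clamped("", 3))
    print(gram_tokens_clamped("abcd", 3))
```

      With these assumptions, any input shorter than the gram length gives a count below one, which is the case a tokenizer has to guard against.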
      


          People

            wangsaeu Taewoo Kim
            lwhay Wenhai Li