Lucene - Core › LUCENE-7603

Support Graph Token Streams in QueryBuilder

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 6.4
    • Labels: None
    • Lucene Fields: New, Patch Available
    • Flags: Patch

      Description

      With LUCENE-6664 we can use multi-term synonyms at query time. A "graph token stream" will be created, which is nothing more than using the position length attribute on stacked tokens to indicate how many positions a token should span. Currently the position length attribute on tokens is ignored during query parsing. This issue will add support for handling these graph token streams inside the QueryBuilder utility class used by query parsers.
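      To make this concrete, here is a minimal editorial sketch (not part of the patch; the synonym rule "ny, new york" is just for illustration) that dumps the attributes that make a token stream a graph. With SynonymGraphFilter in the chain, the single token "ny" is stacked over "new york" with posLen=2, spanning both positions:

import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;

public class GraphTokenDump {
  /** Prints term/posInc/posLen for every token in the stream. */
  public static void dump(TokenStream ts) throws IOException {
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
    PositionLengthAttribute posLen = ts.addAttribute(PositionLengthAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      // Any token with posLen > 1 spans multiple positions; that is what
      // turns the flat stream into a graph.
      System.out.println(term + " posInc=" + posInc.getPositionIncrement()
          + " posLen=" + posLen.getPositionLength());
    }
    ts.end();
    ts.close();
  }
}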

        Issue Links

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user mattweber opened a pull request:

          https://github.com/apache/lucene-solr/pull/129

          LUCENE-7603: Support Graph Token Streams in QueryBuilder

          Adds support for handling graph token streams inside the
          QueryBuilder util class used by query parsers.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/mattweber/lucene-solr LUCENE-7603

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/129.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #129


          commit 568cb43d6af1aeef96cc7b6cabb7237de9058f36
          Author: Matt Weber <matt@mattweber.org>
          Date: 2016-12-26T15:50:58Z

          Support Graph Token Streams in QueryBuilder

          Adds support for handling graph token streams inside the
          QueryBuilder util class used by query parsers.


          mikemccand Michael McCandless added a comment -

          Whoa, thanks Matt Weber, I'll have a look, but likely not until I'm back from vacation next year!

          mikemccand Michael McCandless added a comment -

          This change looks great; I think it's ready! The new TestGraphTokenStreamFiniteStrings is just missing the copyright header; I'll fix that before pushing.

          The gist of the change is that when query parsing detects that the analyzer produced a graph (any token with PositionLengthAttribute > 1), e.g. because SynonymGraphFilter matched or inserted a multi-token synonym, it creates a GraphQuery, which is just a wrapper around sub-queries that traverse each path of the graph.

          At search time, this query is currently rewritten to a BooleanQuery with one clause for each path, but that is something we can maybe improve in the future, e.g. if it's a phrase query we could use TermAutomatonQuery ... but we should tackle that separately.
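          Roughly, the expansion looks like this (a sketch of the idea, not the patch code; the field and terms are illustrative, and the two-token path is shown as a phrase the way a quoted query would produce it):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class GraphPathExpansionSketch {
  // For the analyzed text "ny" with the synonym "new york", the graph has two
  // finite paths, and the rewrite emits one SHOULD clause per path.
  static Query rewriteExample() {
    BooleanQuery.Builder paths = new BooleanQuery.Builder();
    paths.add(new TermQuery(new Term("body", "ny")), BooleanClause.Occur.SHOULD);
    PhraseQuery.Builder twoTokenPath = new PhraseQuery.Builder();
    twoTokenPath.add(new Term("body", "new"));
    twoTokenPath.add(new Term("body", "york"));
    paths.add(twoTokenPath.build(), BooleanClause.Occur.SHOULD);
    return paths.build();
  }
}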

          At long last, this (along with using SynonymGraphFilter at search time) fixes the long-standing bugs around multi-token synonyms, e.g. LUCENE-4499, LUCENE-1622, https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter ...

          This will be useful for other tokenizers/token filters as well, e.g. I'm working on having WordDelimiterFilter set position length correctly, and Kuromoji (JapaneseTokenizer) already produces graph tokens.

          githubbot ASF GitHub Bot added a comment -

          Github user mattweber commented on the issue:

          https://github.com/apache/lucene-solr/pull/129

          Rebased against master, added missing ASF header.

          githubbot ASF GitHub Bot added a comment -

          GitHub user mattweber opened a pull request:

          https://github.com/apache/lucene-solr/pull/130

          LUCENE-7603: branch_6x Support Graph Token Streams in QueryBuilder

          Adds support for handling graph token streams inside the
          QueryBuilder util class used by query parsers. This is a backport to `branch_6x`.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/mattweber/lucene-solr LUCENE-7603_6x

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/130.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #130


          commit 7ab1638a3da002113e939b316d04ef9e58e38a0b
          Author: Matt Weber <matt@mattweber.org>
          Date: 2016-12-26T15:50:58Z

          Support Graph Token Streams in QueryBuilder

          Adds support for handling graph token streams inside the
          QueryBuilder util class used by query parsers.


          mattweber Matt Weber added a comment -

          Thanks for reviewing, Michael McCandless! I have added the missing ASF header and moved the test into the proper package. I have also backported this to 6x and opened a new PR for that. The only difference in the 6x backport is disabling coord on the rewritten boolean query. Some of the tests are slightly different as well because splitOnWhitespace defaults to true in 6x.
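          For reference, the 6x-only difference looks roughly like this (a sketch; BooleanQuery.Builder#setDisableCoord exists on branch_6x and was removed in 7.0 along with coord scoring itself):

import org.apache.lucene.search.BooleanQuery;

class DisableCoordSketch {
  static BooleanQuery.Builder newPathBuilder() {
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    // The per-path clauses are alternatives for the same source text, so
    // coord should not reward matching several of them; hence disabled here.
    builder.setDisableCoord(true);
    return builder;
  }
}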

          Please let me know if you need me to change anything!

          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94148633

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
@@ -0,0 +1,294 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.graph;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.BytesTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;
+import org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IntsRef;
+import org.apache.lucene.util.automaton.Automaton;
+import org.apache.lucene.util.automaton.FiniteStringsIterator;
+import org.apache.lucene.util.automaton.Operations;
+import org.apache.lucene.util.automaton.Transition;
+
+import static org.apache.lucene.util.automaton.Operations.DEFAULT_MAX_DETERMINIZED_STATES;
+
+/**
+ * Creates a list of {@link TokenStream} where each stream is the tokens that make up a finite string
+ * in the graph token stream. To do this, the graph token stream is converted to an {@link Automaton}
+ * and from there we use a {@link FiniteStringsIterator} to collect the various token streams for
+ * each finite string.
+ */
+public class GraphTokenStreamFiniteStrings {
+  /* TODO:
+     Most of this is a combination of code from TermAutomatonQuery and TokenStreamToTermAutomatonQuery.
+     Would be good to make this so it could be shared. */
+  private final Automaton.Builder builder;
+  Automaton det;
+  private final Map<BytesRef, Integer> termToID = new HashMap<>();
+  private final Map<Integer, BytesRef> idToTerm = new HashMap<>();
+  private int anyTermID = -1;
+
+  public GraphTokenStreamFiniteStrings() {
+    this.builder = new Automaton.Builder();
+  }
+
+  private static class BytesRefArrayTokenStream extends TokenStream {
+    private final BytesTermAttribute termAtt = addAttribute(BytesTermAttribute.class);
+    private final BytesRef[] terms;
+    private int offset;
+
+    BytesRefArrayTokenStream(BytesRef[] terms) {
+      this.terms = terms;
+      offset = 0;
+    }
+
+    @Override
+    public boolean incrementToken() throws IOException {
+      if (offset < terms.length) {
+        clearAttributes();
+        termAtt.setBytesRef(terms[offset]);
+        offset = offset + 1;
+        return true;
+      }
+
+      return false;
+    }
+  }
+
+  /**
+   * Gets the list of finite string token streams from the given input graph token stream.
+   */
+  public List<TokenStream> getTokenStreams(final TokenStream in) throws IOException {
+    // build automaton
+    build(in);
+
+    List<TokenStream> tokenStreams = new ArrayList<>();
+    final FiniteStringsIterator finiteStrings = new FiniteStringsIterator(det);
+    for (IntsRef string; (string = finiteStrings.next()) != null; ) {
+      final BytesRef[] tokens = new BytesRef[string.length];
+      for (int idx = string.offset, len = string.offset + string.length; idx < len; idx++) {
+        tokens[idx - string.offset] = idToTerm.get(string.ints[idx]);
+      }
+
+      tokenStreams.add(new BytesRefArrayTokenStream(tokens));
+    }
+
+    return tokenStreams;
+  }
+
+  private void build(final TokenStream in) throws IOException {
+    if (det != null) {
+      throw new IllegalStateException("Automaton already built");
+    }
+
+    final TermToBytesRefAttribute termBytesAtt = in.addAttribute(TermToBytesRefAttribute.class);
+    final PositionIncrementAttribute posIncAtt = in.addAttribute(PositionIncrementAttribute.class);
+    final PositionLengthAttribute posLengthAtt = in.addAttribute(PositionLengthAttribute.class);
+    final OffsetAttribute offsetAtt = in.addAttribute(OffsetAttribute.class);
+
+    in.reset();
+
+    int pos = -1;
+    int lastPos = 0;
+    int maxOffset = 0;
+    int maxPos = -1;
+    int state = -1;
+    while (in.incrementToken()) {
+      int posInc = posIncAtt.getPositionIncrement();
+      assert pos > -1 || posInc > 0;
+
+      if (posInc > 1) {
— End diff –

          This seems like a notable limitation that should be documented in the javadocs somewhere. Can't we support holes without demanding the stream use a '*' term? And might there be a test for this?
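          (For context: a "hole" is a skipped position, typically left where something like StopFilter removed a token, so the next token arrives with a position increment greater than 1. A minimal sketch, not from the patch, of detecting one:)

import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

class HoleDetector {
  /** Returns true if any token advances by more than one position. */
  static boolean hasHole(TokenStream ts) throws IOException {
    PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
    ts.reset();
    boolean hole = false;
    while (ts.incrementToken()) {
      // e.g. "over the lazy" with "the" dropped: "lazy" arrives with posInc=2
      hole |= posInc.getPositionIncrement() > 1;
    }
    ts.end();
    ts.close();
    return hole;
  }
}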

          githubbot ASF GitHub Bot added a comment -

          Github user mattweber commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94171262

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
@@ -0,0 +1,294 @@
(same hunk as quoted in the previous review comment, ending at `if (posInc > 1) {`)
— End diff –

          @dsmiley That was actually pulled out of the existing [TokenStreamToTermAutomatonQuery.java](https://github.com/apache/lucene-solr/blob/master/lucene/sandbox/src/java/org/apache/lucene/search/TokenStreamToTermAutomatonQuery.java#L77). Let me look into it more.

          mattweber Matt Weber added a comment -

          David Smiley, thank you for the review! I was able to come up with a way to preserve position increment gaps. Can you please take another look?

          Michael McCandless, can you please have another look as well?

          mikemccand Michael McCandless added a comment -

          Thanks Matt Weber; I will have a look, probably once back from vacation next year.

          githubbot ASF GitHub Bot added a comment -

          Github user mikemccand commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94216469

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
@@ -210,82 +215,20 @@ private void finish() {
    */
   private void finish(int maxDeterminizedStates) {
     Automaton automaton = builder.finish();
-
-    // System.out.println("before det:\n" + automaton.toDot());
-
-    Transition t = new Transition();
-
-    // TODO: should we add "eps back to initial node" for all states,
-    // and det that? then we don't need to revisit initial node at
-    // every position? but automaton could blow up? And, this makes it
-    // harder to skip useless positions at search time?
-
-    if (anyTermID != -1) {
-
-      // Make sure there are no leading or trailing ANY:
-      int count = automaton.initTransition(0, t);
-      for (int i = 0; i < count; i++) {
-        automaton.getNextTransition(t);
-        if (anyTermID >= t.min && anyTermID <= t.max) {
-          throw new IllegalStateException("automaton cannot lead with an ANY transition");
-        }
-      }
-
-      int numStates = automaton.getNumStates();
-      for (int i = 0; i < numStates; i++) {
-        count = automaton.initTransition(i, t);
-        for (int j = 0; j < count; j++) {
-          automaton.getNextTransition(t);
-          if (automaton.isAccept(t.dest) && anyTermID >= t.min && anyTermID <= t.max) {
-            throw new IllegalStateException("automaton cannot end with an ANY transition");
-          }
-        }
-      }
-
-      int termCount = termToID.size();
-
-      // We have to carefully translate these transitions so automaton
-      // realizes they also match all other terms:
-      Automaton newAutomaton = new Automaton();
-      for (int i = 0; i < numStates; i++) {
-        newAutomaton.createState();
-        newAutomaton.setAccept(i, automaton.isAccept(i));
-      }
-
-      for (int i = 0; i < numStates; i++) {
-        count = automaton.initTransition(i, t);
-        for (int j = 0; j < count; j++) {
-          automaton.getNextTransition(t);
-          int min, max;
-          if (t.min <= anyTermID && anyTermID <= t.max) {
-            // Match any term
-            min = 0;
-            max = termCount - 1;
-          } else {
-            min = t.min;
-            max = t.max;
-          }
-          newAutomaton.addTransition(t.source, t.dest, min, max);
-        }
-      }
-      newAutomaton.finishState();
-      automaton = newAutomaton;
-    }
-
     det = Operations.removeDeadStates(Operations.determinize(automaton, maxDeterminizedStates));
   }
 
-  private int getTermID(BytesRef term) {
+  private int getTermID(int incr, BytesRef term) {
     Integer id = termToID.get(term);
-    if (id == null) {
+    if (incr > 1 || id == null) {
— End diff –

          Hmm, doesn't this mean that if the same term shows up, but with a different `incr`, it will get a different `id` assigned? But I think that is actually fine, since nowhere here do we depend on / expect that the same term must have the same id.
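          To make the observation concrete, a toy version of the id assignment (an illustration of the behavior being discussed, not the patch code):

import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.util.BytesRef;

class TermIdSketch {
  private final Map<BytesRef, Integer> termToID = new HashMap<>();
  private final Map<Integer, BytesRef> idToTerm = new HashMap<>();
  private final Map<Integer, Integer> idToInc = new HashMap<>();

  int getTermID(int incr, BytesRef term) {
    if (incr > 1) {
      // A term following a hole always gets a fresh id, even if the same term
      // was seen before, so its original increment can be stored per id.
      int id = idToTerm.size();
      idToTerm.put(id, BytesRef.deepCopyOf(term));
      idToInc.put(id, incr);
      return id;
    }
    // Normal case: one shared id per unique term. Nothing requires a term to
    // map to exactly one id, so the duplicates minted above are harmless.
    Integer id = termToID.get(term);
    if (id == null) {
      BytesRef copy = BytesRef.deepCopyOf(term);
      id = idToTerm.size();
      termToID.put(copy, id);
      idToTerm.put(id, copy);
    }
    return id;
  }
}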

          githubbot ASF GitHub Bot added a comment -

          Github user mikemccand commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94215751

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
@@ -114,21 +127,20 @@ private void build(final TokenStream in) throws IOException {
     in.reset();
 
     int pos = -1;
-    int lastPos = 0;
+    int lastIncr = 1;
     int maxOffset = 0;
     int maxPos = -1;
     int state = -1;
     while (in.incrementToken()) {
       int posInc = posIncAtt.getPositionIncrement();
-      assert pos > -1 || posInc > 0;
-
-      if (posInc > 1) {
-        throw new IllegalArgumentException("cannot handle holes; to accept any term, use '*' term");
-      }
+      // always use inc 1 while building, but save original increment
+      int fakePosInc = posInc > 1 ? 1 : posInc;
— End diff –

          Maybe just `Math.min(1, posInc)` instead?

          githubbot ASF GitHub Bot added a comment -

          Github user mikemccand commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94216511

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
@@ -174,25 +191,13 @@ private void setAccept(int state, boolean accept) {
   /**
    * Adds a transition to the automaton.
    */
-  private void addTransition(int source, int dest, String term) {
-    addTransition(source, dest, new BytesRef(term));
-  }
-
-  /**
-   * Adds a transition to the automaton.
-   */
-  private void addTransition(int source, int dest, BytesRef term) {
+  private int addTransition(int source, int dest, int incr, BytesRef term) {
     if (term == null) {
— End diff –

          This can become an assert?

          githubbot ASF GitHub Bot added a comment -

          Github user mikemccand commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94217160

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.graph;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.BytesTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;
+import org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IntsRef;
+import org.apache.lucene.util.automaton.Automaton;
+import org.apache.lucene.util.automaton.FiniteStringsIterator;
+import org.apache.lucene.util.automaton.Operations;
+
+import static org.apache.lucene.util.automaton.Operations.DEFAULT_MAX_DETERMINIZED_STATES;
+
+/**
+ * Creates a list of {@link TokenStream} where each stream is the tokens that make up a finite string
+ * in the graph token stream. To do this, the graph token stream is converted to an {@link Automaton}
+ * and from there we use a {@link FiniteStringsIterator} to collect the various token streams for
+ * each finite string.
+ */
+public class GraphTokenStreamFiniteStrings {
+  /* TODO:
+     Most of this is a combination of code from TermAutomatonQuery and TokenStreamToTermAutomatonQuery.
+     Would be good to make this so it could be shared. */
+  private final Automaton.Builder builder;
+  Automaton det;
+  private final Map<BytesRef, Integer> termToID = new HashMap<>();
+  private final Map<Integer, BytesRef> idToTerm = new HashMap<>();
+  private final Map<Integer, Integer> idToInc = new HashMap<>();
+
+  public GraphTokenStreamFiniteStrings() {
+    this.builder = new Automaton.Builder();
+  }
+
+  private static class BytesRefArrayTokenStream extends TokenStream {
+    private final BytesTermAttribute termAtt = addAttribute(BytesTermAttribute.class);
+    private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
+
+    private final BytesRef[] terms;
+    private final int[] increments;
+    private int offset;
+
+    BytesRefArrayTokenStream(BytesRef[] terms, int[] increments) {
+      this.terms = terms;
+      this.increments = increments;
+      assert terms.length == increments.length;
+      offset = 0;
+    }
+
+    @Override
+    public boolean incrementToken() throws IOException {
+      if (offset < terms.length) {
+        clearAttributes();
+        termAtt.setBytesRef(terms[offset]);
+        posIncAtt.setPositionIncrement(increments[offset]);
+        offset = offset + 1;
+        return true;
+      }
+
+      return false;
+    }
+  }
+
+  /**
+   * Gets the list of finite string token streams from the given input graph token stream.
+   */
+  public List<TokenStream> getTokenStreams(final TokenStream in) throws IOException {
— End diff –

          Could we make this method private, make this class's constructor private, and add a `static` method here, the sole public method on this class, that receives the incoming `TokenStream` and returns the resulting `TokenStream[]`? Otherwise the API is sort of awkward, since e.g. this method seems like a getter yet it's doing lots of side effects under the hood ...
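          A sketch of the suggested shape (the wrapper below is hypothetical, not code from the PR):

import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;

public final class GraphTokenStreamsSketch {
  // Private constructor: the automaton-building state becomes an
  // implementation detail callers can never observe mid-mutation.
  private GraphTokenStreamsSketch() {}

  /** Sole public entry point: builds internally, returns the expanded streams. */
  public static TokenStream[] getTokenStreams(TokenStream in) throws IOException {
    GraphTokenStreamsSketch impl = new GraphTokenStreamsSketch();
    List<TokenStream> streams = impl.buildAndExpand(in);
    return streams.toArray(new TokenStream[streams.size()]);
  }

  private List<TokenStream> buildAndExpand(TokenStream in) throws IOException {
    // Build the automaton from `in` and enumerate its finite strings, as the
    // patch does; elided in this sketch.
    throw new UnsupportedOperationException("sketch only");
  }
}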

          githubbot ASF GitHub Bot added a comment -

          Github user mikemccand commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94217475

— Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.graph;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.BytesTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;
+import org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IntsRef;
+import org.apache.lucene.util.automaton.Automaton;
+import org.apache.lucene.util.automaton.FiniteStringsIterator;
+import org.apache.lucene.util.automaton.Operations;
+
+import static org.apache.lucene.util.automaton.Operations.DEFAULT_MAX_DETERMINIZED_STATES;
+
+/**
+ * Creates a list of {@link TokenStream} where each stream is the tokens that make up a finite string in the graph token stream. To do this,
+ * the graph token stream is converted to an {@link Automaton} and from there we use a {@link FiniteStringsIterator} to collect the various
+ * token streams for each finite string.
+ */
+public class GraphTokenStreamFiniteStrings {
+  /* TODO:
+     Most of this is a combination of code from TermAutomatonQuery and TokenStreamToTermAutomatonQuery. Would be
+     good to make this so it could be shared. */
+  private final Automaton.Builder builder;
+  Automaton det;
+  private final Map<BytesRef, Integer> termToID = new HashMap<>();
+  private final Map<Integer, BytesRef> idToTerm = new HashMap<>();
+  private final Map<Integer, Integer> idToInc = new HashMap<>();
+
+  public GraphTokenStreamFiniteStrings() {
+    this.builder = new Automaton.Builder();
+  }
+
+  private static class BytesRefArrayTokenStream extends TokenStream {
+    private final BytesTermAttribute termAtt = addAttribute(BytesTermAttribute.class);
+    private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
+
+    private final BytesRef[] terms;
+    private final int[] increments;
+    private int offset;
+
+    BytesRefArrayTokenStream(BytesRef[] terms, int[] increments) {
+      this.terms = terms;
+      this.increments = increments;
+      assert terms.length == increments.length;
+      offset = 0;
+    }
+
+    @Override
+    public boolean incrementToken() throws IOException {
+      if (offset < terms.length) {
+        clearAttributes();
+        termAtt.setBytesRef(terms[offset]);
+        posIncAtt.setPositionIncrement(increments[offset]);
+        offset = offset + 1;
+        return true;
+      }
+
+      return false;
+    }
+  }
+
+  /**
+   * Gets the list of finite string token streams from the given input graph token stream.
+   */
+  public List<TokenStream> getTokenStreams(final TokenStream in) throws IOException {
+    // build automaton
+    build(in);
+
+    List<TokenStream> tokenStreams = new ArrayList<>();
+    final FiniteStringsIterator finiteStrings = new FiniteStringsIterator(det);
+    for (IntsRef string; (string = finiteStrings.next()) != null; ) {
+      final BytesRef[] tokens = new BytesRef[string.length];
+      final int[] increments = new int[string.length];
+      for (int idx = string.offset, len = string.offset + string.length; idx < len; idx++) {
+        int id = string.ints[idx];
+        int offset = idx - string.offset;
+        tokens[offset] = idToTerm.get(id);
+        if (idToInc.containsKey(id)) {
+          increments[offset] = idToInc.get(id);
+        } else {
+          increments[offset] = 1;
+        }
+      }
+
+      tokenStreams.add(new BytesRefArrayTokenStream(tokens, increments));
+    }
+
+    return tokenStreams;
+  }
+
+  private void build(final TokenStream in) throws IOException {
+    if (det != null) {
+      throw new IllegalStateException("Automaton already built");
+    }
+
+    final TermToBytesRefAttribute termBytesAtt = in.addAttribute(TermToBytesRefAttribute.class);
+    final PositionIncrementAttribute posIncAtt = in.addAttribute(PositionIncrementAttribute.class);
+    final PositionLengthAttribute posLengthAtt = in.addAttribute(PositionLengthAttribute.class);
+    final OffsetAttribute offsetAtt = in.addAttribute(OffsetAttribute.class);
+
+    in.reset();
+
+    int pos = -1;
+    int lastIncr = 1;
+    int maxOffset = 0;
+    int maxPos = -1;
+    int state = -1;
+    while (in.incrementToken()) {
+      int posInc = posIncAtt.getPositionIncrement();
+
+      // always use inc 1 while building, but save original increment
+      int fakePosInc = posInc > 1 ? 1 : posInc;
+
+      assert pos > -1 || fakePosInc > 0;
— End diff –

Can we upgrade this to a real `if`? I.e. we need a well-formed `TokenStream` input ... it cannot have `posInc=0` on its first token.
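Concretely, the upgraded check might look like this (a sketch against the build() loop variables above; the exact exception type and message are illustrative):

    // Hard check instead of an assert: the first token of a well-formed
    // stream must advance the position, i.e. it cannot have posInc=0.
    if (pos == -1 && fakePosInc < 1) {
      throw new IllegalStateException("Malformed TokenStream: first token cannot have posInc=0");
    }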

          githubbot ASF GitHub Bot added a comment -

          Github user mattweber commented on the issue:

          https://github.com/apache/lucene-solr/pull/129

          @mikemccand I addressed your comments. I also added some more tests and fixed a bug that would yield the wrong increment when a previously seen term was found again with an increment of 0. I have squashed these changes with the previous commit so it is easy to see the difference between the original PR, which did not support position increments, and the new one that does.
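For illustration, the stream shape behind that bug is roughly the following (a hypothetical reproduction sketch using the Lucene test framework's CannedTokenStream; the terms and offsets are made up):

    import org.apache.lucene.analysis.CannedTokenStream;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;

    // "fast" gets an id when first seen stacked on "quick"; when it reappears
    // with posInc=0 after a position gap, the old code could reuse that id and
    // report the wrong increment for the second occurrence.
    Token quick = new Token("quick", 0, 5);   // posInc defaults to 1
    Token fast1 = new Token("fast", 0, 5);
    fast1.setPositionIncrement(0);            // stacked synonym of "quick"
    Token fox = new Token("fox", 10, 13);
    fox.setPositionIncrement(2);              // position gap (e.g. a removed stopword)
    Token fast2 = new Token("fast", 10, 13);
    fast2.setPositionIncrement(0);            // same term again, stacked after the gap
    TokenStream in = new CannedTokenStream(quick, fast1, fox, fast2);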

          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94243375

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
          @@ -210,85 +199,41 @@ private void finish() {
             */
            private void finish(int maxDeterminizedStates) {
              Automaton automaton = builder.finish();
          -
          — End diff –

          So all this code removed here wasn't needed after all? It's nice to see it all go away (less to maintain, less complexity).

          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94243010

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
          @@ -210,85 +199,41 @@ private void finish() {
             */
            private void finish(int maxDeterminizedStates) {
              Automaton automaton = builder.finish();
          -
          -    // System.out.println("before det:\n" + automaton.toDot());
          -
          -    Transition t = new Transition();
          -
          -    // TODO: should we add "eps back to initial node" for all states,
          -    // and det that? then we don't need to revisit initial node at
          -    // every position? but automaton could blow up? And, this makes it
          -    // harder to skip useless positions at search time?
          -
          -    if (anyTermID != -1) {
          -
          -      // Make sure there are no leading or trailing ANY:
          -      int count = automaton.initTransition(0, t);
          -      for (int i = 0; i < count; i++) {
          -        automaton.getNextTransition(t);
          -        if (anyTermID >= t.min && anyTermID <= t.max) {
          -          throw new IllegalStateException("automaton cannot lead with an ANY transition");
          -        }
          -      }
          -
          -      int numStates = automaton.getNumStates();
          -      for (int i = 0; i < numStates; i++) {
          -        count = automaton.initTransition(i, t);
          -        for (int j = 0; j < count; j++) {
          -          automaton.getNextTransition(t);
          -          if (automaton.isAccept(t.dest) && anyTermID >= t.min && anyTermID <= t.max) {
          -            throw new IllegalStateException("automaton cannot end with an ANY transition");
          -          }
          -        }
          -      }
          -
          -      int termCount = termToID.size();
          -
          -      // We have to carefully translate these transitions so automaton
          -      // realizes they also match all other terms:
          -      Automaton newAutomaton = new Automaton();
          -      for (int i = 0; i < numStates; i++) {
          -        newAutomaton.createState();
          -        newAutomaton.setAccept(i, automaton.isAccept(i));
          -      }
          -
          -      for (int i = 0; i < numStates; i++) {
          -        count = automaton.initTransition(i, t);
          -        for (int j = 0; j < count; j++) {
          -          automaton.getNextTransition(t);
          -          int min, max;
          -          if (t.min <= anyTermID && anyTermID <= t.max) {
          -            // Match any term
          -            min = 0;
          -            max = termCount - 1;
          -          } else {
          -            min = t.min;
          -            max = t.max;
          -          }
          -          newAutomaton.addTransition(t.source, t.dest, min, max);
          -        }
          -      }
          -      newAutomaton.finishState();
          -      automaton = newAutomaton;
          -    }
          -
              det = Operations.removeDeadStates(Operations.determinize(automaton, maxDeterminizedStates));
            }
          
          -  private int getTermID(BytesRef term) {
          -    Integer id = termToID.get(term);
          -    if (id == null) {
          -      id = termToID.size();
          -      if (term != null) {
          -        term = BytesRef.deepCopyOf(term);
          -      }
          -      termToID.put(term, id);
          +  /**
          +   * Gets an integer id for a given term.
          +   *
          +   * If there are no position gaps for this token then we can reuse the id for the same term if it appeared at another
          +   * position without a gap. If we have a position gap generate a new id so we can keep track of the position
          +   * increment.
          +   */
          +  private int getTermID(int incr, int prevIncr, BytesRef term) {
          +    assert term != null;
          +    boolean isStackedGap = incr == 0 && prevIncr > 1;
          +    boolean hasGap = incr > 1;
          +    term = BytesRef.deepCopyOf(term);
          — End diff –

          The deepCopyOf is only needed if you generate a new ID, not for an existing one.

          BTW... have you seen BytesRefHash? I think re-using that could minimize the code here to deal with this stuff.
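In sketch form, deferring the copy to the new-id branch looks something like this (ignoring the gap-handling branches for brevity; a sketch, not the committed code):

    // Only pay for a deep copy when a new id is actually allocated; terms
    // already in the map were copied when their id was created.
    Integer id = termToID.get(term);
    if (id == null) {
      id = termToID.size();
      BytesRef termCopy = BytesRef.deepCopyOf(term);
      termToID.put(termCopy, id);
      idToTerm.put(id, termCopy);
    }
    return id;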

          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94244009

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
          @@ -80,22 +77,41 @@ public boolean incrementToken() throws IOException {
              }
            }
          
          +  private GraphTokenStreamFiniteStrings() {
          +    this.builder = new Automaton.Builder();
          +  }
          +
            /**
             * Gets the list of finite string token streams from the given input graph token stream.
             */
          -  public List<TokenStream> getTokenStreams(final TokenStream in) throws IOException {
          -    // build automaton
          +  public static List<TokenStream> getTokenStreams(final TokenStream in) throws IOException {
          +    GraphTokenStreamFiniteStrings gfs = new GraphTokenStreamFiniteStrings();
          +    return gfs.process(in);
          +  }
          +
          +  /**
          +   * Builds automaton and builds the finite string token streams.
          +   */
          +  private List<TokenStream> process(final TokenStream in) throws IOException {
              build(in);
          
              List<TokenStream> tokenStreams = new ArrayList<>();
              final FiniteStringsIterator finiteStrings = new FiniteStringsIterator(det);
              for (IntsRef string; (string = finiteStrings.next()) != null; ) {
                final BytesRef[] tokens = new BytesRef[string.length];
          — End diff –

          Hmm; rather than materializing an array of tokens and increments, maybe you could simply give the IntsRefString to BytesRefArrayTokenStream (and make BRATS not static) so that it could do this on the fly? Not a big deal either way (current or my proposal). If you do as I suggest then BRATS would no longer be a suitable name; maybe simply FiniteStringTokenStream or CustomTokenStream.
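A rough sketch of that alternative, under the suggested FiniteStringTokenStream name (note the IntsRef would need to be deep-copied first, since FiniteStringsIterator reuses its scratch ref):

    // Non-static inner class so it can resolve ids via idToTerm/idToInc lazily,
    // instead of materializing BytesRef[]/int[] arrays up front.
    private class FiniteStringTokenStream extends TokenStream {
      private final BytesTermAttribute termAtt = addAttribute(BytesTermAttribute.class);
      private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
      private final IntsRef string;  // a deep copy of one finite string of term ids
      private int upto;

      FiniteStringTokenStream(IntsRef string) {
        this.string = string;
        this.upto = string.offset;
      }

      @Override
      public boolean incrementToken() {
        if (upto == string.offset + string.length) {
          return false;
        }
        clearAttributes();
        int id = string.ints[upto++];
        termAtt.setBytesRef(idToTerm.get(id));
        posIncAtt.setPositionIncrement(idToInc.containsKey(id) ? idToInc.get(id) : 1);
        return true;
      }
    }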

          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/129#discussion_r94241922

          — Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java —
          @@ -80,22 +77,41 @@ public boolean incrementToken() throws IOException {
              }
            }
          
          +  private GraphTokenStreamFiniteStrings() {
          +    this.builder = new Automaton.Builder();
          — End diff –

          The other fields are initialized at the declaration; might as well move this here too?
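That is, something like:

    // Suggested form: initialize at the declaration like the other fields;
    // the private constructor can then be empty (or removed entirely).
    private final Automaton.Builder builder = new Automaton.Builder();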

          githubbot ASF GitHub Bot added a comment -

          Github user mattweber commented on the issue:

          https://github.com/apache/lucene-solr/pull/129

          Thanks @dsmiley! I have just pushed up code with your suggestions, except for using `BytesRefHash`, because we might have the same `BytesRef` but need a different id when there is a position gap.

          This has been great, love the feedback!

          githubbot ASF GitHub Bot added a comment -

          Github user mikemccand commented on the issue:

          https://github.com/apache/lucene-solr/pull/129

          This change looks great to me! What an awesome improvement, to properly use graph token streams at search time so multi-token synonyms are correct.

          I'll push this in a few days once I'm back home unless someone pushes first (@dsmiley feel free)...

          Thank you @mattweber!
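For reference, the merged utility is driven along these lines (a minimal usage sketch; the analyzer, field, and query text are illustrative):

    // One resulting TokenStream per finite string, i.e. per path through the graph.
    try (TokenStream source = analyzer.tokenStream("body", "wtc building")) {
      List<TokenStream> paths = GraphTokenStreamFiniteStrings.getTokenStreams(source);
      // hand each path to the query builder / downstream consumer
    }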

          mattweber Matt Weber added a comment -

          Great, thank you both! I have updated the branch_6x PR with the latest changes, as well as rebased and squashed both. Happy New Year!

          jira-bot ASF subversion and git services added a comment -

          Commit 1bcf9a251d597cdc029295325b287ce5ce661bec in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1bcf9a2 ]

          LUCENE-7603: add CHANGES entry

          githubbot ASF GitHub Bot added a comment -

          Github user mikemccand commented on the issue:

          https://github.com/apache/lucene-solr/pull/129

          I've merged this into Lucene's master (7.0), and I'm working on 6.x (#130) now. Thanks @mattweber! Can you close this?

          jira-bot ASF subversion and git services added a comment -

          Commit 018df31da8b6b5beeb767c90d7ef2a784eca354a in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=018df31 ]

          LUCENE-7603: add package-info.java for new package

          jira-bot ASF subversion and git services added a comment -

          Commit c980f6a1c2a33a039d09a83cd5b9b95a58fa784f in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c980f6a ]

          LUCENE-7603: handle graph token streams in query parsers

          githubbot ASF GitHub Bot added a comment -

          Github user mikemccand commented on the issue:

          https://github.com/apache/lucene-solr/pull/130

          I've merged this into Lucene's branch_6x; thank you @mattweber! Can you please close this now?

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/lucene-solr/pull/130

          mikemccand Michael McCandless added a comment -

          Thank you Matt Weber!

          githubbot ASF GitHub Bot added a comment -

          Github user mattweber closed the pull request at:

          https://github.com/apache/lucene-solr/pull/129

          githubbot ASF GitHub Bot added a comment -

          Github user EreMaijala commented on the issue:

          https://github.com/apache/lucene-solr/pull/130

          I believe this has caused https://issues.apache.org/jira/browse/LUCENE-7698.


            People

            • Assignee: Unassigned
            • Reporter: mattweber Matt Weber
            • Votes: 0
            • Watchers: 6