SOLR-2907

java.lang.IllegalArgumentException: deltaQuery has no column to resolve to declared primary key pk='ITEM_ID, CATEGORY_ID'

    Details

      Description

      We use Solr for our site and ran into this error with our own schema; I was able to reproduce it using the dataimport example code in the Solr project. We do not get this error in Solr 1.4 and only started seeing it while upgrading to 3.4.0. It fails when delta-importing linked tables.

      Complete trace:
      Nov 18, 2011 5:21:02 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport
      SEVERE: Delta Import Failed
      java.lang.IllegalArgumentException: deltaQuery has no column to resolve to declared primary key pk='ITEM_ID, CATEGORY_ID'
      at org.apache.solr.handler.dataimport.DocBuilder.findMatchingPkColumn(DocBuilder.java:849)
      at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:900)
      at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:879)
      at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:285)
      at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:179)
      at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:390)
      at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:429)
      at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)

      I used this dataConfig from the wiki on the data import:

      <dataConfig>
        <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:./example-DIH/hsqldb/ex" user="sa" />
        <document>
          <entity name="item" pk="ID"
                  query="select * from item"
                  deltaImportQuery="select * from item where ID=='${dataimporter.delta.id}'"
                  deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
            <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
                    query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
                    deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
                    parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
              <entity name="category" pk="ID"
                      query="select DESCRIPTION as cat from category where ID = '${item_category.CATEGORY_ID}'"
                      deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
                      parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}"/>
            </entity>
          </entity>
        </document>
      </dataConfig>

      To reproduce, use the data config above and set the last update times in dataimport.properties to a date before the last_modified dates in the example data. In my case I had to set the year to 1969. Then run a delta-import and the exception occurs. Thanks.
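
The backdating step can be sketched in Java. This is a minimal sketch, assuming that dataimport.properties holds a global `last_index_time` key plus one `<entity>.last_index_time` key per entity, with timestamps in `yyyy-MM-dd HH:mm:ss` form. The class and method names are hypothetical, and the key names should be checked against the file your Solr version actually writes.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper, not part of Solr: builds backdated properties
// so that every row in the example data counts as modified.
class BackdatePropertiesSketch {

    // Assumed key layout: "last_index_time" plus "<entity>.last_index_time".
    static Properties backdated(String timestamp, String... entityNames) {
        Properties props = new Properties();
        props.setProperty("last_index_time", timestamp);
        for (String entity : entityNames) {
            props.setProperty(entity + ".last_index_time", timestamp);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = backdated("1969-01-01 00:00:00",
                "item", "item_category", "category");
        // Print in .properties syntax; copy the result into
        // dataimport.properties before running the delta-import.
        StringWriter out = new StringWriter();
        props.store(out, null);
        System.out.print(out);
    }
}
```

Note that `Properties.store` escapes the colons in the timestamp with a backslash, which is normal .properties syntax.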

        Activity

        Alan Baker created issue -
        Alan Baker made changes -
        Field Original Value New Value
        Component/s contrib - DataImportHandler [ 12312438 ]
        Adam Lane added a comment -

        Here is a link to the exact part of the documentation that gives the example of delta-importing linked tables. Unfortunately, the example doesn't work in the latest version:

        https://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command

        Adam Lane added a comment -

        Upgraded to 3.5 and confirmed same problem.

        Adam Lane added a comment -

        FYI: Found an alternate way of doing delta here in this thread that is much faster. Please refer to this until the bug is fixed or wiki pages are changed.

        https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3C9F8B39CB3B7C6D4594293EA29CCF438B01702F22@ICQ-MAIL.icq.il.office.aol.com%3E

        Cory Berg added a comment -

        Hi All,

        Saw this exact issue today in Solr 3.6. The problem is in findMatchingPkColumn. When I debugged a similar case, I noticed that the pk passed in is the single string "ITEM_ID, CATEGORY_ID". The line that compares the pk to the returned fields can therefore never match, because the returned keys are "ITEM_ID" and "CATEGORY_ID". Ergo, multiple comma-separated pks, as given in the Solr DIH wiki, will not work. The fix appears to be either to parse out the individual pks so the comparison can succeed, or to munge the returned column names so that a match is forced, which is much uglier. I will attempt a fix for my own purposes; you are welcome to it if interested.
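
The mismatch described in this comment can be illustrated with a small, self-contained Java sketch. It is not the actual DocBuilder source: the class and method names are hypothetical, and the comma-splitting variant only shows the kind of parsing the comment proposes, not a patch that was applied to Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names do not come from the Solr code base.
class PkMatchSketch {

    // Treats the whole pk attribute as one column name, which is the
    // behavior the comment reports seeing in the debugger.
    static boolean matchesLiterally(String pk, Map<String, Object> row) {
        return row.containsKey(pk);
    }

    // Splits the pk attribute on commas and returns the declared key
    // columns that the delta row does not contain.
    static List<String> missingPkColumns(String pk, Map<String, Object> row) {
        List<String> missing = new ArrayList<>();
        for (String part : pk.split(",")) {
            String column = part.trim();
            if (!column.isEmpty() && !row.containsKey(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A row as the item_category deltaQuery would return it.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ITEM_ID", 1);
        row.put("CATEGORY_ID", 7);

        String pk = "ITEM_ID, CATEGORY_ID";
        System.out.println(matchesLiterally(pk, row));  // prints false
        System.out.println(missingPkColumns(pk, row));  // prints []
    }
}
```

The literal lookup fails even though both key columns are present, while the split version finds nothing missing.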

        Cory Berg added a comment -

        I suppose the other valid point here is - what is the intended design of the "pk" field in this context? It seems unclear in relation to delta queries.

        Hector Hurtarte added a comment -

        I am using Solr 4.2.1 and the issue is still there.

        Aaron Greenspan added a comment -

        I just ran into this in Solr 4.3.0. The error message is extremely confusing. The situation I encountered involved an SQL query where there WAS an "id" field defined in the main "query" query, as well as a "field" with column="id" and name="id" and yet I would keep getting the error...

        deltaQuery has no column to resolve to declared primary key pk='id'

        It turns out that what this really means is that a useless field called "id" (or whatever the primary key is set to) also has to be in the "deltaQuery" query itself, even if you never reference a field called "id" from the deltaQuery (which I don't). I only reference a field called something else, e.g. blahid, from the deltaQuery. Why deltaQuery needs this redundant field when it's apparently never used is beyond me. Or if there is a good reason, this error message should definitely be changed given that this has been an open ticket for two years.

        Mustafa Daşgın added a comment -

        Hi all,

        The same problem occurs in Solr 4.6.0.

        Harsha B V added a comment - - edited

        Hi all,

        I am using Solr 4.8.0 and faced the same problem, but with some trial and error I was able to solve it.

        In my schema.xml I have:
        <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
        <field name="name" type="text_general" indexed="true" stored="true" multiValued="false"/>
        <field name="last_modified" type="date" indexed="true" stored="true" multiValued="false"/>
        unique key is set as:
        <uniqueKey>id</uniqueKey>

        In my data-config.xml I have:
        <dataConfig>
          <dataSource name="ds1" type="JdbcDataSource"
                      driver="oracle.jdbc.OracleDriver"
                      url="jdbc:oracle:thin:@blah.blah"
                      user="blah blah"
                      password="blah blah"/>

          <script>
          <![CDATA[
            function removeNullDateFields(row) {
              var std_date = row.get('last_modified');
              if (std_date === null || true === std_date.isEmpty() || std_date === '') {
                row.remove('last_modified');
              }
              return row;
            }
          ]]>
          </script>

          <document name="search_doc">
            <entity name="search" pk="ID"
                    query="Select
                             std.studentID,
                             std.studentName,
                             std.last_modified
                           From student std"
                    deltaImportQuery="Select
                                        std.studentID,
                                        std.studentName,
                                        std.last_modified
                                      From student std where studentID='${dataimporter.delta.ID}'"
                    deltaQuery="select studentID as ID from student where to_char(last_modified, 'YYYY-MM-DD HH24:MI:SS') > '${dataimporter.last_index_time}'"
                    deletedPkQuery="select deleted_id as ID FROM delete_status WHERE to_char(deleted_date, 'YYYY-MM-DD HH24:MI:SS') > '${dataimporter.last_index_time}'"
                    dataSource="ds1" transformer="DateFormatTransformer,script:removeNullDateFields">
              <field column="STUDENTID" name="id"/>
              <field column="STUDENTNAME" name="name"/>
              <field column="LAST_MODIFIED" name="last_modified" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
            </entity>
          </document>
        </dataConfig>

        As you can see:
        --> I set the entity's 'pk' attribute to "ID" (in capital letters; the uniqueKey value).
        --> The same "ID" is used in 'deltaImportQuery' as ${dataimporter.delta.ID}.
        --> "ID" must appear as-is in the 'deltaQuery' select statement, as in "select ID from ...". If the ID column has a different name in your database, alias it with the 'as' keyword. In my case the primary key of the student table is 'studentID', so I used "select studentID as ID from ...".
        --> The same applies to 'deletedPkQuery'.
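
Why the alias matters can be sketched with a toy placeholder resolver in Java. This is only an illustration under stated assumptions: it assumes the `${dataimporter.delta.X}` placeholder is filled by an exact-name lookup of column `X` in the row returned by 'deltaQuery', and that an unknown name resolves to an empty string. The class and method names are hypothetical and are not DIH's real resolver.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for DIH variable resolution; not Solr code.
class DeltaPlaceholderSketch {

    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{dataimporter\\.delta\\.([^}]+)\\}");

    // Replaces each ${dataimporter.delta.X} with the value of column X
    // from the delta row (assumed: exact-name lookup, empty if absent).
    static String resolve(String template, Map<String, Object> deltaRow) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = deltaRow.get(m.group(1));
            String replacement = value == null ? "" : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template =
                "select * from student where studentID='${dataimporter.delta.ID}'";

        // deltaQuery "select studentID as ID ..." yields a column named ID.
        System.out.println(resolve(template,
                Collections.<String, Object>singletonMap("ID", "42")));
        // prints: select * from student where studentID='42'

        // Without the alias the column is not named ID, so nothing matches.
        System.out.println(resolve(template,
                Collections.<String, Object>singletonMap("STUDENTID", "42")));
        // prints: select * from student where studentID=''
    }
}
```

Under these assumptions, the name used in pk, the column alias in 'deltaQuery', and the placeholder in 'deltaImportQuery' all have to agree, which matches the steps listed above.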

        At present it is working fine for me. Any update in the database is reflected in Solr as well.

        See if this helps. Cheers!!!


          People

          • Assignee: Unassigned
          • Reporter: Alan Baker
          • Votes: 6
          • Watchers: 8