Solr / SOLR-4376

dih.last_index_time has bad Date.toString() format during first delta import


      Description

      Hi

      In:
      org.apache.solr.handler.dataimport.DocBuilder#getVariableResolver

       
            private static final Date EPOCH = new Date(0);
      
            if (persistedProperties.get(LAST_INDEX_TIME) != null) {
              // value read from dataimport.properties: a formatted String
              indexerNamespace.put(LAST_INDEX_TIME, persistedProperties.get(LAST_INDEX_TIME));
            } else {
              // no persisted value: the EPOCH Date object itself is put into the map
              indexerNamespace.put(LAST_INDEX_TIME, EPOCH);
            }
      

       

      • When LAST_INDEX_TIME is found in dataimport.properties, the value put into the map is a String.
      • When LAST_INDEX_TIME is not found, we fall back to timestamp 0, but the value put into the map is a Date.
      • A full-import works fine, because it does not need LAST_INDEX_TIME.
      • A delta-import run after a full-import also works fine.
      • But a first delta-import on a clean configuration, with no dataimport.properties file present, fails with an SQL exception because of this query:
        SELECT xxx 
        FROM BATCH_JOB_EXECUTION yyy 
        WHERE last_updated > 'Thu Jan 01 01:00:00 CET 1970'
        

         

      whereas the query should normally be:

      SELECT xxx 
      FROM BATCH_JOB_EXECUTION yyy 
      WHERE last_updated > '1970-01-01 01:00:00'
      

       

      The configured delta query is:

      deltaQuery="SELECT bje.job_execution_id as JOB_EXECUTION_ID
      FROM BATCH_JOB_EXECUTION bje
      WHERE last_updated > '${dih.last_index_time}'"
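A minimal sketch of why the two cases render differently, assuming variable substitution boils down to turning the map value into text with String.valueOf(). The resolve helper below is hypothetical and much simpler than DIH's real variable resolver; it only illustrates that a String is inserted verbatim while a java.util.Date goes through Date.toString().

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class LastIndexTimeBugDemo {

    // Hypothetical, simplified stand-in for DIH variable resolution (NOT the real
    // VariableResolver): the map value is converted with String.valueOf(), which
    // calls Date.toString() when the value is a java.util.Date.
    static String resolve(String template, Map<String, Object> dihNamespace) {
        Object value = dihNamespace.get("last_index_time");
        return template.replace("${dih.last_index_time}", String.valueOf(value));
    }

    public static void main(String[] args) {
        String deltaQuery = "SELECT bje.job_execution_id AS JOB_EXECUTION_ID "
                + "FROM BATCH_JOB_EXECUTION bje "
                + "WHERE last_updated > '${dih.last_index_time}'";
        Map<String, Object> ns = new HashMap<>();

        // Delta-import after a full-import: value read from dataimport.properties (a String).
        ns.put("last_index_time", "1970-01-01 01:00:00");
        System.out.println(resolve(deltaQuery, ns));

        // First delta-import, no properties file: the EPOCH Date is stored as-is.
        ns.put("last_index_time", new Date(0));
        System.out.println(resolve(deltaQuery, ns));
    }
}
```

In a CET time zone the second query ends with 'Thu Jan 01 01:00:00 CET 1970', the same string as in the failing query above; the exact text depends on the JVM's default time zone.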
      

       

      In any case, I think the value associated with this key in the map should have a consistent type: either String or Date, but not both.

      Personally, I would expect it to be stored as a String, with the EPOCH date formatted in exactly the same format used to persist dates in the file, which is defined by:
      org.apache.solr.handler.dataimport.SimplePropertiesWriter#dateFormat

      This has no real impact on our code; it is just that one integration test, "test_delta_import_when_never_indexed", unexpectedly started failing (while all the others passed) after a migration from Solr 1.4 to Solr 4.1.
      So it seems to be a minor regression.

      Thanks

      Attachments:
        SOLR-4376-trunk.patch (4 kB, Arcadius Ahouansou)

          Activity

          James Dyer added a comment -

          Could you explain what is happening to your dataimport.properties file and how it is wrong?

          James Dyer added a comment -

          Now that I've read your edited description, I think I get it. If you run a delta-import even though a full-import has never been run before, it trips up, right?

          Sebastien Lorber added a comment (edited) -

          Exactly.

          Basically, all these integration tests worked with Solr 1.4, but with Solr 4.1 the "test_delta_import_when_never_indexed" test had to be disabled.
          http://pastebin.com/PDnHv2M2

          The content of the dataimport.properties file itself is not wrong: it does contain the dates in the correct format. The problem is that when the file doesn't exist yet, or doesn't contain all the usual entries, looking up LAST_INDEX_TIME falls back to the EPOCH date for bootstrapping, but that date is added to the map as a Date rather than as a formatted String.

          The fix could simply be:

          indexerNamespace.put(LAST_INDEX_TIME, getSimplePropertiesWriterDateFormat().format(EPOCH));
          

          This way, both a date read from the file and the fallback to timestamp 0 put the same kind of formatted date String into the indexerNamespace map.
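A minimal sketch of the proposed fix, assuming the properties writer uses the pattern yyyy-MM-dd HH:mm:ss (which matches the expected query in the description) and the JVM's default time zone. buildIndexerNamespace and propertiesDateFormat are hypothetical names; a real change would live in DocBuilder#getVariableResolver and reuse SimplePropertiesWriter's configured format instead of a hard-coded pattern.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

public class LastIndexTimeFixSketch {

    static final String LAST_INDEX_TIME = "last_index_time";
    static final Date EPOCH = new Date(0);

    // Assumption: same pattern as the dates persisted in dataimport.properties.
    // A new instance per call, because SimpleDateFormat is not thread-safe.
    static SimpleDateFormat propertiesDateFormat() {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
    }

    // Hypothetical helper: always puts a String into the namespace,
    // whether or not a value was persisted.
    static Map<String, Object> buildIndexerNamespace(Properties persisted) {
        Map<String, Object> ns = new HashMap<>();
        String persistedValue = persisted.getProperty(LAST_INDEX_TIME);
        if (persistedValue != null) {
            ns.put(LAST_INDEX_TIME, persistedValue);                         // already formatted
        } else {
            ns.put(LAST_INDEX_TIME, propertiesDateFormat().format(EPOCH));   // format the fallback too
        }
        return ns;
    }

    public static void main(String[] args) {
        // First delta-import: no persisted value, EPOCH formatted like the file's dates.
        System.out.println(buildIndexerNamespace(new Properties()).get(LAST_INDEX_TIME));

        Properties persisted = new Properties();
        persisted.setProperty(LAST_INDEX_TIME, "2013-01-28 10:15:00");
        System.out.println(buildIndexerNamespace(persisted).get(LAST_INDEX_TIME));
    }
}
```

Whichever branch is taken, ${dih.last_index_time} then resolves to text in the same yyyy-MM-dd HH:mm:ss shape.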

          Sebastien Lorber added a comment -

          By the way, the date format you set in the SimplePropertiesWriter is also the date format you get when you use ${dih.last_index_time} in your dataimport.xml.
          This is not obvious, and it could be a good idea to decouple these two date formats. However, it's not a big deal.

          Personally, instead of building the query from strings, I think it would be more elegant to:

          • Store all dates (even persisted ones) as Dates in the indexerNamespace map
          • When building the delta query, create a PreparedStatement, replacing the variables with real JDBC parameters, so that the delta query would be:
            SELECT bje.job_execution_id as JOB_EXECUTION_ID
            FROM BATCH_JOB_EXECUTION bje
            WHERE last_updated > ?
            

             

          • Inject the parameters directly as Dates instead of replacing the ${xxx} placeholders with strings
          • The positions of the JDBC parameters could be an issue, but Spring has a solution for that: NamedParameterJdbcTemplate
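A minimal plain-JDBC sketch of the suggestion above, assuming the delta query is rewritten with a ? placeholder and the last index time is available as a java.util.Date. This is not how DIH currently works (it substitutes ${...} variables as text); DeltaQueryWithParameters and changedIds are hypothetical names, and a live JDBC Connection is needed to actually run changedIds.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class DeltaQueryWithParameters {

    // Hypothetical delta query: a JDBC placeholder instead of '${dih.last_index_time}'.
    static final String DELTA_QUERY =
            "SELECT bje.job_execution_id AS JOB_EXECUTION_ID "
            + "FROM BATCH_JOB_EXECUTION bje "
            + "WHERE last_updated > ?";

    // Returns the ids changed since lastIndexTime. The date is bound as a SQL
    // TIMESTAMP, so no string formatting (and no format mismatch) is involved.
    static List<Long> changedIds(Connection conn, Date lastIndexTime) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(DELTA_QUERY)) {
            ps.setTimestamp(1, new Timestamp(lastIndexTime.getTime()));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("JOB_EXECUTION_ID"));
                }
            }
        }
        return ids;
    }
}
```

With Spring's NamedParameterJdbcTemplate, the placeholder could instead be a named parameter such as :lastIndexTime, so the parameter position no longer has to be tracked by hand.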
          Arcadius Ahouansou added a comment (edited) -

          This is a patch/fix against trunk.
          It also includes a test-case that triggers the issue.

          Thanks.

          Arcadius.

          ASF subversion and git services added a comment -

          Commit 1544421 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1544421 ]

          SOLR-4376: DataImportHandler uses wrong date format for last_index_time if a delta-import is run first before any full-imports

          ASF subversion and git services added a comment -

          Commit 1544422 from shalin@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1544422 ]

          SOLR-4376: DataImportHandler uses wrong date format for last_index_time if a delta-import is run first before any full-imports

          Shalin Shekhar Mangar added a comment -

          This is fixed.

          Thanks Sebastien and Arcadius!


            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Sebastien Lorber
            • Votes:
              2
              Watchers:
              5
