[SOLR-1458] Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4
Fix Version/s: 1.4
Component/s: replication (java)
Labels:
None
Environment:

CentOS x64
8GB RAM
Tomcat, running with 7G max memory; memory usage is <2GB, so it's not the problem

Host a: master
Host b: slave

Multiple single core Solr instances, using JNDI.

Java replication

Description

After finally figuring out the new Java based replication, we have started both the slave and the master and issued optimize to all master Solr instances. This triggered some replication to go through just fine, but it looks like some of it is failing.

Here's what I'm getting in the slave logs, repeatedly for each shard:

 
SEVERE: SnapPull failed 
java.lang.NullPointerException
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

If I issue an optimize again on the master to one of the shards, it then triggers a replication and replicates OK. I have a feeling that these SnapPull failures appear later on but right now I don't have enough to form a pattern.

Here's replication.properties on one of the failed slave instances.

cat data/replication.properties 
#Replication details
#Wed Sep 23 19:35:30 PDT 2009
replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
previousCycleTimeInSeconds=0
timesFailed=113
indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
indexReplicatedAt=1253759730020
replicationFailedAt=1253759730020
lastCycleBytesDownloaded=0
timesIndexReplicated=113

and another

cat data/replication.properties 
#Replication details
#Wed Sep 23 18:42:01 PDT 2009
replicationFailedAtList=1253756490034,1253756460169
previousCycleTimeInSeconds=1
timesFailed=2
indexReplicatedAtList=1253756521284,1253756490034,1253756460169
indexReplicatedAt=1253756521284
replicationFailedAt=1253756490034
lastCycleBytesDownloaded=22932293
timesIndexReplicated=3

Some relevant configs:
In solrconfig.xml:

<!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
  <requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="master">
        <str name="enable">${enable.master:false}</str>
        <str name="replicateAfter">optimize</str>
        <str name="backupAfter">optimize</str>
        <str name="commitReserveDuration">00:00:20</str>
    </lst>
    <lst name="slave">
        <str name="enable">${enable.slave:false}</str>

        <!-- url of master, from properties file -->
        <str name="masterUrl">${master.url}</str>

        <!-- how often to check master -->
        <str name="pollInterval">00:00:30</str>
    </lst>
  </requestHandler>

The slave then has this in solrcore.properties:

enable.slave=true
master.url=URLOFMASTER/replication

and the master has

enable.master=true

I'd be glad to provide more details but I'm not sure what else I can do. ~~SOLR-926~~ may be relevant.

Thanks.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

SOLR-1458.patch
24/Sep/09 03:56
0.7 kB
Noble Paul
SOLR-1458.patch
24/Sep/09 05:50
2 kB
Noble Paul
SOLR-1458.patch
25/Sep/09 04:27
3 kB
Noble Paul
SOLR-1458.patch
25/Sep/09 06:43
5 kB
Noble Paul
SolrDeletionPolicy.patch
25/Sep/09 18:58
3 kB
Yonik Seeley
SolrDeletionPolicy.patch
25/Sep/09 20:04
6 kB
Yonik Seeley
SOLR-1458.patch
26/Sep/09 02:12
11 kB
Yonik Seeley
reserve.patch
05/Oct/09 09:37
2 kB
Noble Paul
SOLR-1458.patch
05/Oct/09 10:39
5 kB
Noble Paul

Issue Links

is related to

SOLR-1475 Java-based replication doesn't properly reserve its commit point during backups

Closed

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates