Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
0.2
None
nutch 2.1 / cassandra 1.2.1 / gora-cassandra 0.2 / gora-core 0.2.1
running fetch with parse=true
fetcher.threads.per.queue=2
nutch on a 16 core AMD Opteron 2GHz
Cassandra on 8 core Intel Xeon 3.3 GHz
Description
This is the result of debugging one of the issues I described in NUTCH-1534.
Example trace:
java.lang.NullPointerException
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:71)
at org.apache.gora.cassandra.store.CassandraClient.addColumn(CassandraClient.java:139)
at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:307)
at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:212)
at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.output(FetcherReducer.java:664)
at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:534)
I suspect that CassandraStore.put() does not take enough precautions to copy all objects safely into its buffer:
switch (type) {
  case RECORD:
    Persistent persistent = (Persistent) fieldValue;
    Persistent newRecord = persistent.newInstance(new StateManagerImpl());
    for (Field member : fieldSchema.getFields()) {
      newRecord.put(member.pos(), persistent.get(member.pos()));
    }
    fieldValue = newRecord;
    break;
  case MAP:
    StatefulHashMap<?, ?> map = (StatefulHashMap<?, ?>) fieldValue;
    StatefulHashMap<?, ?> newMap = new StatefulHashMap(map);
    fieldValue = newMap;
    break;
}
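Case RECORD: do we not also need to duplicate the object returned by "persistent.get(member.pos())"? As written, the new record receives the same field objects as the original:
newRecord.put(member.pos(), persistent.get(member.pos()))
Case MAP: do we not also need to duplicate all value objects of the map, rather than only the map itself?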
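To illustrate the concern, here is a minimal, self-contained sketch of the difference between copying only a container and copying its values as well. It is not Gora code: the class CopySketch and its methods are hypothetical names, a plain HashMap stands in for StatefulHashMap, and StringBuilder stands in for any mutable field value. It only shows why a buffered copy that shares value objects can change after put() returns, which would matter when several fetcher threads keep working on their objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Gora.
final class CopySketch {

    // Copies the map itself, but both maps still reference the same value objects.
    static Map<String, StringBuilder> shallowCopy(Map<String, StringBuilder> src) {
        return new HashMap<>(src);
    }

    // Copies the map and creates a fresh copy of every value object.
    static Map<String, StringBuilder> deepCopy(Map<String, StringBuilder> src) {
        Map<String, StringBuilder> out = new HashMap<>();
        for (Map.Entry<String, StringBuilder> e : src.entrySet()) {
            out.put(e.getKey(), new StringBuilder(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, StringBuilder> original = new HashMap<>();
        original.put("contentType", new StringBuilder("text/html"));

        Map<String, StringBuilder> shallow = shallowCopy(original);
        Map<String, StringBuilder> deep = deepCopy(original);

        // The caller reuses (clears) its value object after the "put".
        original.get("contentType").setLength(0);

        System.out.println("shallow: '" + shallow.get("contentType") + "'"); // prints shallow: ''
        System.out.println("deep: '" + deep.get("contentType") + "'");       // prints deep: 'text/html'
    }
}
```

If the buffer in CassandraStore behaves like the shallow copy here, a value emptied by its owner before flush() could plausibly reach Hector as an empty or null column value. That is an assumption to be verified, not a confirmed diagnosis.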
I have not had time to write a patch or test this, so please comment.
Attachments
Issue Links
- is depended upon by: NUTCH-1534 cassandra/hector exception: InvalidRequestException(why:column name must not be empty) (Closed)
- relates to: GORA-182 Nutch 2.1 does not work with gora-cassandra 0.2.1 (Closed)