Description
The batch id passed to GeneratorJob via the option -batchId <id> is ignored; a newly generated batch id is used to mark the current batch instead. Log snippets from a run of bin/crawl:
bin/nutch generate ... -batchId 1444941073-14208 ...
GeneratorJob: generated batch id: 1444941074-858443668 containing 1 URLs
Fetching :
bin/nutch fetch ... 1444941073-14208 ...
...
QueueFeeder finished: total 0 records. Hit by time limit :0
The generated URLs are marked with the wrong batch id:
hbase(main):010:0> scan 'test_webpage'
ROW                       COLUMN+CELL
 org.apache.nutch:http/   column=f:bid, timestamp=1444941077080, value=1444941074-858443668
 ...
 org.apache.nutch:http/   column=mk:_gnmrk_, timestamp=1444941077080, value=1444941074-858443668
As a result, the fetcher is started with batch id 1444941073-14208, finds no URLs marked with it, and does not fetch anything. This problem was reported by Sherban Drulea [1], [2].
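
The expected behaviour can be sketched as follows. This is only an illustration, not the actual GeneratorJob code: the class BatchIdSketch and the method resolveBatchId are hypothetical names, and the <epoch-seconds>-<random> format of the fallback id is an assumption inferred from the ids in the log above.

```java
import java.util.Random;

/** Hypothetical sketch of the expected behaviour; not the actual Nutch code. */
public class BatchIdSketch {

    /**
     * Returns the batch id that generated URLs should be marked with.
     * A non-empty id supplied by the caller (e.g. via -batchId) takes
     * precedence; only when none is given is a fresh id created.
     */
    public static String resolveBatchId(String requestedBatchId) {
        if (requestedBatchId != null && !requestedBatchId.isEmpty()) {
            return requestedBatchId;
        }
        // Assumed fallback format: <epoch-seconds>-<non-negative random int>
        long epochSeconds = System.currentTimeMillis() / 1000;
        int randomPart = new Random().nextInt(Integer.MAX_VALUE);
        return epochSeconds + "-" + randomPart;
    }

    public static void main(String[] args) {
        // The id passed on the command line is kept as-is.
        System.out.println(resolveBatchId("1444941073-14208"));
        // Without an id, a new one is generated.
        System.out.println(resolveBatchId(null));
    }
}
```

With this behaviour, the id given to bin/nutch generate and the id given to bin/nutch fetch would match, so the fetcher would find the generated URLs.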