Description
While attempting a bulk ingest from a MapReduce job, I noticed that after calling importDirectory() (invoked roughly as sketched after the stack trace below) I started getting errors in the tservers like the following:
16 11:04:53,337 [file.FileUtil] DEBUG: Too many indexes (31) to open at once for [snip...], reducing in tmpDir = /accumulo/tmp/idxReduce_2009963461
16 11:04:53,595 [tabletserver.TabletServer] ERROR: Unexpected exception in Split/MajC initiator
java.lang.NullPointerException
at org.apache.accumulo.core.file.rfile.RFile$Writer.append(RFile.java:382)
at org.apache.accumulo.core.file.FileUtil.reduceFiles(FileUtil.java:147)
at org.apache.accumulo.core.file.FileUtil.findMidPoint(FileUtil.java:281)
at org.apache.accumulo.core.file.FileUtil.findMidPoint(FileUtil.java:186)
at org.apache.accumulo.server.tabletserver.Tablet.findSplitRow(Tablet.java:2939)
at org.apache.accumulo.server.tabletserver.Tablet.needsSplit(Tablet.java:3013)
at org.apache.accumulo.server.tabletserver.TabletServer$MajorCompactor.run(TabletServer.java:2066)
at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
at java.lang.Thread.run(Thread.java:619)
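For context, the bulk import was kicked off from the MapReduce driver with something along these lines (a minimal sketch only; the table name, HDFS paths, and connector setup are placeholders, not my actual code):

import org.apache.accumulo.core.client.Connector;

// Sketch of the bulk import call; "mytable" and the paths are made-up placeholders.
void bulkImport(Connector conn) throws Exception {
    conn.tableOperations().importDirectory(
        "mytable",        // destination table
        "/bulk/files",    // directory of files produced by the MR job
        "/bulk/failures", // directory where files that fail to import are moved
        false);           // setTime
}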
Because of these errors, my data never showed up in the tables. I poked around in RFile.java and noticed
that the null reference was currentLocalityGroup. To get past this, I threw in a call to
startDefaultLocalityGroup() in RFile.append() when currentLocalityGroup is null.
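Roughly, that hack looks like this (a sketch only, not a real patch; currentLocalityGroup and startDefaultLocalityGroup() are the names I saw in RFile.java, and the rest of append() is left as it was):

// In RFile.Writer.append(), before the code that dereferences currentLocalityGroup:
public void append(Key key, Value value) throws IOException {
    if (currentLocalityGroup == null) {
        // hack: fall back to the default locality group instead of hitting the NPE
        startDefaultLocalityGroup();
    }
    // ... original body of append() unchanged ...
}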
This then led to the following error:
16 15:15:46,989 [file.FileUtil] DEBUG: Too many indexes (40) to open at once for 10.252.158.124 10.251.213.245:537, reducing in tmpDir = /accumulo/tmp/idxReduce_1939056141
16 15:15:48,060 [file.FileUtil] DEBUG: Finished reducing indexes for 10.252.158.124 10.251.213.245:537 in 1.07 secs
16 15:15:48,068 [tabletserver.TabletServer] ERROR: Unexpected exception in Split/MajC initiator
java.lang.IllegalArgumentException: File name rf_0000 has no extension
at org.apache.accumulo.core.file.DispatchingFileFactory.findFileFactory(FileOperations.java:51)
at org.apache.accumulo.core.file.DispatchingFileFactory.openIndex(FileOperations.java:67)
at org.apache.accumulo.core.file.FileUtil.countIndexEntries(FileUtil.java:392)
at org.apache.accumulo.core.file.FileUtil.findMidPoint(FileUtil.java:294)
at org.apache.accumulo.core.file.FileUtil.findMidPoint(FileUtil.java:186)
at org.apache.accumulo.server.tabletserver.Tablet.findSplitRow(Tablet.java:2939)
at org.apache.accumulo.server.tabletserver.Tablet.needsSplit(Tablet.java:3013)
at org.apache.accumulo.server.tabletserver.TabletServer$MajorCompactor.run(TabletServer.java:2066)
at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
at java.lang.Thread.run(Thread.java:619)
To get past this one, I threw a ".rf" extension on the file being opened
(outFile in FileUtil.reduceFiles()), and I also changed the add call
immediately after from outFiles.add(newMapFile) to outFiles.add(outFile).
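Concretely, the hack amounts to something like this inside FileUtil.reduceFiles() (again just a sketch; tmpDir and i are placeholders, the surrounding code is elided, and the format string is only there to match the rf_0000 name from the error above):

// In FileUtil.reduceFiles(), when creating each reduced index file:
String outFile = tmpDir + "/" + String.format("rf_%04d", i) + ".rf"; // hack: tack on an ".rf" extension
// ... write the reduced index out to outFile as before ...
outFiles.add(outFile); // was: outFiles.add(newMapFile)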
Now my bulk imports work again. I don't know why this happens, and this
surely isn't the proper way to fix the problem, but I thought I'd let you
know.