  1. Solr
  2. SOLR-7255

Index Corruption on HDFS whenever online bulk indexing (from Hive)


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 4.10.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      HDP 2.2 / HDP Search + LucidWorks hadoop-lws-job.jar

      Description

      When running SolrCloud on HDFS and using the LucidWorks hadoop-lws-job.jar to index a Hive table (620M rows) into Solr, the job runs for about 1500 secs and then fails with this exception:

      Exception in thread "Lucene Merge Thread #2191" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header=1494817490 vs expected header=1071082519 (resource: BufferedChecksumIndexInput(_r3.nvm))
              at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:549)
              at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:522)
      Caused by: org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header=1494817490 vs expected header=1071082519 (resource: BufferedChecksumIndexInput(_r3.nvm))
              at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:136)
              at org.apache.lucene.codecs.lucene49.Lucene49NormsProducer.<init>(Lucene49NormsProducer.java:75)
              at org.apache.lucene.codecs.lucene49.Lucene49NormsFormat.normsProducer(Lucene49NormsFormat.java:112)
              at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:127)
              at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:108)
              at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
              at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:282)
              at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3951)
              at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3913)
              at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3766)
              at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
              at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
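      For context on the numbers in the trace: Lucene stamps every codec file with a 4-byte big-endian magic value, CodecUtil.CODEC_MAGIC = 0x3FD76C17, which is exactly the "expected header=1071082519" above, so the "actual header" means the first four bytes of _r3.nvm contained something else entirely. A minimal sketch of that check (my illustration, not Lucene's actual code):

      ```python
      import struct

      # Lucene's CodecUtil.CODEC_MAGIC: the 4-byte big-endian marker
      # expected at the start of every codec file.
      CODEC_MAGIC = 0x3FD76C17  # == 1071082519, the "expected header" in the trace


      def check_codec_header(data: bytes) -> int:
          """Mimic the magic check in CodecUtil.checkHeader: read the first
          4 bytes as a big-endian signed int and compare to CODEC_MAGIC."""
          (actual,) = struct.unpack(">i", data[:4])
          if actual != CODEC_MAGIC:
              raise ValueError(
                  f"codec header mismatch: actual header={actual} "
                  f"vs expected header={CODEC_MAGIC}"
              )
          return actual


      # A valid header passes; any other leading bytes reproduce the
      # shape of the exception message seen in the stack trace.
      check_codec_header(struct.pack(">i", CODEC_MAGIC))
      ```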
      

      So I deleted the whole index, re-created it, and re-ran the job to send the Hive table contents to Solr again; it hit exactly the same exception after sending a large number of updates to Solr.

      I moved off HDFS to a normal on-disk dataDir backend and then re-indexed the full table successfully in 2 hours, with no index corruption.

      This implies some sort of stability issue in the HDFS DirectoryFactory implementation.
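      For anyone reproducing the workaround, switching from the HDFS backend to a local on-disk index is a solrconfig.xml change; a minimal sketch (factory class names are stock Solr 4.10.x, the hdfs.home path is a hypothetical placeholder):

      ```xml
      <!-- HDFS-backed index (the failing configuration): -->
      <directoryFactory name="DirectoryFactory"
                        class="solr.HdfsDirectoryFactory">
        <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
      </directoryFactory>

      <!-- Local dataDir backend (the configuration that indexed cleanly): -->
      <directoryFactory name="DirectoryFactory"
                        class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
      ```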

      Regards,

      Hari Sekhon
      http://www.linkedin.com/in/harisekhon

            People

            • Assignee:
              Unassigned
            • Reporter:
              harisekhon Hari Sekhon
