[NUTCH-1978] solrindex will fail when indexing corrupted segments - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Duplicate
Affects Version/s: 1.10
Fix Version/s: 1.10
Component/s: indexer
Labels: None

Description

This is the same issue as ~~NUTCH-1771~~, but the bug likely affects most versions, since none of them contain code to handle corrupted segments.

In ~~NUTCH-1771~~, people pointed out that this would be very hard to handle in the Hadoop layer, and that the program should skip corrupted segments instead of terminating. By "corrupted segment" I mean, for example, a segment that has just been generated and does not yet have any content.

My initial idea is to check whether the segment folder is valid before adding the segment to the Hadoop job, and to simply skip the segment if it is not. One check is to verify that the segment folder contains exactly six subdirectories, as a complete segment should. A stricter approach is to check the names of the subdirectories and confirm that they are exactly the six that should appear.
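
A minimal sketch of the stricter, name-based check, for illustration only. It assumes the six subdirectories are the usual Nutch 1.x ones (content, crawl_fetch, crawl_generate, crawl_parse, parse_data, parse_text) and uses java.nio.file on a local filesystem to stay self-contained; a real fix would presumably go through Hadoop's FileSystem API instead. The class and method names (SegmentChecker, isCompleteSegment, filterCompleteSegments) are hypothetical and not part of Nutch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch.
class SegmentChecker {

    // Assumption: the six subdirectories of a complete Nutch 1.x segment.
    static final Set<String> EXPECTED = Set.of(
        "content", "crawl_fetch", "crawl_generate",
        "crawl_parse", "parse_data", "parse_text");

    /** Returns true only if the segment contains exactly the six expected subdirectories. */
    static boolean isCompleteSegment(Path segment) throws IOException {
        if (!Files.isDirectory(segment)) {
            return false;
        }
        Set<String> found = new TreeSet<>();
        // Files.list must be closed, hence try-with-resources.
        try (Stream<Path> children = Files.list(segment)) {
            children.filter(Files::isDirectory)
                    .forEach(p -> found.add(p.getFileName().toString()));
        }
        return found.equals(EXPECTED);
    }

    /** Keeps the complete segments and reports the skipped ones instead of failing. */
    static List<Path> filterCompleteSegments(List<Path> segments) throws IOException {
        List<Path> valid = new ArrayList<>();
        for (Path segment : segments) {
            if (isCompleteSegment(segment)) {
                valid.add(segment);
            } else {
                System.err.println("Skipping incomplete segment: " + segment);
            }
        }
        return valid;
    }
}
```

With a filter like this applied before the job is configured, only the segments returned by filterCompleteSegments would be added as job inputs, so a freshly generated segment is skipped rather than causing the whole run to fail.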

Issue Links

duplicates NUTCH-1771: Solrindex fails if a segment is corrupted or incomplete (Closed)

People

Assignee: Unassigned

Reporter: Chong Li

Votes: 0

Watchers: 3

Dates

Created: 31/Mar/15 06:11

Updated: 13/Mar/24 14:50

Resolved: 31/Mar/15 20:36