[MAHOUT-1625] lucene2seq: failure to convert a document that does not contain a field (the field is not required) - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 0.9
Fix Version/s: 0.11.0
Component/s: classic
Labels:
- LuceneIndexHelper
- easyfix
- legacy
- lucene
- lucene2seq
- mahout
Environment: CentOS 6.5

Description

When converting a Lucene index in which some fields are optional (so some documents do not contain them), lucene2seq throws the following exception:

java.lang.IllegalArgumentException: Field 'MISSING_FIELDNAME' does not exist in the index
    at org.apache.mahout.text.LuceneIndexHelper.fieldShouldExistInIndex(LuceneIndexHelper.java:36)
    at org.apache.mahout.text.LuceneSegmentRecordReader.initialize(LuceneSegmentRecordReader.java:63)
    at org.apache.mahout.text.LuceneSegmentInputFormat.createRecordReader(LuceneSegmentInputFormat.java:76)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

It would be good either to ignore missing field values by default or to add a parameter that turns ignoring them on or off.
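
A minimal sketch of the second option, written in plain Java with no Lucene or Mahout dependency. Every name here (`MissingFieldPolicy`, `checkFields`, `extractText`, the `ignoreMissing` flag) is hypothetical and is not part of Mahout; the index is modeled as a set of field names and a document as a map from field name to value. It only illustrates the requested behavior, not how `LuceneIndexHelper` is actually implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: none of these names exist in Mahout.
class MissingFieldPolicy {

    /**
     * Stand-in for the existence check seen in the stack trace.
     * Throws for an absent field unless ignoreMissing is true.
     */
    static void checkFields(Set<String> indexFields,
                            List<String> requested,
                            boolean ignoreMissing) {
        for (String field : requested) {
            if (!indexFields.contains(field) && !ignoreMissing) {
                throw new IllegalArgumentException(
                    "Field '" + field + "' does not exist in the index");
            }
        }
    }

    /**
     * Joins the values of the requested fields for one document,
     * skipping any field the document does not contain.
     */
    static String extractText(Map<String, String> document,
                              List<String> requested) {
        StringBuilder text = new StringBuilder();
        for (String field : requested) {
            String value = document.get(field);
            if (value == null) {
                continue; // optional field absent from this document
            }
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(value);
        }
        return text.toString();
    }
}
```

With `ignoreMissing` set to `true`, `checkFields` returns normally even when a requested field is absent, and `extractText` emits only the values a document actually has. With the flag set to `false`, the sketch keeps the strict behavior reported above.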

People

Assignee: Frank Scholten

Reporter: Tom Lampert

Votes: 0

Watchers: 3

Dates

Created: 03/Nov/14 16:44

Updated: 31/Jan/24 22:14

Resolved: 05/Aug/15 21:26