Mahout / MAHOUT-1625

lucene2seq: conversion fails when a document does not contain an optional field

Details

    Description

      When converting a Lucene index in which not all fields are required (so some documents do not contain a given field), the following exception is thrown:

      java.lang.IllegalArgumentException: Field 'MISSING_FIELDNAME' does not exist in the index
      at org.apache.mahout.text.LuceneIndexHelper.fieldShouldExistInIndex(LuceneIndexHelper.java:36)
      at org.apache.mahout.text.LuceneSegmentRecordReader.initialize(LuceneSegmentRecordReader.java:63)
      at org.apache.mahout.text.LuceneSegmentInputFormat.createRecordReader(LuceneSegmentInputFormat.java:76)
      at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

      It would be good if lucene2seq either ignored missing field values by default or offered an additional parameter that turns ignoring them on or off.
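
The requested behavior could look roughly like the sketch below. It is not Mahout's code and does not use the Lucene API: a document is modeled as a plain `Map` from field name to stored value, and the class name `MissingFieldSketch`, the method `extractValue`, and the `ignoreMissing` flag are all hypothetical names chosen for illustration. With the flag off, a missing field raises an exception, similar in spirit to the one reported above; with it on, the field is skipped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: models the proposed "ignore missing fields" switch
// on plain maps instead of real Lucene documents.
class MissingFieldSketch {

    /**
     * Concatenates the values of the requested fields, separated by spaces.
     *
     * @param doc           stand-in for an index document (field name -> value)
     * @param fields        fields to extract, in order
     * @param ignoreMissing assumed option: skip absent fields instead of failing
     */
    static String extractValue(Map<String, String> doc,
                               List<String> fields,
                               boolean ignoreMissing) {
        StringBuilder value = new StringBuilder();
        for (String field : fields) {
            String fieldValue = doc.get(field);
            if (fieldValue == null) {
                if (ignoreMissing) {
                    continue; // optional field is absent in this document
                }
                throw new IllegalArgumentException(
                        "Field '" + field + "' does not exist in the document");
            }
            if (value.length() > 0) {
                value.append(' ');
            }
            value.append(fieldValue);
        }
        return value.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Only a title");

        // Lenient mode: the absent "body" field is skipped.
        System.out.println(extractValue(doc, Arrays.asList("title", "body"), true));

        // Strict mode: the absent "body" field causes an exception.
        try {
            extractValue(doc, Arrays.asList("title", "body"), false);
        } catch (IllegalArgumentException e) {
            System.out.println("Strict mode failed: " + e.getMessage());
        }
    }
}
```

Whether skipping should be the default or an opt-in is the open question in this report; the sketch keeps it as an explicit argument so that either default can be chosen by the caller.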

          People

            frankscholten Frank Scholten
            TomAL Tom Lampert