Mahout / MAHOUT-1625

lucene2seq: failure to convert a document that does not contain a field (the field is not required)


Details

    Description

      When converting a Lucene index in which not all fields are required (so some documents do not contain the field), the following exception is thrown:

      java.lang.IllegalArgumentException: Field 'MISSING_FIELDNAME' does not exist in the index
      at org.apache.mahout.text.LuceneIndexHelper.fieldShouldExistInIndex(LuceneIndexHelper.java:36)
      at org.apache.mahout.text.LuceneSegmentRecordReader.initialize(LuceneSegmentRecordReader.java:63)
      at org.apache.mahout.text.LuceneSegmentInputFormat.createRecordReader(LuceneSegmentInputFormat.java:76)
      at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

      It would be good to either ignore missing field values by default or to have an additional parameter that turns ignoring them on or off.
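
      The requested behaviour could look roughly like the sketch below. It is not Mahout's actual code: the class MissingFieldSketch, the method extractText and the ignoreMissing flag are all hypothetical names, and a document is modelled as a plain Map rather than a Lucene document. It only illustrates the idea of skipping an absent optional field when the flag is on and failing, as today, when it is off.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Mahout or Lucene.
public class MissingFieldSketch {

    /**
     * Joins the values of the requested fields of one document.
     * The document is modelled as a Map from field name to value (an assumption
     * made to keep the sketch self-contained).
     */
    static String extractText(Map<String, String> doc, List<String> fields, boolean ignoreMissing) {
        List<String> values = new ArrayList<>();
        for (String field : fields) {
            String value = doc.get(field);
            if (value == null) {
                if (ignoreMissing) {
                    continue; // optional field absent in this document: skip it
                }
                // Strict mode: fail in the same spirit as the exception reported above.
                throw new IllegalArgumentException("Field '" + field + "' does not exist in the document");
            }
            values.add(value);
        }
        return String.join(" ", values);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Hello"); // this document has no "body" field

        // Lenient mode skips the missing field.
        System.out.println(extractText(doc, List.of("title", "body"), true));

        // Strict mode rejects the document.
        try {
            extractText(doc, List.of("title", "body"), false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      Whether the lenient mode should be the default or an opt-in parameter is the open question raised in this issue; the sketch takes no position and simply exposes it as a boolean.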


          People

            frankscholten Frank Scholten
            TomAL Tom Lampert
            Votes: 0
            Watchers: 4
