Mahout / MAHOUT-1243

Dictionary file format in Lucene-Mahout integration is not in SequenceFileFormat

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.8
    • Component/s: Integration
    • Labels:
      None

      Description

      The dictionary file generated by lucene.vectors is written as delimited plain text rather than as a Hadoop SequenceFile, so it is not accepted as input to CVB clustering.

      The problematic code from Driver.java:

          File dictOutFile = new File(dictOut);
          log.info("Dictionary Output file: {}", dictOutFile);
          // Writes the term dictionary as delimited UTF-8 text, not as a Hadoop SequenceFile.
          Writer writer = Files.newWriter(dictOutFile, Charsets.UTF_8);
          DelimitedTermInfoWriter tiWriter = new DelimitedTermInfoWriter(writer, delimiter, field);
          try {
            tiWriter.write(termInfo);
          } finally {
            Closeables.close(tiWriter, false);
          }
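
To confirm which format a given dictionary file is in, one option is to inspect its first bytes: Hadoop SequenceFiles start with the ASCII bytes "SEQ" followed by a version byte, whereas the delimited text written by the code above will normally not. The following is a minimal sketch using only the JDK. The class and method names (DictionaryFormatCheck, looksLikeSequenceFile) are hypothetical and not part of Mahout, and the check is a heuristic: a text file that happens to begin with "SEQ" would be misreported.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: heuristic check of a dictionary file's format.
public class DictionaryFormatCheck {

  // Hadoop SequenceFiles begin with the three ASCII bytes "SEQ", then a version byte.
  private static final byte[] SEQ_MAGIC = {'S', 'E', 'Q'};

  /** Returns true if the file starts with the SequenceFile magic bytes. */
  public static boolean looksLikeSequenceFile(Path file) throws IOException {
    try (InputStream in = Files.newInputStream(file)) {
      byte[] header = new byte[SEQ_MAGIC.length];
      int off = 0;
      // Loop because a single read() call may return fewer bytes than requested.
      while (off < header.length) {
        int n = in.read(header, off, header.length - off);
        if (n < 0) {
          return false; // file is shorter than the magic header
        }
        off += n;
      }
      return Arrays.equals(header, SEQ_MAGIC);
    }
  }
}
```

A dictionary produced by the code path above would be expected to return false here, which matches the symptom described in this issue.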
      
      

            People

            • Assignee: Suneel Marthi (smarthi)
            • Reporter: Suneel Marthi (smarthi)
            • Votes: 0
            • Watchers: 3
