Uploaded image for project: 'Mahout'
  1. Mahout
  2. MAHOUT-647

Two small bugs in seq2sparse

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 0.4
    • 0.5
    • classic
    • None

    Description

      From Vasil on the mailing list:

      1. the minLLR parameter is not taken into account. The problem is that in
      the CollocDriver class
      Job job = new Job(conf);

      is executed before

      conf.setFloat(LLRReducer.MIN_LLR, minLLRValue);

      see CollocDriver.computeNGramsPruneByLLR method

      2. maxDFPercent is not taken into account. The problem is that in
      TFIDFPartialVectorReducer.reduce the check is

      if (df / vectorCount > maxDfPercent) {
      if (log.isInfoEnabled()) {
      log.info("ommiting {}", e.index());
      }
      continue;
      }

      and should be:

      if (df*100 / vectorCount > maxDfPercent) {
      if (log.isInfoEnabled()) {
      log.info("ommiting {}", e.index());
      }
      continue;
      }

      Attachments

        Activity

          People

            srowen Sean R. Owen
            vavasilev Vasil Vasilev
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: