Mahout / MAHOUT-647

Two small bugs in seq2sparse


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.4
    • Fix Version/s: 0.5
    • Component/s: Integration
    • Labels:
      None

      Description

      From Vasil on the mailing list:

      1. The minLLR parameter is not taken into account. In the CollocDriver
      class,

        Job job = new Job(conf);

      is executed before

        conf.setFloat(LLRReducer.MIN_LLR, minLLRValue);

      Because Job takes a copy of the Configuration it is constructed from,
      the minimum LLR value set afterwards never reaches the job. See the
      CollocDriver.computeNGramsPruneByLLR method.

      2. maxDFPercent is not taken into account. In
      TFIDFPartialVectorReducer.reduce the check is

        if (df / vectorCount > maxDfPercent) {
          if (log.isInfoEnabled()) {
            log.info("omitting {}", e.index());
          }
          continue;
        }

      This compares a document-frequency fraction (at most 1) against a
      percentage (0 to 100), so terms are essentially never omitted. It
      should be:

        if (df * 100 / vectorCount > maxDfPercent) {
          if (log.isInfoEnabled()) {
            log.info("omitting {}", e.index());
          }
          continue;
        }
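The first problem above (minLLR) is an ordering bug. The sketch below reproduces it without Hadoop, under stated assumptions: `SnapshotJob` is a hypothetical stand-in for Hadoop's `Job` that, like `Job`, copies its configuration when constructed; a plain `Map` replaces `Configuration`; and the key `"minLLR"` is illustrative, not the actual value of `LLRReducer.MIN_LLR`.

```java
import java.util.HashMap;
import java.util.Map;

class JobConfOrdering {

  // Hypothetical key name standing in for LLRReducer.MIN_LLR.
  static final String MIN_LLR = "minLLR";

  /** Hypothetical stand-in for Hadoop's Job: snapshots the configuration at construction. */
  static final class SnapshotJob {
    private final Map<String, String> conf;

    SnapshotJob(Map<String, String> conf) {
      // Copy the settings, mimicking how Job copies the Configuration it is given.
      this.conf = new HashMap<>(conf);
    }

    String get(String key) {
      return conf.get(key);
    }
  }

  /** Order from the report: the job is created before the parameter is set. */
  static SnapshotJob buggy(float minLLRValue) {
    Map<String, String> conf = new HashMap<>();
    SnapshotJob job = new SnapshotJob(conf);
    conf.put(MIN_LLR, Float.toString(minLLRValue)); // too late: job already holds its copy
    return job;
  }

  /** Corrected order: set every parameter first, then create the job. */
  static SnapshotJob fixed(float minLLRValue) {
    Map<String, String> conf = new HashMap<>();
    conf.put(MIN_LLR, Float.toString(minLLRValue));
    return new SnapshotJob(conf);
  }

  public static void main(String[] args) {
    System.out.println("buggy: " + buggy(50.0f).get(MIN_LLR)); // buggy: null
    System.out.println("fixed: " + fixed(50.0f).get(MIN_LLR)); // fixed: 50.0
  }
}
```

In CollocDriver the corresponding change would be to move `conf.setFloat(LLRReducer.MIN_LLR, minLLRValue)` above `new Job(conf)`, or to set the value on `job.getConfiguration()` before the job is submitted.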
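The second problem above (maxDFPercent) can be checked in isolation. This is a sketch assuming `df` and `vectorCount` are `long` and `maxDfPercent` is an `int`; the method names are hypothetical. It compares the original check, the check proposed in the report, and a cross-multiplied variant (not part of the report) that avoids integer truncation.

```java
class MaxDfCheck {

  // Original check: df / vectorCount is integer division, so it is usually 0,
  // and it is compared against a percentage.
  static boolean omitBuggy(long df, long vectorCount, int maxDfPercent) {
    return df / vectorCount > maxDfPercent;
  }

  // Proposed check: scale df to a (truncated) percentage before comparing.
  static boolean omitFixed(long df, long vectorCount, int maxDfPercent) {
    return df * 100 / vectorCount > maxDfPercent;
  }

  // Variant: df / vectorCount > maxDfPercent / 100, cross-multiplied so the
  // comparison is exact in long arithmetic.
  static boolean omitExact(long df, long vectorCount, int maxDfPercent) {
    return df * 100 > (long) maxDfPercent * vectorCount;
  }

  public static void main(String[] args) {
    // A term appearing in 90 of 100 documents, with maxDfPercent = 80.
    System.out.println(omitBuggy(90, 100, 80));   // false: 90 / 100 == 0
    System.out.println(omitFixed(90, 100, 80));   // true: 9000 / 100 == 90
    // 805 of 1000 documents is 80.5%, but 80500 / 1000 truncates to 80.
    System.out.println(omitFixed(805, 1000, 80)); // false
    System.out.println(omitExact(805, 1000, 80)); // true
  }
}
```

The proposed check is enough to make the parameter take effect; the exact variant only matters when a term's document frequency falls less than one percentage point above the threshold.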


            People

            • Assignee: Sean R. Owen (srowen)
            • Reporter: Vasil Vasilev (vavasilev)
            • Votes: 0
            • Watchers: 0
