Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-8476

Optimizations in UserDictionary (KoreanAnalyzer)

    XMLWordPrintableJSON

    Details

    • Lucene Fields:
      New

      Description

      ■ Bug fix

      1) BufferedReader's close method is not called.  (Wrong check)

      // Line 57 method
      public static UserDictionary open(Reader reader) throws IOException {
      
        BufferedReader br = new BufferedReader(reader);
        String line = null;
        List<String> entries = new ArrayList<>();
      
        // text + optional segmentations
        while ((line = br.readLine()) != null) {
          ...
        }
      
        if (entries.isEmpty()) {
          return null;
        } else {
          return new UserDictionary(entries);
        }
      }

      If you look at the code above, there is no close() method for the "br" variable.
      As I know, BufferedReader can cause a memory leak if the close method is not called.
      So I changed the code below.

      // Line 57 method
      public static UserDictionary open(Reader reader) throws IOException {
      
        String line = null;
        List<String> entries = new ArrayList<>();
      
        // text + optional segmentations
        try (BufferedReader br = new BufferedReader(reader)) {
          while ((line = br.readLine()) != null) {
            ...
          }
        }
      
        if (entries.isEmpty()) {
          return null;
        } else {
          return new UserDictionary(entries);
        }
      }
      

      I solved this problem with "try-with-resources" method available since Java 7.

       

      ■ Optimizations

      1) Change from Collections.sort to List.sort (UserDictionary constructor)

      // Line 82 method
      private UserDictionary(List<String> entries) throws IOException {
        final CharacterDefinition charDef = CharacterDefinition.getInstance();
        Collections.sort(entries,
            Comparator.comparing(e -> e.split("\\s+")[0]));
      
        PositiveIntOutputs fstOutput = PositiveIntOutputs.getSingleton();
        ...
      }

      List.sort in Java 8 is known to be faster than existing Collections.sort. (http://ankitsambyal.blogspot.com/2014/03/difference-between-listsort-and.html) So I changed the code below.

      // Line 82 method
      private UserDictionary(List<String> entries) throws IOException {
        final CharacterDefinition charDef = CharacterDefinition.getInstance();
        entries.sort(Comparator.comparing(e -> e.split("\\s+")[0]));
      
        PositiveIntOutputs fstOutput = PositiveIntOutputs.getSingleton();
        ...
      }

       

      2) Remove unnecessary null check (UserDictionary constructor)

      // Line 82 method
      private UserDictionary(List<String> entries) throws IOException {
        ...
        String lastToken = null;
        ...
        for (String entry : entries) {
          String[] splits = entry.split("\\s+");
          String token = splits[0];
          if (lastToken != null && token.equals(lastToken)) {
            continue;
          }
          char lastChar = entry.charAt(entry.length()-1);
        ...
      }

      Looking at this part of the code,

      if (lastToken != null && token.equals(lastToken)) {
        continue;
      }

      A null check for lastToken is unnecessary.
      Because the equals method of the String class internally performs a null check.
      So I changed the code as below.

      // Line 82 method
      private UserDictionary(List<String> entries) throws IOException {
        ...
        String lastToken = null;
        ...
        for (String entry : entries) {
          String[] splits = entry.split("\\s+");
          String token = splits[0];
          if (token.equals(lastToken)) {
            continue;
          }
          char lastChar = entry.charAt(entry.length()-1);
        ...
      }

        Attachments

        1. LUCENE-8476.patch
          2 kB
          Namgyu Kim
        2. LUCENE-8476.patch
          3 kB
          Namgyu Kim

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              danmuzi Namgyu Kim
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: