Commons Lang / LANG-220

[lang] Tokenizer Enhancements: reset input string, static CSV/TSV factories

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.2
    • Component/s: None
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

      Description

      Tokenizer is missing the following features:

      1. Reset of the input string to a new value. This would be helpful
      when parsing large files, as you could use the same tokenizer instance
      on each line of the file by resetting the input on the tokenizer.
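
The reuse pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Commons Lang class or the attached patch: `ReusableTokenizer`, its single-character delimiter, and `ResetDemo` are hypothetical names, and only `reset` and `getAllTokens` echo the method names used in this issue.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the tokenizer under discussion; not the Commons Lang class. */
class ReusableTokenizer {
    private final char delimiter;
    private String input = "";

    ReusableTokenizer(char delimiter) {
        this.delimiter = delimiter;
    }

    /** Points this instance at a new input string while keeping its configuration. */
    ReusableTokenizer reset(String newInput) {
        this.input = (newInput == null) ? "" : newInput;
        return this;
    }

    /** Splits the current input on the delimiter, keeping empty tokens. */
    List<String> getAllTokens() {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == delimiter) {
                tokens.add(input.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(input.substring(start));
        return tokens;
    }
}

class ResetDemo {
    /** Tokenizes every line from the reader with a single tokenizer instance. */
    static List<List<String>> parse(BufferedReader reader) throws IOException {
        ReusableTokenizer tokenizer = new ReusableTokenizer(',');
        List<List<String>> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // One instance for the whole file: only the input changes per line.
            rows.add(tokenizer.reset(line).getAllTokens());
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("a,b,c\n1,2,3"));
        System.out.println(parse(reader));
    }
}
```

Having `reset` return the tokenizer itself is a design choice of this sketch; it allows the reset and the token retrieval to be chained on one line.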

        Activity

        Mark Thomas made changes -
        Workflow jira [ 12370152 ] Default workflow, editable Closed status [ 12602045 ]
        Henri Yandell made changes -
        Fix Version/s 2.2 [ 12311702 ]
        Henri Yandell made changes -
        Affects Version/s 2.0 Final [ 12311706 ]
        Henri Yandell made changes -
        Assignee Jakarta Commons Developers Mailing List [ commons-dev@jakarta.apache.org ]
        Fix Version/s 2.2 [ 12311686 ]
        Key COM-1095 LANG-220
        Affects Version/s 2.0 Final [ 12311658 ]
        Project Commons [ 12310458 ] Commons Lang [ 12310481 ]
        Component/s Lang [ 12311121 ]
        Henri Yandell made changes -
        Field Original Value New Value
        issue.field.bugzillaimportkey 26699 12341247
        Henri Yandell added a comment -

        Thanks. Did a ton of these at the same time so hopefully not too many other
        errors. We had a lot of unversioned issues.

        Stephen Colebourne added a comment -

        This was committed during the pre-2.1 cycle, but the class wasn't released in
        2.1. Hence the confusion.

        Updated version to 2.2.

        Matthew Inger added a comment -

        How is the 2.1 release closing this? I still don't see it included anywhere but in
        Subversion. It's not in the Javadocs published on the site, and it is not
        included in the 2.1 release.

        Henri Yandell added a comment -

        2.1 released, closing.

        Stephen Colebourne added a comment -

        Patch applied with additions/changes/edits.
        Default for tokenizer is now same as StringTokenizer.

        Matthew Inger added a comment -

        Tokenizer is missing the following features (sorry, hit commit by accident)

        1. Reset of the input string to a new value. This would be helpful
        when parsing large files, as you could use the same tokenizer instance
        on each line of the file by resetting the input on the tokenizer, instead
        of creating a new instance for each line:

        while ((line = reader.readLine()) != null) {
            tokenizer.reset(line);
            tokens = tokenizer.getAllTokens();
        }

        2. I have also added static factory methods for Comma Separated and
        Tab Separated values tokenizers. This is accomplished by implementing
        Cloneable, and having private static instances configured for these
        types, and returning clones when an instance is requested:

        Tokenizer csv = Tokenizer.getCSVInstance(input);
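
The clone-based factory approach described in item 2 might look roughly like the sketch below. It is an illustration only, not the attached diff: `DelimitedTokenizer`, `FactoryDemo`, the prototype field names, and `getTSVInstance` are assumptions made for the example, while `getCSVInstance` follows the call shown above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of clone-based factories; not the patch attached to this issue. */
class DelimitedTokenizer implements Cloneable {
    // Preconfigured prototypes. They are never handed out directly,
    // so callers cannot change the shared configuration.
    private static final DelimitedTokenizer CSV_PROTOTYPE = new DelimitedTokenizer(',');
    private static final DelimitedTokenizer TSV_PROTOTYPE = new DelimitedTokenizer('\t');

    private final char delimiter;
    private String input = "";

    private DelimitedTokenizer(char delimiter) {
        this.delimiter = delimiter;
    }

    /** Returns a fresh comma-separated tokenizer for the given input. */
    static DelimitedTokenizer getCSVInstance(String input) {
        return CSV_PROTOTYPE.clone().reset(input);
    }

    /** Returns a fresh tab-separated tokenizer for the given input. */
    static DelimitedTokenizer getTSVInstance(String input) {
        return TSV_PROTOTYPE.clone().reset(input);
    }

    DelimitedTokenizer reset(String newInput) {
        this.input = (newInput == null) ? "" : newInput;
        return this;
    }

    /** Splits the current input on the delimiter, keeping empty tokens. */
    List<String> getAllTokens() {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == delimiter) {
                tokens.add(input.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(input.substring(start));
        return tokens;
    }

    @Override
    public DelimitedTokenizer clone() {
        try {
            // A shallow copy is enough here: the fields are a char and an immutable String.
            return (DelimitedTokenizer) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError("Cloneable is implemented", e);
        }
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        System.out.println(DelimitedTokenizer.getCSVInstance("a,b,c").getAllTokens());
        System.out.println(DelimitedTokenizer.getTSVInstance("x\ty").getAllTokens());
    }
}
```

Returning a clone rather than the shared instance means each caller gets an independent tokenizer, so resetting one does not affect another.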

        Please see the attached file, created with the command
        diff -u Tokenizer.java

        Matthew Inger added a comment -

        Created an attachment (id=10242)
        Diff file to add suggested features.

        Matthew Inger created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Matthew Inger
          • Votes:
            0
            Watchers:
            0
