Tika / TIKA-1872

Backport tika-langdetect from 2.x branch to 1.13 branch

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13
    • Component/s: languageidentifier
    • Labels:

      Description

      Backport tika-langdetect from the 2.x branch to the 1.13 branch to improve the accuracy of language detection.

          Activity

          chrismattmann Chris A. Mattmann added a comment -

          This is now done. Ken's Optimaize langdetect, N-gram langdetect, and Text.jl from MIT are all now integrated:

          LMC-053601:tika1.13 mattmann$ git commit -m "Resolve conflicts in CHANGES.txt"
          [master 2caf3da] Resolve conflicts in CHANGES.txt
          LMC-053601:tika1.13 mattmann$ git push -u origin master
          Counting objects: 477, done.
          Delta compression using up to 8 threads.
          Compressing objects: 100% (237/237), done.
          Writing objects: 100% (477/477), 113.91 KiB | 0 bytes/s, done.
          Total 477 (delta 134), reused 320 (delta 67)
          remote: tika git commit: Resolve conflicts in CHANGES.txt
          remote: tika git commit: Update with information about TIKA-1872, TIKA-1696 and TIKA-1723.
          remote: tika git commit: Merge branch 'TIKA-1872'
          remote: tika git commit: Merge branch 'TIKA-1872' of https://github.com/trevorlewis/tika into TIKA-1872
          remote: tika git commit: Updated TextLangDetector and fixed build errors
          remote: tika git commit: Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tika into TIKA-1872
          remote: tika git commit: Depend on 1.13-SNAPSHOT, not 2.0.
          remote: tika git commit: Merge branch 'TIKA-1872' of https://github.com/trevorlewis/tika into TIKA-1872
          remote: tika git commit: Added missing license headers
          remote: tika git commit: Add missing license headers
          remote: tika git commit: fix for TIKA-1872 contributed by trevorlewis
          remote: tika git commit: Make detector "discoverable", use that everywhere
          remote: tika git commit: Move base lang detect classes to core
          remote: tika git commit: Remove built-in lang detector
          remote: tika git commit: Add tika-langdetect dependency in other modules
          remote: tika git commit: Add project.build.sourceEncoding to properties
          remote: tika git commit: Roll in new lang detect support in new module
          remote: tika git commit: Add missing dependency on tika-test-resources
          To https://git-wip-us.apache.org/repos/asf/tika.git
             c9d508d..2caf3da  master -> master
          Branch master set up to track remote
          

          Thanks Ken Krugler and Trevor Lewis!
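
          For downstream Maven builds that want the backported module, the dependency would presumably look like the fragment below. This is a sketch, not taken from the commits above: the group ID follows Tika's usual `org.apache.tika` convention, the artifact ID is the module name from the commit log, and the version is assumed to be the 1.13 release listed under Fix Version/s, so verify it against the published artifact.

```xml
<!-- Assumed coordinates for the backported language-detection module -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-langdetect</artifactId>
  <!-- Assumption: the release matching Fix Version/s; the commit log mentions 1.13-SNAPSHOT during development -->
  <version>1.13</version>
</dependency>
```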

            People

            • Assignee: Chris A. Mattmann (chrismattmann)
            • Reporter: Trevor Lewis (lewistre)
            • Votes: 0
            • Watchers: 2
