Nutch / NUTCH-2278

Handle alpha-2 language codes consistently

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.12
    • Fix Version/s: 1.21
    • Component/s: plugin
    • Labels: None

    Description

      The language-identifier plugin provides two extraction policies: detect and identify.

      However, the two policies handle the alpha-2 region code (the ISO 3166-1 country code, e.g. 'US' in 'en-US') differently:

      • 'identify' strips the region code: if the identified language is 'en-US', it injects 'en' into the meta tags.
      • 'detect' keeps the region code: if the detected language is 'en-US', it injects 'en-US' into the meta tags.

      Could we make this consistent and always strip the alpha-2 region code?
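      The requested behaviour amounts to reducing a language tag to its primary language subtag before it is stored. A minimal sketch, assuming the value produced by either policy is a BCP 47-style tag such as 'en-US' (possibly written with '_'); the class and method names are hypothetical and are not taken from Nutch or from the attached patches. It relies only on the JDK's Locale.forLanguageTag, which also normalizes case.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Nutch): reduces a language tag such as
 * "en-US" to its primary language subtag ("en") so that both extraction
 * policies would store the same form.
 */
public final class LangCodeNormalizer {

    private LangCodeNormalizer() {}

    /** Returns the lowercase primary language subtag, or null if none can be parsed. */
    public static String primaryLanguage(String tag) {
        if (tag == null || tag.trim().isEmpty()) {
            return null;
        }
        // Assumption: some sources write "en_US"; BCP 47 uses '-' as the separator.
        String normalized = tag.trim().replace('_', '-');
        // forLanguageTag parses the tag and lowercases the language subtag.
        String lang = Locale.forLanguageTag(normalized).getLanguage();
        return lang.isEmpty() ? null : lang;
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en-US", "en", "pt-BR", "zh-Hant-TW", "en_US"}) {
            System.out.println(tag + " -> " + primaryLanguage(tag));
        }
    }
}
```

      Applying such a step to the output of both 'detect' and 'identify' would make 'en-US' and 'en' end up as the same 'en' value in the metadata.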

      Attachments

        1. NUTCH-2278.patch (0.7 kB, Fengtan)
        2. NUTCH-2278.patch (4 kB, Fengtan)

            People

              Assignee: Unassigned
              Reporter: Fengtan
              Votes: 0
              Watchers: 2