Nutch / NUTCH-946

cached.jsp does not detect the encoding conversion needed for content not encoded in UTF-8


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.2
    • Fix Version/s: None
    • Component/s: web gui
    • Labels: None
    • Patch Info: Patch Available

    Description

      The cache view does not recognize the encoding conversion needed to properly display page content stored in a segment.

      The problem is that it looks up the "CharEncodingForConversion" key in the content metadata, but that key is stored in the parse metadata.
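To make the mismatch concrete, here is a minimal, self-contained Java sketch of the lookup. It uses plain java.util.Map instances as hypothetical stand-ins for the content and parse Metadata objects that cached.jsp obtains from bean.getParseData(details); the class name EncodingLookup, the helper findEncoding, and the sample charset ISO-8859-1 are illustrative assumptions, not Nutch code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: Maps stand in for Nutch's content and parse Metadata.
public class EncodingLookup {
    // Metadata key named in the issue for the charset to convert from.
    static final String ENCODING_KEY = "CharEncodingForConversion";

    // Returns the recorded charset name, or null if this metadata lacks the key.
    static String findEncoding(Map<String, String> meta) {
        return meta.get(ENCODING_KEY);
    }

    public static void main(String[] args) {
        // Content metadata: carries the content type but not the charset key.
        Map<String, String> contentMeta = new HashMap<>();
        contentMeta.put("Content-Type", "text/html");

        // Parse metadata: where the issue reports the charset key is stored.
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put(ENCODING_KEY, "ISO-8859-1");

        // The unpatched page looks here, finds nothing, and skips conversion.
        System.out.println(findEncoding(contentMeta)); // prints "null"
        // The patched page looks here and finds the charset.
        System.out.println(findEncoding(parseMeta)); // prints "ISO-8859-1"
    }
}
```

The only change the patch makes to this lookup is which metadata object is consulted; the key name stays the same.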

      Here is the patch I generated to fix it:

          ### Eclipse Workspace Patch 1.0
          #P branch-1.2
          Index: src/web/jsp/cached.jsp
          ===================================================================
          --- src/web/jsp/cached.jsp (revision 1027060)
          +++ src/web/jsp/cached.jsp (working copy)
          @@ -39,17 +39,18 @@
               ResourceBundle.getBundle("org.nutch.jsp.cached", request.getLocale())
               .getLocale().getLanguage();
          -  Metadata metaData = bean.getParseData(details).getContentMeta();
          +  Metadata contentMetaData = bean.getParseData(details).getContentMeta();
          +  Metadata parseMetaData = bean.getParseData(details).getParseMeta();
           
             String content = null;
           
          -  String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
          +  String contentType = (String) contentMetaData.get(Metadata.CONTENT_TYPE);
             if (contentType.startsWith("text/html")) {
               // FIXME : it's better to emit the original 'byte' sequence
               // with 'charset' set to the value of 'CharEncoding',
               // but I don't know how to emit 'byte sequence' in JSP.
               // out.getOutputStream().write(bean.getContent(details)) may work,
               // but I'm not sure.
          -    String encoding = (String) metaData.get("CharEncodingForConversion");
          +    String encoding = (String) parseMetaData.get("CharEncodingForConversion");
               if (encoding != null) {
                 try {
                   content = new String(bean.getContent(details), encoding);
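The last line of the patch decodes the stored bytes with the charset read from the parse metadata; the catch block that follows the try is outside the hunk. The sketch below shows one way to do that decoding with java.nio.charset, assuming (this is not taken from Nutch) that a missing, illegal, or unsupported charset name should fall back to UTF-8 rather than throw. CachedContentDecoder and decode are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: decodes cached page bytes.
public class CachedContentDecoder {
    // Decodes raw bytes with the named charset. Falls back to UTF-8 (an assumed
    // default) when the name is null, syntactically illegal, or unsupported.
    static String decode(byte[] raw, String encoding) {
        Charset charset = StandardCharsets.UTF_8;
        if (encoding != null) {
            try {
                if (Charset.isSupported(encoding)) {
                    charset = Charset.forName(encoding);
                }
            } catch (IllegalArgumentException e) {
                // IllegalCharsetNameException extends IllegalArgumentException;
                // keep the UTF-8 default.
            }
        }
        // This constructor never throws on bad input; malformed sequences
        // are replaced with the replacement character U+FFFD.
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Latin-1 bytes for "caf" plus 0xE9 (e-acute), which is invalid on its own in UTF-8.
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9};
        System.out.println(decode(raw, "ISO-8859-1").equals("caf\u00e9")); // true
        System.out.println(decode(raw, null).equals("caf\uFFFD"));        // true
    }
}
```

Using the Charset overload of the String constructor avoids the checked UnsupportedEncodingException thrown by the String-name overload used in the patch, at the cost of silently substituting U+FFFD when the bytes do not match the chosen charset.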

      Attachments

        1. cache-946.patch (1 kB, Enrique Berlanga)


      People

        Assignee: Unassigned
        Reporter: Enrique Berlanga (eberlanga)
        Votes: 0
        Watchers: 0
