Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Labels:
      None

      Description

      A Transformer implementation for DIH which strips off HTML tags using the Solr class org.apache.solr.analysis.HTMLStripReader.
      This is useful when you do not need the HTML tags in the indexed text.
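
      For illustration only, the idea can be sketched roughly as below. This is not the attached patch and it does not use HTMLStripReader: the class HtmlStripSketch, its methods, and the naive tag-skipping loop are all hypothetical stand-ins, chosen so the example runs without Solr on the classpath. It assumes a row is a Map from column name to value and that only String values are stripped.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the code from the patch. A naive loop stands in
// for Solr's HTMLStripReader so that the example has no Solr dependency.
class HtmlStripSketch {

    // Reads the whole reader and drops everything between '<' and '>'.
    // Naive on purpose: no entity decoding, no comment or script handling.
    static String strip(Reader in) throws IOException {
        StringBuilder out = new StringBuilder();
        boolean insideTag = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '<') {
                insideTag = true;
            } else if (c == '>' && insideTag) {
                insideTag = false;
            } else if (!insideTag) {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    // Convenience wrapper for in-memory strings.
    static String stripString(String html) {
        try {
            return strip(new StringReader(html));
        } catch (IOException e) {
            // A StringReader that has not been closed should not fail here.
            throw new UncheckedIOException(e);
        }
    }

    // Mimics the shape of a row transformer: the named column's String value
    // is replaced by its tag-free version; other columns are left untouched.
    static Map<String, Object> transformRow(Map<String, Object> row, String column) {
        Object value = row.get(column);
        if (value instanceof String) {
            row.put(column, stripString((String) value));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("body", "<p>Hello <b>Solr</b></p>");
        System.out.println(transformRow(row, "body").get("body")); // prints "Hello Solr"
    }
}
```

      A real implementation would delegate the stripping to HTMLStripReader, as the description says, rather than to a hand-written loop like this one.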

      1. patch-887.patch
        3 kB
        Ahmed Hammad
      2. SOLR-887.patch
        4 kB
        Shalin Shekhar Mangar

          Activity

          Ahmed Hammad added a comment -

          If you have any comments about the Filter, please let me know so that I can fix it.

          Noble Paul added a comment -

          This looks fine. I do not think it needs any changes.

          Dawid Weiss added a comment -

          Shouldn't the patch use StringBuilder instead of StringBuffer (unless you want to keep Java 1.4 compatibility)?

          Dawid Weiss added a comment -

          A link to the bug which solves padding of hexadecimal entities and processing of uppercase exceptions in HTMLStripReader.

          Shalin Shekhar Mangar added a comment -

          Thanks for the patch, Ahmed.

          Changes:

          1. Generated patch from correct directory
          2. Use StringBuilder instead of StringBuffer

          It would be nice to have this class handle HTML text coming from java.sql.Clob and java.sql.Blob types too (for an example see FieldReaderDataSource#getData method).
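
          A rough sketch of what that could look like is below. It is not taken from FieldReaderDataSource or from any patch on this issue: the class ClobTextSketch and its methods are hypothetical, and decoding Blob bytes as UTF-8 is an assumption (the real encoding would have to come from configuration).

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.Clob;
import java.sql.SQLException;

// Hypothetical helper, not part of Solr: turns a column value of several
// possible types into text that an HTML-stripping step could then consume.
class ClobTextSketch {

    // Returns a Reader over the value, or null for unsupported types.
    static Reader toReader(Object value) throws SQLException {
        if (value instanceof String) {
            return new StringReader((String) value);
        }
        if (value instanceof Clob) {
            return ((Clob) value).getCharacterStream();
        }
        if (value instanceof Blob) {
            // ASSUMPTION: the bytes are UTF-8 encoded text.
            return new InputStreamReader(((Blob) value).getBinaryStream(), StandardCharsets.UTF_8);
        }
        return null;
    }

    // Drains a Reader into a String using a StringBuilder and a char buffer.
    static String readFully(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Returns the value as text, or null when the type is not supported.
    static String asText(Object value) {
        try (Reader in = toReader(value)) {
            return in == null ? null : readFully(in);
        } catch (SQLException | IOException e) {
            throw new IllegalStateException("could not read column value", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(asText("<b>hi</b>")); // prints "<b>hi</b>"
    }
}
```

          The text returned by asText could then be passed to the stripping step in the same way as a plain String column.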

          Noble Paul added a comment -

          There is another use case where the data may come directly from an HttpDataSource/FileDataSource. How can we directly ingest that data?

          Shalin Shekhar Mangar added a comment -

          "There is another use case where the data may come directly from an HttpDataSource/FileDataSource. How can we directly ingest that data?"

          Do you mean directly reading from the Reader given by HttpDataSource and FileDataSource and stripping off HTML from it without needing to create an in-memory Map?

          Shalin Shekhar Mangar added a comment -

          Committed revision 723410.

          Thanks Ahmed!

          I didn't want to delay committing this fine contribution. We can add more capabilities through another issue if needed.

          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4


            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Ahmed Hammad
            • Votes:
              0
              Watchers:
              0
