Solr / SOLR-987

Add a new DataImportHandler EntityProcessor to handle non-XML files


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • None
    • None
    • None

    Description

      We need a way to use the DataImportHandler to index non-XML (i.e., plain text) files, fetched either via HTTP or from the file system. This would make it possible to put the entire contents of a text file into a single field of a document whose other fields are pulled from another DataSource. An EntityProcessor looks like the right place for this, since it would also let us add more attributes if needed. We could also consider supporting other file formats (PDF, Office, etc.), which may overlap with some of the Extraction/Tika work.
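      To make the proposal concrete, here is a minimal standalone sketch of the idea, assuming nothing about the real DataImportHandler classes: the name PlainTextEntitySketch and its nextRow() method are hypothetical and are not Solr's API. It reads the whole text from a java.io.Reader (which the caller could open from a file or an HTTP response), emits it as exactly one row with a single configurable column, and the usage in main() merges that row into a document whose other fields stand in for values from another DataSource.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, NOT the DataImportHandler API: an "entity processor"
 * that reads an entire plain-text source and emits it as a single-field row.
 */
public class PlainTextEntitySketch {

    private final Reader source;   // file, HTTP body, etc., opened by the caller
    private final String column;   // name of the single output field
    private boolean done = false;  // plain text yields exactly one row

    public PlainTextEntitySketch(Reader source, String column) {
        this.source = source;
        this.column = column;
    }

    /** Returns one row holding the whole text, then null to signal the end. */
    public Map<String, Object> nextRow() {
        if (done) {
            return null;
        }
        done = true;
        StringBuilder text = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader r = source) {  // closes the source once it is fully read
            int n;
            while ((n = r.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Object> row = new HashMap<>();
        row.put(column, text.toString());
        return row;
    }

    public static void main(String[] args) {
        // Parent row, standing in for fields pulled from another DataSource.
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "42");
        doc.put("title", "Release notes");

        PlainTextEntitySketch child =
            new PlainTextEntitySketch(new StringReader("line one\nline two\n"), "body");
        Map<String, Object> row;
        while ((row = child.nextRow()) != null) {
            doc.putAll(row);  // merge the file contents into the same document
        }
        // Prints "42 18": the id plus the character count of the body field.
        System.out.println(doc.get("id") + " " + doc.get("body").toString().length());
    }
}
```

      Taking a Reader rather than a path keeps the transport (HTTP vs. file system) separate from the processing, loosely mirroring how DIH separates DataSources from EntityProcessors; extra attributes (e.g., a file name or size column) could be added to the same row.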


People

    • Assignee: Unassigned
    • Reporter: Nathan Adams (natad)
    • Votes: 0
    • Watchers: 0

Dates

    • Created:
    • Updated:
    • Resolved: