Apache Any23 (Retired) / ANY23-415

NTriplesExtractor tries all text/plain files, causing numerous fatal issues

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.3
    • Component/s: extractors
    • Labels: None

    Description

      Because the NTriplesExtractorFactory lists "text/plain" among its supported content types, every plain text file is processed by the NTriplesExtractor, which in turn sends huge numbers of completely unnecessary fatal issues to the extraction report.

      In my crawls, this mostly affects the "humans.txt" files encountered.

      While this isn't a hugely serious bug, it is quite irritating as it does really clutter up my logs.

       
      Note: the NQuadsExtractorFactory (which can parse all the same documents as NTriples) does not include a content type of "text/plain".
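
      To illustrate the mechanism described above, here is a minimal, self-contained Java sketch. It does not use the Any23 API: the class ExtractorSelectionSketch, the registry() and select() helpers, the extractor names, and the MIME type strings other than "text/plain" are all hypothetical, chosen only to show how a content-type match decides which extractors run. It also shows one possible remedy, dropping "text/plain" from the NTriples entry to mirror the NQuads factory; whether the actual fix took this form is not stated in this report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExtractorSelectionSketch {

    // Hypothetical registry: extractor name -> content types it claims to support.
    // The names and the RDF MIME types are illustrative assumptions, not Any23's real values.
    static Map<String, List<String>> registry(boolean ntriplesClaimsTextPlain) {
        Map<String, List<String>> reg = new LinkedHashMap<>();
        List<String> ntriplesTypes = new ArrayList<>();
        ntriplesTypes.add("application/n-triples");
        if (ntriplesClaimsTextPlain) {
            ntriplesTypes.add("text/plain"); // the entry this report complains about
        }
        reg.put("rdf-nt", ntriplesTypes);
        reg.put("rdf-nq", List.of("application/n-quads")); // no "text/plain" here
        return reg;
    }

    // Returns every extractor whose declared content types include the detected one.
    static List<String> select(Map<String, List<String>> reg, String detectedMimeType) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : reg.entrySet()) {
            if (entry.getValue().contains(detectedMimeType)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        String humansTxt = "text/plain"; // what a humans.txt file is detected as

        // With "text/plain" declared, the NTriples extractor is tried on humans.txt.
        System.out.println(select(registry(true), humansTxt));  // prints [rdf-nt]

        // Without it, no RDF extractor is tried on plain text files.
        System.out.println(select(registry(false), humansTxt)); // prints []
    }
}
```

      In this sketch, real N-Triples documents served with their own content type are still matched in both configurations; only the catch-all plain text match goes away.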

            People

              Assignee: Hans Brende (hansbrende)
              Reporter: Hans Brende (hansbrende)

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m