Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parser
    • Labels:
      None

      Description

      The PDF parser currently extracts and outputs document content as a single string. PDFBox could be used to support structuring at least down to page and paragraph (not sure how accurate) level.

        Activity

        No work has yet been logged on this issue.

          People

          • Assignee:
            Unassigned
            Reporter:
            Jukka Zitting
          • Votes:
            5 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:

              Development