Uploaded image for project: 'Tika'
  1. Tika
  2. TIKA-1291

Invalid JSON output on CLI

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4, 1.5
    • Fix Version/s: 1.6
    • Component/s: cli, metadata
    • Labels:
      None

      Description

      Getting the metadata via CLI from tika with output format set to JSON gives sometimes invalid JSON. I only found float/array errors here in jira and thus created this ticket with a new case.

      In my case the file that lead to invalid JSON output was a PNG file (that I unfortunately can't provide for testing):

      { "Application Record Version":4, 
      "Component 1":"Y component: Quantization table 0, Sampling factors 2 horiz/2 vert", 
      "Component 2":"Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert", 
      "Component 3":"Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert", 
      "Compression Type":"Baseline", 
      "Content-Length":113081, 
      "Content-Type":"image/jpeg", 
      "Data Precision":"8 bits", 
      "IPTC-NAA record":"24 bytes binary data", 
      "Image Height":"479 pixels", 
      "Image Width":"671 pixels", 
      "Number of Components":3, 
      "Resolution Units":"inch", 
      "Unknown tag (0x02f0)":35,0,556,479, 
      "X Resolution":"220 dots", 
      "Y Resolution":"220 dots", 
      "resourceName":18, 
      "tiff:BitsPerSample":8, 
      "tiff:ImageLength":479, 
      "tiff:ImageWidth":671 }
      

      The

      "Unknown tag (0x02f0)":35,0,556,479, 

      is invalid JSON.

      It would be nice if there's always valid json output from tika. For other cases that might not be catched via fixes by this ticket it would be nice to have a CLI argument/option that disables the output of certain (unknown?) fields or allows giving a whitelist of fieldnames to output. That way users can bridge the time until new releases of tika by being more specific on the shell. If that feature already exists I apology for not having found it directly and a hint to the CLI option would be nice.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                tallison Tim Allison
                Reporter:
                graste Steffen
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: