TIKA-607

ParseUtils.getStringContent() of a text file - parser is null

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9
    • Fix Version/s: 0.10
    • Component/s: parser
    • Labels: None
    • Environment: java version "1.6.0_16", linux 64bit

      Description

      Hey, I'm trying to get the content of a text file (a MySQL config file).

      	public void testTikaParserUtils() throws Exception {
      		String resourceLocation = "files/my.cnf";
      		String content = ParseUtils.getStringContent(new File(resourceLocation), new TikaConfig());
      		System.out.println(content);
      	}
      

      OR

      	public void testTikaParserUtils() throws Exception {
      		String resourceLocation = "files/my.cnf";
      		String content = ParseUtils.getStringContent(new File(resourceLocation), TikaConfig.getDefaultConfig());
      		System.out.println(content);
      	}
      

      but I get a NullPointerException, because "parser" is null:

      ParseUtils.java
      public static String getStringContent(
              InputStream stream, TikaConfig config, String mimeType)
              throws TikaException, IOException {
          try {
              // Per this report, the lookup returns null for the text file...
              Parser parser = config.getParser(MediaType.parse(mimeType));
              ContentHandler handler = new BodyContentHandler();
              // ...so calling parse() on it throws the NullPointerException.
              parser.parse(stream, handler, new Metadata());
              return handler.toString();
          } catch (SAXException e) {
              throw new TikaException("Unexpected SAX error", e);
          }
      }
      


      java.lang.NullPointerException
      at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:112)
      at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:171)
      at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:189)
      at cz.instance.transl.tests.TikaTest.testTikaParserUtils(TikaTest.java:53)
      at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:73)
      at org.apache.maven.surefire.testng.TestNGXmlTestSuite.execute(TestNGXmlTestSuite.java:95)
      at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:101)
      at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:101)
      at $Proxy0.invoke(Unknown Source)
      at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:139)
      at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:82)
      at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:81)
      ... Removed 24 stack frames
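
      The failure mode above (a lookup that returns null, followed by an unguarded call on the result) can be illustrated without Tika. The following is a minimal sketch only: ParserLookupSketch, register, extractUnguarded and extractGuarded are hypothetical names invented for illustration, not Tika classes or methods, and this is not how the issue was actually fixed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a media-type -> parser registry; NOT Tika's API.
public class ParserLookupSketch {
    private final Map<String, UnaryOperator<String>> parsers = new HashMap<>();

    public void register(String mimeType, UnaryOperator<String> parser) {
        parsers.put(mimeType, parser);
    }

    // Same shape as the failing code: the lookup result is used without a check,
    // so an unregistered media type ends in a NullPointerException.
    public String extractUnguarded(String mimeType, String input) {
        UnaryOperator<String> parser = parsers.get(mimeType); // may be null
        return parser.apply(input);
    }

    // Guarded variant: a missing parser is reported with a descriptive error
    // that names the media type instead of surfacing as a bare NPE.
    public String extractGuarded(String mimeType, String input) {
        UnaryOperator<String> parser = parsers.get(mimeType);
        if (parser == null) {
            throw new IllegalArgumentException(
                    "No parser registered for media type: " + mimeType);
        }
        return parser.apply(input);
    }

    public static void main(String[] args) {
        ParserLookupSketch registry = new ParserLookupSketch();
        registry.register("text/plain", String::trim);

        System.out.println(registry.extractGuarded("text/plain", "  key=value  "));
        try {
            registry.extractGuarded("application/x-unknown", "data");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      With a guard like this, the error names the media type that had no parser, which is easier to diagnose than a NullPointerException inside a utility method.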

      It works only if I explicitly specify the detector when constructing the Tika facade:

      	@Test
      	public void testTikaParserUtils() throws Exception {
      		Tika tika = new Tika(new TextDetector());
      		String content = tika.parseToString(new File(txt));
      		System.out.println(content);
      	}
      

        Activity

        Jukka Zitting added a comment -

        The Tika facade class was already introduced in Tika 0.5, so you can start using it now without waiting for a new release.

        Joseph Vychtrle added a comment -

        Thank you, Jukka. I would already be working with the current revision, but I'd like to have sources available when I use Tika as a Maven dependency. Snapshots can't have sources; only releases can be deployed with sources attached, and I was too lazy to attach them manually. Do you know when the release will be out?

        Jukka Zitting added a comment -

        I would suggest that you look at the org.apache.tika.Tika facade class instead of the old ParseUtils class. The Tika facade is a much cleaner and more feature-rich collection of convenience methods.

        In revision 1079871 I deprecated the ParseUtils class in favor of the Tika facade, and fixed the reported problem by making ParseUtils use the Tika facade under the hood.


          People

          • Assignee: Jukka Zitting
          • Reporter: Joseph Vychtrle
          • Votes: 0
          • Watchers: 0
