ManifoldCF / CONNECTORS-598

Add a mode to the RSS connector that uses null content if chromed content is not found

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 1.0.1, ManifoldCF 1.1
    • Fix Version/s: ManifoldCF 1.1
    • Component/s: None
    • Labels: None

      Description

      I have a public RSS feed on an intranet that lists important bookmarks. The list contains many external links, so ManifoldCF would need to know when to use the company's proxy to index them.
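      Deciding "when to use the company's proxy" is what a proxy.pac file does, host by host. In plain Java the same per-host decision can be expressed with the standard java.net.ProxySelector class. The sketch below is illustrative only: the class name, the ".intranet.example.com" suffix and the proxy host are placeholders, and none of it is ManifoldCF configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the decision a proxy.pac file makes: intranet hosts are
// reached directly, every other host goes through the company proxy.
// The suffix and proxy address passed to the constructor are placeholders.
final class IntranetAwareProxySelector extends ProxySelector {
    private final String internalSuffix; // e.g. ".intranet.example.com" (assumed)
    private final Proxy companyProxy;

    IntranetAwareProxySelector(String internalSuffix, String proxyHost, int proxyPort) {
        this.internalSuffix = internalSuffix.toLowerCase(Locale.ROOT);
        // createUnresolved avoids a DNS lookup when the selector is built.
        this.companyProxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        String host = uri.getHost();
        // Hosts under the intranet suffix (or URIs without a host) connect directly.
        if (host == null || host.toLowerCase(Locale.ROOT).endsWith(internalSuffix)) {
            return Collections.singletonList(Proxy.NO_PROXY);
        }
        // Everything else is treated as external and routed through the proxy.
        return Collections.singletonList(companyProxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress address, IOException failure) {
        // A production selector might log this or mark the proxy as unavailable.
    }
}
```

      A JVM-wide selector would be installed with ProxySelector.setDefault(...); whether a particular connector's HTTP client consults that default is an assumption this sketch does not verify.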

      Attachments

      1. bookmarks.xml
        41 kB
        David Morana
      2. invalidpatch.jpg
        232 kB
        David Morana
      3. npe.txt
        20 kB
        David Morana
      4. solr.patch
        0.8 kB
        Karl Wright
      5. stacktrace.txt
        21 kB
        David Morana
      6. stacktrace1.txt
        25 kB
        David Morana
      7. stacktraceManCF.txt
        65 kB
        David Morana
      8. stacktraceManCF2.txt
        70 kB
        David Morana
      9. stacktraceManCF3.txt
        66 kB
        David Morana
      10. stacktraceManCF4.txt
        60 kB
        David Morana

        Activity

        David Morana added a comment -

        Clarification: we don't need to crawl external links; we would just like to capture the metadata and have the bookmark indexed.

        Karl Wright added a comment -

        The RSS connector currently cannot filter the documents listed in a feed by any criterion. The only means of filtering is to include or exclude the feed itself.

        So it sounds like what you need here is NOT to understand a proxy.pac file, but rather to be able to filter discovered URLs in some way. Would filtering with regular expressions matched against each document URL be sufficient? The web connector uses this strategy, but I suspect it would be problematic for RSS. The mix of links presumably changes all the time as the feeds are regenerated; you might be able to decide via a regexp whether a link is internal, but I think it would be cumbersome to manage.
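        The regular-expression idea can be illustrated with a short Java sketch. FeedUrlFilter is a hypothetical class, not part of ManifoldCF or its web connector: a URL is accepted if it matches at least one include pattern and no exclude pattern, which is the general include/exclude style described above. The intranet host pattern used below is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical include/exclude URL filter; not the connector's actual code.
final class FeedUrlFilter {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    FeedUrlFilter(List<String> includeRegexes, List<String> excludeRegexes) {
        for (String regex : includeRegexes) {
            includes.add(Pattern.compile(regex));
        }
        for (String regex : excludeRegexes) {
            excludes.add(Pattern.compile(regex));
        }
    }

    /** Accept a URL if it matches at least one include pattern and no exclude pattern. */
    boolean accept(String url) {
        boolean included = false;
        for (Pattern p : includes) {
            if (p.matcher(url).find()) {
                included = true;
                break;
            }
        }
        if (!included) {
            return false;
        }
        for (Pattern p : excludes) {
            if (p.matcher(url).find()) {
                return false;
            }
        }
        return true;
    }
}
```

        For example, new FeedUrlFilter(List.of("^https?://intranet\\.example\\.com/"), List.of()) accepts intranet URLs and rejects http://www.fleetmon.com/products/services_data. The matching itself is simple; the cost lies in keeping such patterns correct as the feeds change.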

        The alternative is to generate the feeds without the documents that you don't want.

        Please let me know how you want to proceed.

        David Morana added a comment -

        Well, we would like the bookmarks indexed even when they point to external links; the external links themselves just wouldn't be crawled. Is this possible?
        Here's an example of a bookmark (the entry below). In this case the href is http://www.fleetmon.com/products/services_data, which is an external link.
        So, can we just send the metadata (title, link, pubdate, etc.) to be indexed and not crawl this link? Basically, can we add a whitelist and a blacklist to the RSS connector? If it's not too much trouble...
        What are your thoughts?
        <entry>
          <id>tag:dogear.ibm.com,2005:link:eab7ff92-e8e8-4770-9a97-f4e4e131f8ee</id>
          <title>FleetMon - Maritime traffic analyses, XML ship position data and API access for logistics, research and more - FleetMon.com</title>
          <category scheme="http://www.ibm.com/xmlns/prod/sn/type" term="bookmark" />
          <link href="http://www.fleetmon.com/products/services_data" />
          <snx:link linkid="eab7ff92-e8e8-4770-9a97-f4e4e131f8ee" />
          <content type="html">
            <![CDATA[<div><p><br /></p></div>]]>
          </content>
          <published>2012-06-28T13:36:47-04:00</published>
          <updated>2012-06-28T13:36:47-04:00</updated>
          <category term="gulf_of_mexico" />
          <category term="ship_location_data" />
          <author>
            <email>bmenk@ll.mit.edu</email>
            <snx:userid>927E89B2-3BBA-4832-AFA4-23105CA7CC93</snx:userid>
            <snx:userState>active</snx:userState>
            <name>Menk, Robert (RO17354)</name>
            <uri>https://[...]/dogear/html?email=bmenk%40ll.mit.edu</uri>
          </author>
          <snx:clickcount>1</snx:clickcount>
          <snx:linkcount>1</snx:linkcount>
          <link rel="http://www.ibm.com/xmlns/prod/sn/same"
                type="application/atom+xml"
                href="https://[...]/dogear/atom?for=http%3a%2f%2fwww.fleetmon.com%2fproducts%2fservices_data" />
        </entry>
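        To make "send the metadata without crawling the link" concrete, here is a hedged Java sketch that reads the title, the bookmarked link and the published date from one Atom entry like the one above, using the JDK's DOM parser. BookmarkMetadata is a hypothetical helper, not ManifoldCF code. It parses without namespace awareness because the snippet uses prefixes such as snx: without declaring them, and it treats the <link> with no rel attribute as the bookmarked URL (in Atom, a link without rel is interpreted as an "alternate" link).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: collect metadata from one Atom entry without fetching its link.
final class BookmarkMetadata {
    static Map<String, String> extract(String entryXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace-unaware, so undeclared prefixes like snx: are plain element names.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(entryXml)));
            Element entry = doc.getDocumentElement();

            Map<String, String> meta = new LinkedHashMap<>();
            putIfPresent(meta, "title", entry, "title");
            putIfPresent(meta, "published", entry, "published");
            // "snx:link" does not match the tag name "link" here, so only Atom links are seen.
            NodeList links = entry.getElementsByTagName("link");
            for (int i = 0; i < links.getLength(); i++) {
                Element link = (Element) links.item(i);
                if (!link.hasAttribute("rel") && link.hasAttribute("href")) {
                    meta.put("link", link.getAttribute("href"));
                    break;
                }
            }
            return meta;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse entry", e);
        }
    }

    private static void putIfPresent(Map<String, String> meta, String key, Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() > 0) {
            meta.put(key, nodes.item(0).getTextContent().trim());
        }
    }
}
```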

        Karl Wright added a comment -

        There is already a mode where the RSS connector indexes only metadata from the feed and does not fetch the linked documents themselves. This is configured per job: you tell it which RSS feed field to use as "content", e.g. the "description" tag or the "content" tag. Look for the "Dechromed content" tab.

        Does this do what you want?

        David Morana added a comment -

        Hi Karl,
        I tried using dechromed content with the description field, but the simple history reports 403 for the RSS feed links each time.

        Karl Wright added a comment (edited) -

        If the entry has no identifiable content in the selected field (e.g. you select "description" for dechromed content but the entry has no "description" tag), the connector attempts the fetch. But the entry you included above does appear to have a valid "content" value, so you would have to select "content" as the dechromed content field.

        If you have inconsistent data and just need to fake it out, and really don't care whether there is data or not, it is probably straightforward to add an additional setting to the "dechromed content" radio button to force it to index only metadata with no content.
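        The fallback described here (use the selected feed field when present; otherwise either fetch the document, or, with the proposed extra setting, index metadata with no content) can be modelled roughly as follows. DechromedContent, WhenMissing and Outcome are hypothetical names; this is a model of the described behaviour, not the RSS connector's implementation.

```java
import java.util.Map;

// Hypothetical model of the per-entry "dechromed content" decision.
final class DechromedContent {
    /** What to do when the selected feed field is absent from an entry. */
    enum WhenMissing { FETCH_DOCUMENT, USE_NULL_CONTENT }

    /** Result for one feed entry. */
    static final class Outcome {
        final boolean fetchDocument; // true: fall back to fetching the linked URL
        final String content;        // text to index as content; null means metadata only

        Outcome(boolean fetchDocument, String content) {
            this.fetchDocument = fetchDocument;
            this.content = content;
        }
    }

    static Outcome choose(Map<String, String> entryFields, String selectedField, WhenMissing policy) {
        String value = entryFields.get(selectedField);
        if (value != null) {
            return new Outcome(false, value);   // use the feed field as the document content
        }
        if (policy == WhenMissing.USE_NULL_CONTENT) {
            return new Outcome(false, null);    // index metadata only, never fetch
        }
        return new Outcome(true, null);         // original behaviour: fetch the document
    }
}
```

        With the bookmark entry shown earlier, selecting "content" yields its HTML without any fetch, while selecting "description" either triggers a fetch (which is where a 403 would surface) or, under the new setting, indexes the entry with null content.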

        If you can't get this to work I will have to play around with it Sunday evening; I've got limited time today.

        David Morana added a comment -

        I'm currently testing the fix you put in for the Communities feed. It's working! And I figured out why the pubdate is just a number: it's pulling the pubdate from ImageLastMod or FIELD_MEMBERS_UPDATE_DATE for some reason. Here's a snippet from the feed (below).
        ImageLastMod is 1347383610391, and since I'm formatting the date as yyyy-mm-dd, I only get the first 8 digits.
        I'm going to try to use the Metadata section to map pubdate to updated if possible.

        <atom:entry>
          <atom:id>504fae6e-c0ff-4c10-a5ae-8eb8344a7d3f</atom:id>
          <atom:title><![CDATA[IEDs and Explosives Detection Current Awareness]]></atom:title>
          <atom:contributor>
            <snx:userid>9AA207A4-D4A2-4C6A-886F-CD00AC48431D</snx:userid>
            <atom:name>Muszynski, Anna (AN22801)</atom:name>
            <atom:email>anna.muszynski@ll.mit.edu</atom:email>
          </atom:contributor>
          <atom:link href="/communities/service/html/communityview?communityUuid=504fae6e-c0ff-4c10-a5ae-8eb8344a7d3f"
                     rel="alternate" hreflang="en" />
          <atom:updated>2012-09-11T17:13:30.391Z</atom:updated>
          <atom:summary type="html">
            <![CDATA[Coverage includes the latest news and reports on IEDs, suicide bombers, and combating these threats; novel IED delivery methods and attack strategies; IED detection countermeasures; and scholarly papers on the science of explosives detection. This community is updated every two weeks. ]]>
          </atom:summary>
          <opensearch:relevance>1.0</opensearch:relevance>
          <ibmsc:field id="ATOMAPISOURCE">http://llwbas5qa.llan.ll.mit.edu/communities/service/atom/community/instance?communityUuid=504fae6e-c0ff-4c10-a5ae-8eb8344a7d3f</ibmsc:field>
          <ibmsc:field id="FIELD_COMMUNITY_MEMBER_COUNT">21</ibmsc:field>
          <ibmsc:field id="AccessControlLevel">public</ibmsc:field>
          <ibmsc:field id="ContentSourceType">Communities</ibmsc:field>
          <ibmsc:field id="groupCount">0</ibmsc:field>
          <ibmsc:field id="ImageLastMod">1347383610391</ibmsc:field>
          <ibmsc:field id="FIELD_MEMBERS_UPDATE_DATE">1347383610391</ibmsc:field>
          <ibmsc:field id="tag">explosives</ibmsc:field>
          <ibmsc:field id="tag">explosives_detection</ibmsc:field>
          <ibmsc:field id="tag">ied</ibmsc:field>
          <ibmsc:field id="tag">ied_detection</ibmsc:field>
          <ibmsc:field id="tag">ied_events</ibmsc:field>
        </atom:entry>

        Karl Wright added a comment -

        The RSS connector converts all pubdate formats to milliseconds since Jan 1, 1970 UT. That should make it easier for you. If you need this in a date format, use Tika to convert it.
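        The millisecond representation is easy to check with java.time. The sketch below uses a hypothetical class, PubDateMillis (not part of ManifoldCF, Solr or Tika), to render a millisecond timestamp as an ISO-8601 instant and as a yyyy-MM-dd day in UTC. It only illustrates the arithmetic; it is not the Tika or Solr configuration mentioned above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: interpret a value as milliseconds since 1970-01-01T00:00:00Z.
final class PubDateMillis {
    // Formatting an Instant needs a zone; UTC keeps the day stable across JVM defaults.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    /** The full instant, e.g. for a Solr date field expecting ISO-8601. */
    static String toIsoInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    /** The same instant reduced to a calendar day in UTC. */
    static String toUtcDay(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

        Applied to the ImageLastMod value 1347383610391 from the Communities entry, this gives 2012-09-11T17:13:30.391Z, the same instant as that entry's atom:updated element. Taking only the first eight digits of the number (13473836) cannot produce a meaningful date; the whole value has to be converted.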

        David Morana added a comment -

        Hi Karl,
        How exactly do I use Tika to convert milliseconds to a date format? In the extraction handler?
        The only formats I see in Solr are:
        yyyy-MM-dd'T'HH:mm:ss'Z'
        yyyy-MM-dd'T'HH:mm:ss
        yyyy-MM-dd
        yyyy-MM-dd hh:mm:ss
        yyyy-MM-dd HH:mm:ss
        EEE MMM d hh:mm:ss z yyyy
        EEE, dd MMM yyyy HH:mm:ss zzz
        EEEE, dd-MMM-yy HH:mm:ss zzz
        EEE MMM d HH:mm:ss yyyy

        David Morana added a comment -

        I figured it out.
        I added this to the extraction handler configuration:
        <lst name="date.formats">
          <str>SSSSSSSSSSSSS</str>
          <str>yyyy-MM-dd</str>
        </lst>

        Karl Wright added a comment -

        Committed r1429250 to add the mode as described. Japanese translations are still needed for two radio-button entries, though.

        Shinichiro Abe added a comment -

        r1429664 for Japanese translations.

        Karl Wright added a comment -

        Thank you Abe-san!

        David Morana added a comment -

        I downloaded and built trunk.
        I selected dechromed content, with the option to include metadata if dechromed content is unavailable.
        The job ran through the feed, but unfortunately nothing was entered into the index, and there were no errors in the log.
        The simple history had result code 403 for every link.

        Karl Wright added a comment -

        This is what I get when I run it here (copied from simple history):

        Start Time	Activity	Identifier	Result Code	Bytes	Time	Result Description
        01-07-2013 18:08:25.542	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/vIyyioAmutU/index... .html	OK	0	1
        01-07-2013 18:08:25.515	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/rjRb9nFOO00/index... .html	OK	0	1
        01-07-2013 18:08:25.510	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/3zwOf42O2zs/index... .html	OK	0	1
        01-07-2013 18:08:25.501	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/_e4HXLLL7a4/index... .html	OK	0	1
        01-07-2013 18:08:25.500	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/iyAtvK1A99Y/index... .html	OK	0	1
        01-07-2013 18:08:25.441	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Js-KOonnz5w/index... .html	OK	0	1
        01-07-2013 18:08:25.441	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/sZJXsVdo9oU/index... .html	OK	0	1
        01-07-2013 18:08:25.440	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/qy0k2xL7Ftc/index... .html	OK	0	1
        01-07-2013 18:08:25.424	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/I_H-SRnXWX0/index... .html	OK	0	1
        01-07-2013 18:08:25.424	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/R_IZEDdBYrk/	OK	0	1
        01-07-2013 18:08:25.415	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/4hRpfFeQD-Y/index... .html	OK	0	1
        01-07-2013 18:08:25.400	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/v247bUpZbaM/index... .html	OK	0	1
        01-07-2013 18:08:25.375	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/hcKNy9qeloo/index... .html	OK	0	1
        01-07-2013 18:08:25.357	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/uIdWVI7WGmQ/index... .html	OK	0	1
        01-07-2013 18:08:25.357	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/DT451tcmFPc/index... .html	OK	0	1
        01-07-2013 18:08:25.290	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/n5pXMRw6PXY/index... .html	OK	0	1
        01-07-2013 18:08:25.290	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/q0INJfZqrZk/index... .html	OK	0	1
        01-07-2013 18:08:25.289	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/qYLxwH6OTq8/index... .html	OK	0	1
        01-07-2013 18:08:25.272	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/X3LA17lznNs/	OK	0	1
        01-07-2013 18:08:25.200	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/sPynL-fKm2k/	OK	0	1
        01-07-2013 18:08:25.158	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/JYYIEYXqn34/	OK	0	1
        01-07-2013 18:08:25.157	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/vx0ARutyCFQ/	OK	0	1
        01-07-2013 18:08:25.132	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/2o_HS0Ytvw0/	OK	0	1
        01-07-2013 18:08:25.132	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/E_vIMkTc3SY/	OK	0	1
        01-07-2013 18:08:25.121	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/f2pQv1EmVbY/	OK	0	1
        01-07-2013 18:08:25.097	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/P9wFf4tbbyw/	OK	0	1
        01-07-2013 18:08:25.097	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/RsTcSFqN_Ug/	OK	0	1
        01-07-2013 18:08:25.083	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Vl4YhIIOBw4/	OK	0	1
        01-07-2013 18:08:25.044	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Xwa7iI_z-DI/	OK	0	1
        01-07-2013 18:08:25.012	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/CE0-oIRG7AY/	OK	0	1
        01-07-2013 18:08:24.970	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Z8koXr-c0hM/	OK	0	1
        01-07-2013 18:08:24.969	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/XdISDjJ8ssM/	OK	0	1
        01-07-2013 18:08:24.951	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/vla1PVt8FZY/	OK	0	1
        01-07-2013 18:08:24.951	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Xw_o-RmRHjY/index... .html	OK	0	1
        01-07-2013 18:08:24.938	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/30hpqQsjVnM/	OK	0	1
        01-07-2013 18:08:24.937	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/hyXOp5VGCeI/	OK	0	1
        01-07-2013 18:08:24.921	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/TT8Ec4zVGJk/	OK	0	1
        01-07-2013 18:08:24.818	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/mxmh2VeX1es/	OK	0	1
        01-07-2013 18:08:24.799	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/uBSQZcXo-zc/	OK	0	1
        01-07-2013 18:08:24.495	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/4R9riSU5_dE/index... .html	OK	0	1
        01-07-2013 18:08:24.493	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/EsPvFhru6GA/	OK	0	1
        01-07-2013 18:08:24.470	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/MTZzSjr1koQ/index... .html	OK	0	1
        01-07-2013 18:08:24.378	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/xahBG4AeMec/index... .html	OK	0	1
        01-07-2013 18:08:24.366	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/x1PcJu3Dom8/index... .html	OK	0	1
        01-07-2013 18:08:24.351	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/WyYKAkWGrMA/	OK	0	1
        01-07-2013 18:08:24.325	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Hl_LmMqYayQ/index... .html	OK	0	1
        01-07-2013 18:08:24.321	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Ldp1sdgHg6Y/index... .html	OK	0	1
        01-07-2013 18:08:24.320	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/ZkjlP3nuJ3A/	OK	0	1
        01-07-2013 18:08:24.266	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/m54Mn6bBaFc/index... .html	OK	0	1
        01-07-2013 18:08:24.183	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/3wXsnubXbcU/index... .html	OK	0	1
        01-07-2013 18:08:24.075	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/NRfTQXtUXrs/	OK	0	1
        01-07-2013 18:08:24.066	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/pdSwTeJD3Z8/index... .html	OK	0	1
        01-07-2013 18:08:24.066	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Y_Mt1879Kis/index... .html	OK	0	1
        01-07-2013 18:08:24.065	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Ri4FamtVJn4/	OK	0	1
        01-07-2013 18:08:24.065	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/rjRb9nFOO00/index... .html	OK	0	1
        01-07-2013 18:08:24.062	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/SNYoKE6BhSo/	OK	0	1
        01-07-2013 18:08:24.049	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/0TuIW0k4giM/index... .html	OK	0	1
        01-07-2013 18:08:24.049	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/ewytRtmTIx0/	OK	0	1
        01-07-2013 18:08:24.037	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/uFTU092-oBI/index... .html	OK	0	1
        01-07-2013 18:08:23.754	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/ngw7qGK79qQ/index... .html	OK	0	1
        01-07-2013 18:08:23.754	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/JM9wgSdlENs/index... .html	OK	0	1
        01-07-2013 18:08:23.747	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/_mHId9zCq9Q/index... .html	OK	0	1
        01-07-2013 18:08:23.708	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/u_BmD74SAdk/index... .html	OK	0	1
        01-07-2013 18:08:23.691	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/bjY-P_QaYXw/	OK	0	1
        01-07-2013 18:08:23.682	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/A4MaBpFe-RI/	OK	0	1
        01-07-2013 18:08:23.681	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/-3vXT7eOIoc/	OK	0	1
        01-07-2013 18:08:23.678	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Fln1hYNlFQo/index... .html	OK	0	1
        01-07-2013 18:08:23.670	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/SWbIDAt53Vs/	OK	0	1
        01-07-2013 18:08:23.663	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/o2iGoU0qIQs/	OK	0	1
        01-07-2013 18:08:23.656	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/suHOcgj9ZAU/index... .html	OK	0	1
        01-07-2013 18:08:23.538	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Y-xATRf0vtE/index... .html	OK	0	1
        01-07-2013 18:08:23.538	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/cVBVWop-lp0/index... .html	OK	0	1
        01-07-2013 18:08:23.538	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/fCmSyFtZG_M/	OK	0	1
        01-07-2013 18:08:23.538	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/vIyyioAmutU/index... .html	OK	0	1
        01-07-2013 18:08:23.537	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/q_BVNvow_Wg/index... .html	OK	0	1
        01-07-2013 18:08:23.537	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/Em4YW6HcoEc/index... .html	OK	0	1
        01-07-2013 18:08:23.536	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/f-YMt8rkhNU/index... .html	OK	0	1
        01-07-2013 18:08:23.524	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/fKnh0_S66CE/index... .html	OK	0	1
        01-07-2013 18:08:23.522	document ingest (test)	http://rss.cnn.com/~r/rss/cnn_topstories/~3/F90RdH__g9o/index... .html	OK	0	1
        01-07-2013 18:08:20.358	fetch	http://rss.cnn.com/rss/cnn_topstories.rss	200	150805	2496
        01-07-2013 18:08:15.580	robots parse	rss.cnn.com	SUCCESS	0	1
        01-07-2013 18:08:15.360	fetch	http://rss.cnn.com/robots.txt	200	29	226
        01-07-2013 18:08:12.951	job start	1357600012941(test)		0	1

        Only two fetches are made: one for robots.txt, and one for http://rss.cnn.com/rss/cnn_topstories.rss. And yet dozens of documents are indexed.

        Can you please confirm that you checked out the code from svn trunk at https://svn.apache.org/repos/asf/manifoldcf/trunk?

        David Morana added a comment -

        I'm sorry; I don't know what the issue is. I keep getting result code 403 for
        the bookmark feed.
        I created another new project and downloaded the latest code from trunk. I
        built it using the library you supplied, deleted the stub directory, and
        replaced the build file with the custom one (and added the lapi.jar file).
        I ran the bookmarks job 4 times with different settings (content, description,
        etc.), and now I can't even get it back to crawling the links the way it did in
        the beginning.
        I also triggered re-ingestion on the Solr output connector many times, to no
        avail.
        I'll create a new RSS connector just to be sure...
        OK, the new RSS connector is trying to crawl every link it finds. It aborts
        because it can't reach beyond the firewall, as expected.
        Then I tried the dechromed option (content and metadata).
        The new connector aborts almost immediately.
        Solr reports multiple errors:
        java.lang.NullPointerException
            at org.apache.solr.handler.extraction.SolrContentHandler.addLiterals(SolrContentHandler.java:157)
            at org.apache.solr.handler.extraction.SolrContentHandler.newDocument(SolrContentHandler.java:115)
            at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:122)
            at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:128)
            at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:218)
            at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
            at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
            at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
            at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
            at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
            at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
            at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
            at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
            at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
            at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
            at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
            at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541)
            at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
            at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:383)
            at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
            at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
            at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
            at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
            at java.lang.Thread.run(Thread.java:722)

        Manifold reports: Error: Repeated service interruptions - failure processing
        document: Server at http://localhost:8080/solr/Bookmarks returned non ok
        status:500, message:Internal Server Error

        I figured my index was corrupted (even though it's empty), so I deleted it and
        restarted Solr, but I get the same null pointer exceptions.

        I'm attaching a copy of the bookmark feed in question.

        Karl Wright added a comment -

        The problem may be that Solr cannot handle empty content.

        Can you try creating a job that sends its output to a null output connection instead of Solr, and see whether you can crawl then?

        David Morana added a comment -

        I pointed it at a Null Output and it worked! 524 documents found.
        So, how do I get it to work with Solr?

        Karl Wright added a comment -

        I think you are going to need to patch Solr. I can check whether there's already a Solr ticket and, if not, create a patch and attach it, but I can't do that until this evening. The fix looks pretty trivial, though.

        David Morana added a comment -

        OK, can you describe the problem and the fix exactly? Then I'll submit it.

        Karl Wright added a comment -

        I was just going to find the place where the NPE is thrown and handle the null value sensibly instead. Is this Solr 4.0.0, or some version of 3.x?
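
        Solr Cell (the ExtractingRequestHandler) accepts extra field values as literal.<field> request parameters. The trace does not tell us which reference is null at SolrContentHandler.java:157, so the following is only a hypothetical sketch of the kind of guard described above: skip a missing value instead of dereferencing it. collectLiterals is an illustrative name, not Solr's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NullSafeLiteralsSketch {
    static final String PREFIX = "literal.";

    /**
     * Collects "literal.<field>" parameters into a field -> values map.
     * Null value arrays and null individual values are skipped rather than
     * dereferenced, which is the general shape of an NPE guard.
     */
    static Map<String, List<String>> collectLiterals(Map<String, String[]> params) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : params.entrySet()) {
            String name = e.getKey();
            if (name == null || !name.startsWith(PREFIX)) {
                continue; // not a literal parameter
            }
            String[] values = e.getValue();
            if (values == null) {
                continue; // parameter present but no values: skip instead of throwing
            }
            String field = name.substring(PREFIX.length());
            for (String v : values) {
                if (v != null) {
                    fields.computeIfAbsent(field, k -> new ArrayList<>()).add(v);
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new LinkedHashMap<>();
        params.put("literal.id", new String[] {"bookmark-1"});
        params.put("literal.title", null);                 // would NPE without the guard
        params.put("literal.tags", new String[] {null, "intranet"});
        System.out.println(collectLiterals(params));       // {id=[bookmark-1], tags=[intranet]}
    }
}
```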

        David Morana added a comment -

        Solr 4.0 alpha

        Karl Wright added a comment -

        There have been significant fixes in this area of Solr since 4.0 alpha. Can you upgrade?

        Karl Wright added a comment -

        If you are stuck with Solr 4.0 alpha, here's a patch that may work.

        David Morana added a comment -

        What version should I upgrade to?

        Karl Wright added a comment -

        4.0.0 final

        David Morana added a comment -

        Hi Karl,
        I upgraded to Solr 4.0 final and, unfortunately, I'm still getting the NPE
        errors, and the ManifoldCF job aborts.
        I need to do more testing to make sure everything else is working but these
        are my findings so far.
        Please advise...
        Thanks,
        David

        David Morana added a comment -

        I tried an RSS connector that I know was working, and I get the same error in
        the Simple History:
        document ingest (T7 Solr - Out - Communities)
        https://c3qa.llan.ll.mit.edu/communities/service/html/communi...
        tyview?communityUuid=a1d16fd7-1ef8-466c-b165-d88cc682d287
        FAILED 103074 1 IOException occured when talking to server at:
        http://localhost:8080/solr/Communities

        That's new.
        But I checked both output connectors, and they both report "Connection working".

        I don't know offhand what's happening here; I installed the new ManifoldCF
        snapshot, and the new Solr instance came up with no errors.
        Do I have to recreate all the connectors for the new Solr instance?

        Karl Wright added a comment -

        Is there an exception trace in the log? If so can you paste it in here?

        Karl Wright added a comment -

        Also, please provide the stack trace for the NPE from Solr as well.

        David Morana added a comment -

        Hi Karl,
        This aborted pretty fast, so I don't know if I caught the error; nothing
        jumped out at me in the traces. stacktraceManCF was captured before the job
        ran; the other stacktraceManCF files were captured during and after the abort.
        stacktrace and stacktrace1 are from Solr.
        I'm fairly certain I have the correct PIDs; I saw them appear in Task Manager
        when I ran ManifoldCF and Solr.
        I ran jstack.exe -l PID and redirected the output to a file.
        Let me know if I didn't catch the error.
        Thanks,
        David

        Karl Wright added a comment -

        These are thread dumps, not stack traces.

        Stack traces are generally written to the logs: for Solr, look at the Solr logs; for ManifoldCF, look at the ManifoldCF logs. I would be surprised if there were no stack traces there.
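
        jstack prints the stacks of every live thread in a running JVM at one instant, which is a thread dump. A stack trace is what gets logged when one particular exception is thrown, including its "Caused by:" chain. The following small sketch shows the difference using only standard Java APIs (Thread.getAllStackTraces and Throwable.printStackTrace); the class and method names are illustrative.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;

class DumpVsTraceSketch {
    /** Thread dump: a snapshot of what every live thread is doing right now. */
    static String threadDump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Stack trace: where one exception was thrown, plus its "Caused by:" chain. */
    static String stackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.println(threadDump());
        Exception e = new RuntimeException("outer", new IllegalStateException("root cause"));
        System.out.println(stackTrace(e));
    }
}
```

        A thread dump can look perfectly healthy even when a request has already failed, because the failing exception is long gone from every thread's stack by the time the dump is taken.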

        Karl Wright added a comment -

        I just checked in code that picks apart exceptions thrown by Solr, so the Simple History should have a more meaningful detail message when Solr throws up.

        Full stack traces from logs are still pretty important, though. Please let me know if you can find the appropriate ones.
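
        The checked-in change itself isn't shown in this thread. As a hypothetical sketch of the idea, assume the goal is to surface the most specific message in an exception's cause chain. In the Communities log further down, the top-level detail is just "null" because ClientProtocolException carries no message, while the innermost cause says "Software caused connection abort: socket write error". describe is an illustrative name, not ManifoldCF's code.

```java
import java.net.SocketException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class ExceptionDetailSketch {
    /**
     * Returns "SimpleClassName: message" for the innermost exception in the
     * cause chain that has a non-null message, or the top-level class name
     * if no exception in the chain has a message.
     */
    static String describe(Throwable t) {
        // Identity set guards against (unusual) cycles in the cause chain.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        String detail = null;
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur.getMessage() != null) {
                detail = cur.getClass().getSimpleName() + ": " + cur.getMessage();
            }
        }
        return detail != null ? detail : t.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Throwable chain = new RuntimeException(null,
            new IllegalStateException("Cannot retry request",
                new SocketException("Software caused connection abort: socket write error")));
        // Prints: SocketException: Software caused connection abort: socket write error
        System.out.println(describe(chain));
    }
}
```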

        David Morana added a comment -

        Here's the NPE from the Solr log. I only see it for the Bookmarks core; I'd expect
        one for the Communities core too, because that job aborts as well.
        Manifold log:
        ERROR 2013-01-10 09:37:14,916 (Worker thread '8') - Exception tossed: Repeated service interruptions - failure processing document: Server at http://localhost:8080/solr/Bookmarks returned non ok status:500, message:Internal Server Error
        org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Server at http://localhost:8080/solr/Bookmarks returned non ok status:500, message:Internal Server Error
            at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
        Caused by: org.apache.solr.common.SolrException: Server at http://localhost:8080/solr/Bookmarks returned non ok status:500, message:Internal Server Error
            at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
            at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
            at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
            at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:742)

        And for Communities:
        ERROR 2013-01-10 09:49:40,041 (Worker thread '10') - Exception tossed: Repeated service interruptions - failure processing document: null
        org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
            at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
        Caused by: org.apache.http.client.ClientProtocolException
            at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
            at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
            at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
            at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
            at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
            at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
            at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:742)
        Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
            at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:690)
            at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:521)
            at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
            ... 6 more
        Caused by: java.net.SocketException: Software caused connection abort: socket write error
            at java.net.SocketOutputStream.socketWrite0(Native Method)
            at java.net.SocketOutputStream.socketWrite(Unknown Source)
            at java.net.SocketOutputStream.write(Unknown Source)
            at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
            at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
            at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
            at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
            at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
            at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
            at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
            at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
            at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
            at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
            at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
            at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:716)
            ... 8 more

        David Morana added a comment -

        Yes. The Simple History report showed this (same as yesterday):
        01-10-2013 09:49:39.993 document ingest (T7 Solr - Out - Communities)
        https://c3qa.llan.ll.mit.edu/communities/service/html/communi...
        tyview?communityUuid=a1d16fd7-1ef8-466c-b165-d88cc682d287
        FAILED 103074 1 IOException occured when talking to server at:
        http://localhost:8080/solr/Communities

        Karl Wright added a comment -

        Thanks, this is what I was looking for.

        ManifoldCF is actually doing fine. It correctly interprets Solr's HTTP 500 errors as needing retries; it retries for a while and then gives up. So the failure is on the Solr side.

        The trace from Solr indicates that it is the same problem as in Solr 4.0-alpha, despite the code reorganization that has taken place since then. You WILL need to patch Solr, I'm afraid. The same patch I created before might suffice, but I will confirm that and then let you know.
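        To make the retry behaviour described above concrete, here is a minimal Java sketch of a "retry on 5xx, then give up" policy for a component that posts documents to Solr. It is hypothetical: the class `RetryOnServerError`, the method `postWithRetries`, the attempt limit and the doubling back-off are illustrative assumptions, not ManifoldCF's actual implementation. The HTTP call is abstracted as an `IntSupplier` returning a status code so the sketch runs without a Solr server.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch, NOT ManifoldCF's real code: treat HTTP 5xx responses
 * from Solr as transient, retry with back-off, and give up after a fixed
 * number of attempts.
 */
public class RetryOnServerError {

    /** Raised when every attempt failed with a 5xx (or the wait was interrupted). */
    public static class RepeatedFailureException extends RuntimeException {
        public RepeatedFailureException(String message) {
            super(message);
        }
    }

    /**
     * Runs one ingestion attempt per call to {@code post} until it returns a
     * status outside 500-599 or {@code maxAttempts} attempts have been made.
     *
     * @param post        performs one attempt and returns the HTTP status code
     * @param maxAttempts assumed retry limit (illustrative value, not ManifoldCF's)
     * @param baseDelayMs initial back-off in milliseconds, doubled after each failure
     * @return the first non-5xx status code seen
     */
    public static int postWithRetries(IntSupplier post, int maxAttempts, long baseDelayMs) {
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int status = post.getAsInt();
            if (status < 500 || status > 599) {
                return status; // success, or a client error that retrying will not fix
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay); // back off before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    throw new RepeatedFailureException("Interrupted while waiting to retry");
                }
                delay *= 2;
            }
        }
        throw new RepeatedFailureException("Repeated service interruptions: server returned 5xx on all "
                + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Simulates an unpatched Solr core that answers every update with 500.
        try {
            postWithRetries(() -> 500, 3, 10);
        } catch (RepeatedFailureException e) {
            System.out.println("Gave up: " + e.getMessage());
        }
    }
}
```

        When the server answers every attempt with a 500, as the unpatched Solr core does here, a policy like this simply exhausts its attempts; no amount of client-side retrying helps until Solr itself is fixed.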

        Karl Wright added a comment -

        Created SOLR-4293 to track this issue.

        Karl Wright added a comment -

        I attached an official Solr patch to the SOLR ticket; you can download it from there. Hopefully nothing else will blow up once this spot is fixed.

        David Morana added a comment -

        Hi Karl,
        I managed to download the Solr trunk, and I even figured out how to use the proxy through Eclipse!
        I tried to apply the patch through Eclipse as well, but I get an error: "URL doesn't contain a valid patch." See attached.
        Is there a manual way to apply the patch? Or can I just edit the line in question by hand?
        Thanks,
        David

        David Morana added a comment -

        No worries: I just pasted the patch text into the patch wizard, and it worked.


          People

          • Assignee: Karl Wright
            Reporter: David Morana
          • Votes: 0
            Watchers: 3
