
SOLR-4799: DIH: join="zipper" aka merge join for nested EntityProcessors

      Description

      DIH is mostly considered a playground tool, and real usages end up with SolrJ. I want to contribute a few improvements targeting DIH performance.

      This one provides a performant approach for joining SQL entities with a tiny memory footprint, in contrast to http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

      The idea is:

      • the parent table is explicitly ordered by its PK in SQL
      • the children table is explicitly ordered by the parent_id FK in SQL
      • the children entity processor joins the ordered resultsets with a ‘zipper’ algorithm

      example usage for what's committed:

      <dataConfig>
      	<document>
      		<entity name="parent" processor="SqlEntityProcessor" query="SELECT * FROM PARENT ORDER BY id">		
      			<entity name="child_1" processor="SqlEntityProcessor"
      				where="parent_id=parent.id" query="SELECT * FROM CHILD_1 ORDER BY parent_id" join="zipper" >
      			</entity>			
      		</entity>
      	</document>
      </dataConfig>
      

      Keep in mind:

      1. both sides must be ordered
      2. join="zipper" is specified on the child entity
      3. it works with any entity processor (see the sketch below)
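
      For illustration, here is a minimal sketch of the zipper (merge-join) idea in plain Java. It assumes both sides arrive sorted ascending by the join key; the class and method names are hypothetical, not DIH's actual API, and rows are reduced to their join keys for brevity:

      import java.util.*;

      /** Minimal merge-join sketch over two resultsets that are both
       *  sorted ascending by the join key. */
      final class ZipperSketch {
        static List<Map.Entry<Integer, List<Integer>>> join(
            Iterator<Integer> parentIds, Iterator<Integer> childParentIds) {
          List<Map.Entry<Integer, List<Integer>>> out = new ArrayList<>();
          Integer pending = childParentIds.hasNext() ? childParentIds.next() : null;
          while (parentIds.hasNext()) {
            int parent = parentIds.next();
            List<Integer> children = new ArrayList<>();
            // Skip orphaned children whose FK sorts before the current parent PK.
            while (pending != null && pending < parent) {
              pending = childParentIds.hasNext() ? childParentIds.next() : null;
            }
            // Collect all children matching this parent; a childless parent keeps an empty list.
            while (pending != null && pending == parent) {
              children.add(pending);
              pending = childParentIds.hasNext() ? childParentIds.next() : null;
            }
            out.add(Map.entry(parent, children));
          }
          return out;
        }
      }

      Each side is consumed exactly once, in order, which is why memory use stays constant regardless of table size.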
      Attachments

      1. SOLR-4799.patch
        28 kB
        Mikhail Khludnev
      2. SOLR-4799.patch
        28 kB
        Mikhail Khludnev
      3. SOLR-4799.patch
        25 kB
        Mikhail Khludnev
      4. SOLR-4799.patch
        18 kB
        Mikhail Khludnev
      5. SOLR-4799.patch
        17 kB
        Mikhail Khludnev

          Activity

          James Dyer added a comment -

          Mikhail,

          Let me clarify that DIH is not mostly considered a "playground tool". It performs very well and has a rich feature-set. We use it in production to import millions of documents each day, with each document consisting of fields from 50+ data sources. For simpler imports, it is a quick and easy way to get your data into Solr. Many, many installations use it in production, and it works well in many cases.

          That said, the codebase has suffered from years of neglect. Over time, people have been more willing to add features than to refactor. A lot of the code needs to be simplified, re-worked, less-important features removed, etc. The tests need further improvement as well.

          Your idea has great merit. I think this would be an awesome feature to have in DIH. I've wished for it before. But I personally tend to shy away from committing big features to DIH because the code is not stable enough in my opinion. I even have features in JIRA that I've developed and use in Production but feel uneasy about committing until more refactoring and test improvement work is done.

          Shalin Shekhar Mangar added a comment -

          James, I think it is time to do some of the tasks that you have outlined and I'm willing to do the work.

          Mikhail, I'm happy to review and commit such a patch. I think it is a very nice improvement.

          Mikhail Khludnev added a comment -

          Ok. James, I've got your point. Let me collect code and tests for submitting.

          Mikhail Khludnev added a comment -

          I'd like to get a review of the functionality; here is the proposed config:

          <dataConfig>
          	<document>
          		<entity name="parent" processor="SqlEntityProcessor" query="SELECT * FROM PARENT ORDER BY id">		
          			<entity name="child_1" processor="OrderedChildrenEntityProcessor"
          				where="parent_id=parent.id" query="SELECT * FROM CHILD_1 ORDER BY parent_id" >
          			</entity>			
          		</entity>
          	</document>
          </dataConfig>
          

          Do you like it?

          Parent and child SQLs can have different orderings, which kills the zipper. OrderedChildrenEntityProcessor can enforce ASC order for the PK and FK keys (and throw an exception on violation), but it might also detect the order itself, which complicates the code a little. What do you expect for a first code contribution?

          James Dyer added a comment -

          It would be even more awesome if it didn't assume the entities were extending SqlEntityProcessor. I mean, make zipperjoin an option for any entity processor as opposed to its own new variant on SqlE.P.

          Mikhail Khludnev added a comment - - edited

          Attaching the first drop.

          I won't say I share your idea, James Dyer, about adding the zipper ability across all processors, but anyway, let's check how it works out.

          The implementation itself is not a big deal because it's based on Guava (a sketch of the idiom follows below); it's enabled by join='zipper'. Note: it doesn't support the People *-> Country case, only the classic People -*> Sports, though a one-liner covers that.

          I extracted the DIHSupport constructor, which parses attrs into a Relation class. I introduced Zipper as an EP-internal strategy like DIHCacheSupport. It seems all this stuff should be extracted into a few proper strategies in the future.

          The Derby test covers only sports, not countries. They could also be covered, but not both: joining both sides by zipper would make the test super puzzling. So, that needs to be addressed later.

          The thing I worry about most is the test data. From what I see, we have only vanilla data: for every person we have one or a few sports. The zipper caveats are orphaned sports and sportless people; if there is a bug in the zipper, it can mess up the following entities. Btw, given my experience from the DIH vs. Threads battle, I can say this menaces the caching implementations as well. Ideally, I'd like to pause this one, improve the Derby test for orphaned children and childless parents, and continue with the zipper afterwards.

          Please let me know what you think!
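
          Since the implementation is described as Guava-based, here is a hedged sketch of the idiom, assuming Guava's PeekingIterator; the surrounding class and names are illustrative, not the actual patch:

          import com.google.common.collect.Iterators;
          import com.google.common.collect.PeekingIterator;
          import java.util.*;

          final class ZipperGuavaSketch {
            /** Drain the child rows whose FK equals the current parent PK.
             *  peek() lets the zipper stop at the first row belonging to a later
             *  parent without consuming it, keeping both sides in lock-step.
             *  Wrap a plain Iterator via Iterators.peekingIterator(rows). */
            static List<Map<String, Object>> childrenFor(Object parentKey,
                PeekingIterator<Map<String, Object>> childRows, String fkColumn) {
              List<Map<String, Object>> matched = new ArrayList<>();
              while (childRows.hasNext() && parentKey.equals(childRows.peek().get(fkColumn))) {
                matched.add(childRows.next());
              }
              return matched;
            }
          }

          Orphaned children (whose FK sorts before the current PK) would additionally need to be skipped, as discussed above.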

          Mikhail Khludnev added a comment -

          James Dyer, any chance you could have a look?

          James Dyer added a comment -

          Mikhail, this seems like a great feature, but I haven't looked at it. As I said, I do not feel it wise to add features that won't neatly plug into the current DIH infrastructure until we improve the code. Really, I would love to chop out features (debug mode, delta updates, streaming from a POST request, etc.) and make it work independently from Solr before we build more into it. But I've been busy with other things and haven't had much time.

          By the way, do you have any experience with Apache Flume? In your opinion, could it become DIH's successor? A Solr Sink was added earlier in the year that will index disparate data. I haven't looked much at it, but my first impression is that it is a big, complicated tool, whereas DIH is smaller and simpler, and the two would have different use-cases. Also, I'm not so sure it has any support yet for RDBMS.

          Mikhail Khludnev added a comment -

          James,
          I don't really understand. I wanted to add a tiny plugin to DIH, but then:

          I mean, make zipperjoin an option for any entity processor as opposed to its own new variant on SqlE.P.

          and after I went this way, after heavy doubts:

          As I said, I do not feel it wise to add features that won't neatly plug into the current DIH infrastructure until we improve the code.

          Anyway, I absolutely share your concerns: DIH is a great idea, but its engine is worth revamping. I have no experience with Flume, but I see it as some kind of transport. I want to look at Pentaho Kettle (a kind of old-school ETL tool) and Cloudera Morphlines.

          Mikhail Khludnev added a comment -

          Look at http://pedroalves-bi.blogspot.ru/2011/07/elasticsearch-kettle-and-ctools.html : there is an Elasticsearch sink for Kettle.
          Hide
          Alexandre Rafalovitch added a comment -

          Morphlines is now part of the Apache Solr distribution. That probably points in the direction this will go.

          At the same time, in nearly a year no further improvements to DIH have been made, as far as I know. So perhaps this addition should be committed even if it is not ideal.

          James Dyer added a comment -

          At the same time, in nearly a year no further improvements to DIH have been made, as far as I know. So perhaps this addition should be committed even if it is not ideal.

          I would say the exact opposite. There are not very many people maintaining DIH code, and those of us that do are lazy about it. Therefore, let's not stuff more big features in and make more code to maintain when there are no maintainers. I have code here in JIRA that I've used in production for years that I've been unwilling to commit just for this very reason.

          I do see Flume as a great DIH replacement, but from the documentation I don't see it having very good RDBMS support. I think a lot of DIH users are using it to import data from an RDBMS into Solr.

          Mikhail Khludnev added a comment - - edited

          Despite the many things described above, I agree with James Dyer. Until we see real demand for this feature, there is no need to push it. This "plugin" is really easy to try as a separate drop-in, but look how many users have tried it ... no one.

          After all, I understand what Morphlines is. It's a pretty cool lightweight transformation pipeline, but:

          • it doesn't have a JDBC input so far (I don't think it would be hard to implement)
          • a pipeline implies a single chain of transformations; I don't see how to naturally join two streams of records

          Regarding Flume, I'm still concerned about its minimum footprint.

          FWIW, here is Kettle's approach to the subject: http://wiki.pentaho.com/display/EAI/Merge+Join

          Shawn Heisey added a comment -

          I think a lot of DIH users are using it to import data from an RDBMS into Solr.

          This is exactly what I use it for. Based on mailing list and IRC traffic, I think that most people who use DIH are using it for database import. DIH works, and it's a lot more efficient than any single-threaded program that I could write. I don't believe that it is a "playground tool."

          Although DIH used to handle all our indexing, we currently only use it for full index rebuilds. A SolrJ app handles the once-a-minute maintenance. I have plans to build an internal multi-threaded SolrJ tool to handle full rebuilds, but that effort still has not made it through the design phase. Because DIH works so well, we don't have a strong need to replace it.

          Alexandre Rafalovitch added a comment -

          Re: "nobody tried this plugin" - finding anything on JIRA is next to impossible. I would not really take that as indication of need or lack of such.

          Re: pipeline, Morphlines has nested pipelines, including things like: unzip a file, find every entry inside, run Tika over it, and then pass it through XML extraction. Seems a fairly close match to DIH?

          Regarding JDBC support, etc.: how about plugging Morphlines in as a DIH entity processor that works with any DataSource (there are only 3 different usage patterns)? I would be happy to have a go at it, but only if it has a chance of actually being incorporated into DIH.

          Mikhail Khludnev added a comment -

          There are plenty of side points discussed here; let me add one more. I checked one thing with Kettle ETL (Pentaho). The main problem with Kettle is its Eclipse-based IDE UI. Given the DIH replatforming, we would expect some web UI for DSL editing. I found a sibling project, CDA, which looks pretty much like this. Here is the summary:

          • the project itself seems modular enough (CBF), hence we can slice off some pieces for use in DIH 2.0
          • CDA is just data access: whatever to JSON via HTTP GET
          • thus, it lacks the final indexing steps (via POST or xxxSolrServer)
          • it also lacks a long-running command framework (it's a trivial thread with interruption and status flags; not a big deal, but nothing for free there)
          • it shows pretty neat usage of ETL primitives (and I still think that Kettle's guts are much more powerful than Morphlines'): it uses an XML DSL to configure Kettle steps and run data export as an ETL process

          Mikhail Khludnev added a comment -

          Just a small off-topic update.
          I found an old Kettle plugin for Solr export, https://code.google.com/p/kettle-solr-plugin/ ,
          refreshed it for Kettle 5.0, and put the first results at https://github.com/m-khl/kettle-solr-plugin

          It's just a proof of concept; it has some bugs and lacks desired functionality, e.g. streaming/cloud. I'm dropping off for a while; ping me if somebody is interested in "true" ETL.

          Mikhail Khludnev added a comment -

          A note regarding the patch. It introduces the join="zipper" attribute for any entity processor that is used as a child; of course, both entities should be ordered by the same ID:

          <dataConfig>
          	<document>
          		<entity name="parent" processor="SqlEntityProcessor" query="SELECT * FROM PARENT ORDER BY id">		
          			<entity name="child_1" processor="SqlEntityProcessor"
          				where="parent_id=parent.id" query="SELECT * FROM CHILD_1 ORDER BY parent_id" join="zipper" >
          			</entity>			
          		</entity>
          	</document>
          </dataConfig>
          

          Let me know if you would rather see it as a separate class, OrderedChildrenEntityProcessor, as proposed above. cc Noble Paul

          Noble Paul added a comment -

          Does this have any external dependencies?

          Mikhail Khludnev added a comment -

          nope. for sure.

          Noble Paul added a comment -

          Hi, can you post a patch updated to the trunk, please? If I'm not wrong, the code kicks in when the entity attribute "join" is present. So it is a low-risk feature anyway.

          Mikhail Khludnev added a comment -

          can you post a patch updated to the trunk, please?

          I'm trying. I'm stuck on the commons-codec;1.10 dependency so far. It will take a few days.

          the code kicks in when the entity attribute "join" is present. So it is a low-risk feature anyway

          Absolutely.

          Mikhail Khludnev added a comment -

          Updated the patch and checked the tests.
          I had to add super.firstInit() into TikaEntityProcessor.firstInit(); it becomes mandatory.
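
          For clarity, a hedged sketch of the shape of that change (the Tika-specific body is elided; this is illustrative, not the actual patch):

          public class TikaEntityProcessor extends EntityProcessorBase {
            @Override
            protected void firstInit(Context context) {
              super.firstInit(context);  // now mandatory, so shared setup (e.g. the zipper) runs
              // ... Tika-specific initialization ...
            }
          }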

          WDYT?

          Mikhail Khludnev added a comment -

          Improved test coverage in TestSqlEntityProcessor*:

          • added countries into the zipper join
          • added more entropy into the data (orphans and childless parents)
          • improved a few asserts, made them randomized and mandatory

          James Dyer, the improved tests are worth considering even apart from this JIRA.

          Mikhail Khludnev added a comment -

          Improved the patch again. Zipper now strictly checks that both sides (primary and foreign keys) are in ascending order; negative tests are included. A sketch of that check follows below.

          Noble Paul it's 100% ready.
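
          An illustrative sketch of such a strict ordering check (hypothetical names, not the committed code); each side of the join keeps its own guard and fails fast when a key goes backwards:

          final class AscendingOrderGuard {
            private final String side;
            private Comparable<Object> prev;

            AscendingOrderGuard(String side) { this.side = side; }

            @SuppressWarnings("unchecked")
            void accept(Object key) {
              Comparable<Object> k = (Comparable<Object>) key;
              if (prev != null && prev.compareTo(k) > 0) {
                throw new IllegalArgumentException(
                    side + " keys are not in ascending order: " + prev + " > " + k);
              }
              prev = k;
            }
          }

          Usage: one guard for the parent PK stream and one for the child FK stream, with accept() called on each key as it is read.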

          Noble Paul added a comment -

          I can't really understand why an optional component object should be explicitly initialized all the time. That object should not even be created at all:

          EntityProcessorBase.java
            protected void firstInit(Context context) {
              entityName = context.getEntityAttribute("name");
              String s = context.getEntityAttribute(ON_ERROR);
              if (s != null) onError = s;
              
              zipper = new Zipper(context);
              
              if(!zipper.isActive()){
                initCache(context);
              }
              isFirstInit = false;
            }
          

          I would say , please construct the rowIterator using Zipper instead of making it a part of a core class such as EntityProcessorBase

          Mikhail Khludnev added a comment -

          Yep, agreed. Zipper is made optional and is created by a factory method (see the sketch below).

          I either didn't get your second point, or I'd like to address that refactoring separately. It seems like DIHCacheSupport, Zipper, and the straightforward case of N queries (one per parent row) should be covered by a separate rowIterator abstraction, which would be reset for every parent row. Nevertheless, that's a more than moderately long story, isn't it?
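
          A minimal sketch of that factory-method shape, assuming a hypothetical Zipper.createOrNull(Context); illustrative, not the committed code:

          protected void firstInit(Context context) {
            entityName = context.getEntityAttribute("name");
            String s = context.getEntityAttribute(ON_ERROR);
            if (s != null) onError = s;

            // Created only when join="zipper" is configured; null otherwise,
            // so entities that don't use the feature pay nothing.
            zipper = Zipper.createOrNull(context);

            if (zipper == null) {
              initCache(context);
            }
            isFirstInit = false;
          }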

          Noble Paul added a comment -

          The second one is optional but desirable. The first one is a prerequisite.

          Mikhail Khludnev added a comment -

          Noble Paul, did you catch the recent patch?

          Noble Paul added a comment -

          Isn't this only relevant for SqlEntityProcessor? Can the zipper be initialized in SqlEntityProcessor instead of EntityProcessorBase?

          James Dyer added a comment -

          Let me mention that with SOLR-2943 you can read back cached data in key order, which would allow you to zipper-join anything that was previously cached. While this is not a committed feature, it demonstrates that you can have entities other than SQL with the keys in the correct order for joining. So if possible, I wouldn't make this just for SQL.

          Noble Paul added a comment -

          If that is the case, please change the title/description to match.

          Mikhail Khludnev added a comment -

          Noble Paul, do you like this summary?

          Noble Paul added a comment -

          Committed r1643097.

          Strangely, svn commits are not posting messages to JIRA.

          ASF subversion and git services added a comment -

          Commit 1643351 from Adrien Grand in branch 'dev/trunk'
          [ https://svn.apache.org/r1643351 ]

          SOLR-4799: Fix javadocs generation.

          ASF subversion and git services added a comment -

          Commit 1643352 from Adrien Grand in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1643352 ]

          SOLR-4799: Fix javadocs generation.

          Anshum Gupta added a comment -

          Bulk close after 5.0 release.

          Mikhail Khludnev added a comment -

          Linking a blog post: http://blog.griddynamics.com/2015/07/how-to-import-structured-data-into-solr.html

          Mikhail Khludnev added a comment -

          Cassandra Targett FWIW, I updated the ref guide: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=32604271&selectedPageVersions=44&selectedPageVersions=45

            People

            • Assignee: Noble Paul
            • Reporter: Mikhail Khludnev
            • Votes: 3
            • Watchers: 12