
SOLR-6194: Allow access to DataImporter and DIHConfiguration

    Details

      Description

      I'd like to change the visibility of, and access to, a couple of the internal classes of DataImportHandler, specifically DataImporter and DIHConfiguration. My reasoning is that I've added a new data import handler "command" called getquery that returns the exact queries (fully resolved) that are executed for an entity within the data import configuration. This makes it much easier to debug the DIH than turning on debug/verbose flags and digging through the raw response. Additionally, it gives me a "service" from which I can take the queries and run them.

      Here's a snippet of Java code that I can execute now that I have access to the DIHConfiguration:

      Snippet.java
        /**
         * @return a map of all the queries for each entity in the given config
         */
        protected Map<String,String> getEntityQueries(DIHConfiguration config, Map<String,Object> params)
        {
          Map<String,String> queries = new LinkedHashMap<>();
          if (config != null && config.getEntities() != null)
          {
          //make a new variable resolver
            VariableResolver vr = new VariableResolver();
            vr.addNamespace("dataimporter.request",params);
      
            //for each entity
            for (Entity e : config.getEntities())
            {
              //get the query and resolve it
              if (e.getAllAttributes().containsKey(SqlEntityProcessor.QUERY))
              {
                String query = e.getAllAttributes().get(SqlEntityProcessor.QUERY);
                query = query.replaceAll("\\s+", " ").trim();
                String resolved = vr.replaceTokens(query);
                resolved = resolved.replaceAll("\\s+", " ").trim();
                queries.put(e.getName(),resolved);
                queries.put(e.getName()+"_raw",query);
              }
            }
          }
          return queries;
        }
      
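      For illustration, here is roughly how the returned map might be surfaced from a handler that dispatches on a command=getquery request. This is a minimal sketch; the entity, the request parameter, and the config/rsp variables are hypothetical stand-ins for whatever is in scope in the handler, not part of the patch:

      Usage.java
        // Hypothetical request: /dataimport?command=getquery&id=42
        // Given an entity such as:
        //   <entity name="item" query="select * from item where id='${dataimporter.request.id}'"/>
        // the map comes back roughly as:
        //   item     -> select * from item where id='42'
        //   item_raw -> select * from item where id='${dataimporter.request.id}'
        Map<String,Object> params = new LinkedHashMap<>();
        params.put("id", "42");
        Map<String,String> queries = getEntityQueries(config, params);
        rsp.add("entity_queries", queries);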

      I'm attaching a patch that I would appreciate someone having a look at for consideration. It's fully tested – please let me know if there is something else I need to do/test.

      Attachments

      1. SOLR-6194.patch (3 kB) by Aaron LaBella
      2. SOLR-6194.patch (3 kB) by Aaron LaBella

        Activity

        ASF subversion and git services added a comment -

        Commit 1605972 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1605972 ]

        SOLR-6194: Allow access to DataImporter and DIHConfiguration from DataImportHandler

        ASF subversion and git services added a comment -

        Commit 1605973 from shalin@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1605973 ]

        SOLR-6194: Allow access to DataImporter and DIHConfiguration from DataImportHandler

        Shalin Shekhar Mangar added a comment -

        Thanks Aaron.

        Aaron LaBella added a comment -

        Thanks, Shalin! I'm attaching one more small patch that opens up a couple of other methods. With these fixes I can actually run the entity queries from my data import handler for easier debugging/inspection, without having to run an import. Can you please review and commit this small patch as well?

        Aaron

        Aaron LaBella added a comment -

        another small patch

        Shalin Shekhar Mangar added a comment - (edited)

        Aaron, the DataImporter.getDocBuilder(DIHWriter writer, RequestInfo requestParams) method that you have added isn't used anywhere. I guess you must be using it for your use case, but it is quite possible that somewhere down the line someone might remove it without realizing it.

        I think this is a good time for you to write a test case showing how you are using DIH without an actual import. I can commit it as part of our test suite.

        Aaron LaBella added a comment -

        Shalin,

        Good feedback. Here's the static utility class I wrote:

        Snippet.java
        
        //...
        
        final public class SolrQueryUtils
        {
          static public void runEntityQueries(
            DIHConfiguration config, SolrCore core, DataImporter importer,
            Entity entity, String entityName,
            SolrQueryRequest req, SolrQueryResponse rsp,
            Map<String,Object> params) throws Exception
          {
            Collection<Map<String,Object>> rows = new LinkedList<>();
            EntityProcessorWrapper epw = null;
            try
            {
              if (entityName == null || entity.getName().equals(entityName))
              {
                epw = SolrQueryUtils.getEntityProcessorWrapperFrom(
                  core, importer, entity, req, rsp, params
                );
                if (epw != null)
                {
                  Map<String, Object> row = epw.nextRow();
                  //just for sanity, TODO: add a better way to do this
                  final int MAX = 10000;
                  int index = 0;
                  while (row != null)
                  {
                    if (index++ < MAX) { rows.add(row); }
                    row = epw.nextRow();
                  }
                }
              }
            }
            catch (Exception ex)
            {
              ex.printStackTrace();
            }
            finally
            {
              if (epw != null)
              {
                epw.getDatasource().close();
                epw.close();
                epw.destroy();
              }
            }
            if (entityName == null || entity.getName().equals(entityName))
            {
              rsp.add(entity.getName()+"_results",rows);
            }
            //potentially recurse if there are children
            Collection<Entity> children = entity.getChildren();
            if (children != null)
            {
              //for each child entity
              for (Entity child : children)
              {
                SolrQueryUtils.runEntityQueries(config,core,importer,child,entityName,req,rsp,params);
              }
            }
          }
        
          static public EntityProcessorWrapper getEntityProcessorWrapperFrom(
            SolrCore core, DataImporter importer, Entity entity,
            SolrQueryRequest req, SolrQueryResponse rsp,
            Map<String,Object> params) throws Exception
          {
            RequestInfo reqinfo = new RequestInfo(req,params,null);
            DocBuilder docBuilder = importer.getDocBuilder(null,reqinfo);
            EntityProcessorWrapper epw = docBuilder.getEntityProcessorWrapper(entity);
            VariableResolver vr = new VariableResolver();
            vr.addNamespace("dataimporter.request",params);
            Context ctx = new ContextImpl(
              epw,                      //EntityProcessorWrapper epw
              vr,                       //VariableResolver resolver
              null,                     //DataSource ds
              Context.FULL_DUMP,        //String currProcess
              reqinfo.getRawParams(),   //Map<String, Object> global
              null,                     //ContextImpl parentContext
              docBuilder                //DocBuilder docBuilder
            );
            DataSource<?> ds = importer.getDataSourceInstance(
              entity, entity.getDataSourceName(), ctx
            );
            SolrQueryUtils.initDataSource(importer.getConfig(),core,entity,ds,ctx);
            epw.setDatasource(ds);
            epw.init(ctx);
            epw.setInitalized(true);
            return epw;
          }
        
          static public void initDataSource(
            DIHConfiguration config, SolrCore core, Entity entity, DataSource<?> ds, Context ctx
          )
          {
            //add all the properties from the core descriptor
            CoreDescriptor descriptor = core.getCoreDescriptor();
            Map<String,Object> globals = new LinkedHashMap<>();
            globals = SolrQueryUtils.addProperties(descriptor.getPersistableStandardProperties(), globals);
            globals = SolrQueryUtils.addProperties(descriptor.getPersistableUserProperties(), globals);
            //sort the keys for easier debugging
            globals = new TreeMap<>(globals);
        
            //make a new variable for the datasource properties resolver
            VariableResolver globalResolver = new VariableResolver(globals);
            Map<String,String> dsProps = config.getDataSources().get(entity.getDataSourceName());
            Map<String,String> dsPropsResolved = new LinkedHashMap<>();
            for(Map.Entry<String,String> entry : dsProps.entrySet())
            {
              dsPropsResolved.put(entry.getKey(),globalResolver.replaceTokens(entry.getValue()));
            }
        
            Properties dsProperties = new Properties();
            dsProperties.putAll(dsPropsResolved);
            ds.init(ctx,dsProperties);
          }
        
          static public Map<String,Object> addProperties(Properties p, Map<String,Object> map)
          {
            if (p != null && map != null)
            {
              Enumeration<?> enumer = p.propertyNames();
              while(enumer.hasMoreElements())
              {
                String key = enumer.nextElement().toString();
                String value = p.getProperty(key);
                map.put(key,value);
              }
            }
            return map;
          }
        }
        
        

        I also have a custom data import handler that extends DataImportHandler. I added support for calling the data import handler with command='runquery', which looks up the entities from the config and calls the runEntityQueries method above.
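
        In outline, that override looks something like the following. This is only a sketch: the class name and the "entity" parameter handling are illustrative, getImporter() stands in for whatever accessor this issue's patch actually exposes, and SolrQueryUtils is assumed to be the utility class above:

        Handler.java
          import java.util.Iterator;
          import java.util.LinkedHashMap;
          import java.util.Map;

          import org.apache.solr.handler.dataimport.DataImportHandler;
          import org.apache.solr.handler.dataimport.DataImporter;
          import org.apache.solr.handler.dataimport.config.DIHConfiguration;
          import org.apache.solr.handler.dataimport.config.Entity;
          import org.apache.solr.request.SolrQueryRequest;
          import org.apache.solr.response.SolrQueryResponse;

          public class QueryDebugDataImportHandler extends DataImportHandler
          {
            @Override
            public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception
            {
              if ("runquery".equals(req.getParams().get("command")))
              {
                //copy the request parameters into a plain map for the variable resolver
                Map<String,Object> params = new LinkedHashMap<>();
                Iterator<String> names = req.getParams().getParameterNamesIterator();
                while (names.hasNext())
                {
                  String name = names.next();
                  params.put(name, req.getParams().get(name));
                }
                //getImporter() is hypothetical: whatever accessor the patch opens up
                DataImporter importer = getImporter();
                DIHConfiguration config = importer.getConfig();
                //run each entity (optionally filtered by an 'entity' request param)
                for (Entity entity : config.getEntities())
                {
                  SolrQueryUtils.runEntityQueries(
                    config, req.getCore(), importer, entity,
                    req.getParams().get("entity"), req, rsp, params
                  );
                }
                return;
              }
              super.handleRequestBody(req, rsp);
            }
          }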

        I'm not sure I'll be able to write a test case, since I'd assume that something would have to be mocked up in order for it to actually run?

        Let me know...

        Aaron
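
        For what it's worth, the DIH test utilities include a MockDataSource that can stand in for the JDBC source, which might be enough mocking for such a test. A sketch, with a made-up query and row, assuming the test's data-config declares <dataSource type="MockDataSource"/>:

        TestSketch.java
          import java.util.ArrayList;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          import org.apache.solr.handler.dataimport.MockDataSource;

          public class RunQueryTestSketch
          {
            public void testRunQueryWithoutImport() throws Exception
            {
              //seed the mock with the row(s) the entity query should "return"
              List<Map<String,Object>> rows = new ArrayList<>();
              Map<String,Object> row = new HashMap<>();
              row.put("id", "1");
              rows.add(row);
              MockDataSource.setIterator("select * from item where id='42'", rows.iterator());
              try
              {
                //invoke the handler with command=runquery and assert on the response here
              }
              finally
              {
                MockDataSource.clearCache();
              }
            }
          }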

        Aaron LaBella added a comment -

        NOTE: as you can see above, actually getting the EntityProcessorWrapper and initializing the datasource takes much more code than I would have hoped or expected. I think the Solr data import handler framework is an incredible piece of software; it's just unfortunate that the APIs aren't a little more open and flexible to allow for extensibility and use cases beyond just getting data into Solr. I could envision using the code base to define ad hoc queries that support things like variable resolution, evaluators, transformers, etc. It's very powerful stuff.
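
        For example, the variable resolution piece already stands on its own. A sketch of that kind of ad hoc resolution, with a made-up query and values:

        AdHoc.java
          import java.util.LinkedHashMap;
          import java.util.Map;

          import org.apache.solr.handler.dataimport.VariableResolver;

          public class AdHocResolveSketch
          {
            public static void main(String[] args)
            {
              //resolve a templated query outside of any import
              VariableResolver vr = new VariableResolver();
              Map<String,Object> request = new LinkedHashMap<>();
              request.put("region", "EMEA");
              vr.addNamespace("dataimporter.request", request);
              String resolved = vr.replaceTokens(
                "select * from sales where region='${dataimporter.request.region}'"
              );
              //resolved -> select * from sales where region='EMEA'
              System.out.println(resolved);
            }
          }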

        Thanks.

        ASF subversion and git services added a comment -

        Commit 1613406 from Erik Hatcher in branch 'dev/trunk'
        [ https://svn.apache.org/r1613406 ]

        SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements

        ASF subversion and git services added a comment -

        Commit 1613409 from Erik Hatcher in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1613409 ]

        SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements (merged from r1613406)


          People

          • Assignee: Shalin Shekhar Mangar
          • Reporter: Aaron LaBella
          • Votes: 0
          • Watchers: 3

              Time Tracking

              Original Estimate: 2h
              Remaining Estimate: 2h
              Time Spent: Not Specified
