Solr
  1. Solr
  2. SOLR-3387

UpdateRequestHandler should support XML,CSV,JSON, and javabin

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Rather then have 4 handlers to support 4 content types, we should use a single endpoint and pick the ContentStreamLoader based on the ContentType

      This will simplify configuration problems for clients that want to swtich format (see SOLR-3038)

        Issue Links

          Activity

          Hide
          Yonik Seeley added a comment -

          And default "wt" based on that? (i.e. xml, JSON or javabin)

          Show
          Yonik Seeley added a comment - And default "wt" based on that? (i.e. xml, JSON or javabin)
          Hide
          Ryan McKinley added a comment -

          yes – i'm looking at something like:

          private void setDefaultWT(String wt, SolrQueryRequest req) {
            SolrParams params = req.getParams();
            if( params.get(CommonParams.WT) == null ) {
              Map<String,String> map = new HashMap<String,String>(1);
              map.put(CommonParams.WT, wt);
              req.setParams(SolrParams.wrapDefaults(params, 
                  new MapSolrParams(map)));
            }
          }
          

          The big change would be that content-type is important. So far only the charset is used. I think this is an OK change for 4.x and is inline with any REST service

          Show
          Ryan McKinley added a comment - yes – i'm looking at something like: private void setDefaultWT( String wt, SolrQueryRequest req) { SolrParams params = req.getParams(); if ( params.get(CommonParams.WT) == null ) { Map< String , String > map = new HashMap< String , String >(1); map.put(CommonParams.WT, wt); req.setParams(SolrParams.wrapDefaults(params, new MapSolrParams(map))); } } The big change would be that content-type is important. So far only the charset is used. I think this is an OK change for 4.x and is inline with any REST service
          Hide
          Ryan McKinley added a comment -

          as a quick sketch, I'm looking at something like:

          return new ContentStreamLoader() {
                XMLLoader xml = null;
                JavabinLoader javabin = null;
                JsonLoader json = null;
                CSVLoader csv = null;
                
                @Override
                public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream) throws Exception {
                  ContentStreamLoader loader = null;
                  String type = stream.getContentType();
                  if (type.contains("javabin")) {
                    if (javabin == null) {
                      javabin = new JavabinLoader(processor);
                      setDefaultWT("javabin", req);
                    }
                    loader = javabin;
                  } else if (type.contains("xml")) {
                    if (xml == null) {
                      xml = new XMLLoader(processor, inputFactory);
                      setDefaultWT("xml", req);
                    }
                    loader = xml;
                  } else if (type.contains("json")) {
                    if (json == null) {
                      json = new JsonLoader(req, processor);
                      setDefaultWT("json", req);
                    }
                    loader = json;
                  } else if (type.contains("csv")) {
                    if (csv == null) {
                      csv = new SingleThreadedCSVLoader(req, processor);
                      // setDefaultWT("csv", req); Should this default?
                    }
                    loader = csv;
                  }
                  
                  if (loader == null) {
                    throw new SolrException(ErrorCode.BAD_REQUEST,
                        "Unsupported Content-Type: '" + type + "'");
                  }
                  loader.load(req, rsp, stream);
                }
          
                private void setDefaultWT(String wt, SolrQueryRequest req) {
                  SolrParams params = req.getParams();
                  if( params.get(CommonParams.WT) == null ) {
                    Map<String,String> map = new HashMap<String,String>(1);
                    map.put(CommonParams.WT, wt);
                    req.setParams(SolrParams.wrapDefaults(params, 
                        new MapSolrParams(map)));
                  }
                }
              };
          

          Any red flags? We could have more strict content-type rules

          If we like the general idea/approach I'll clean things up with tests etc.

          For back compatibility any opinions?

          • @Deprecated JsonUpdateRequestHandler could simply extend the general UpdateRequestHandler (now requiring proper content-type)
          • @Deprecated JsonUpdateRequestHandler could could call JsonLoader explicitly (same as 3.x)
          • remove it completely
          Show
          Ryan McKinley added a comment - as a quick sketch, I'm looking at something like: return new ContentStreamLoader() { XMLLoader xml = null ; JavabinLoader javabin = null ; JsonLoader json = null ; CSVLoader csv = null ; @Override public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream) throws Exception { ContentStreamLoader loader = null ; String type = stream.getContentType(); if (type.contains( "javabin" )) { if (javabin == null ) { javabin = new JavabinLoader(processor); setDefaultWT( "javabin" , req); } loader = javabin; } else if (type.contains( "xml" )) { if (xml == null ) { xml = new XMLLoader(processor, inputFactory); setDefaultWT( "xml" , req); } loader = xml; } else if (type.contains( "json" )) { if (json == null ) { json = new JsonLoader(req, processor); setDefaultWT( "json" , req); } loader = json; } else if (type.contains( "csv" )) { if (csv == null ) { csv = new SingleThreadedCSVLoader(req, processor); // setDefaultWT( "csv" , req); Should this default ? } loader = csv; } if (loader == null ) { throw new SolrException(ErrorCode.BAD_REQUEST, "Unsupported Content-Type: '" + type + "'" ); } loader.load(req, rsp, stream); } private void setDefaultWT( String wt, SolrQueryRequest req) { SolrParams params = req.getParams(); if ( params.get(CommonParams.WT) == null ) { Map< String , String > map = new HashMap< String , String >(1); map.put(CommonParams.WT, wt); req.setParams(SolrParams.wrapDefaults(params, new MapSolrParams(map))); } } }; Any red flags? We could have more strict content-type rules If we like the general idea/approach I'll clean things up with tests etc. For back compatibility any opinions? @Deprecated JsonUpdateRequestHandler could simply extend the general UpdateRequestHandler (now requiring proper content-type) @Deprecated JsonUpdateRequestHandler could could call JsonLoader explicitly (same as 3.x) remove it completely
          Hide
          Yonik Seeley added a comment -

          Any red flags?

          What about stream.body? And how does this play with multi-part uploads?

          It would also be nice to be able to leave off the "-H 'Content-type:application/json'" stuff when we do curl examples... but curl
          defaults to form encoded. I wonder if we could do some sort of auto-detect in the case of form-encoded... starting with "<" would be XML,
          starting with "["or"{" would be JSON.

          Show
          Yonik Seeley added a comment - Any red flags? What about stream.body? And how does this play with multi-part uploads? It would also be nice to be able to leave off the "-H 'Content-type:application/json'" stuff when we do curl examples... but curl defaults to form encoded. I wonder if we could do some sort of auto-detect in the case of form-encoded... starting with "<" would be XML, starting with "["or"{" would be JSON.
          Hide
          Ryan McKinley added a comment -

          What about stream.body?

          stream.body is used to create the ContentStream – so it does not change anything

          And how does this play with multi-part uploads?

          The one difference is that this would let you have different content types in each part – not really a feature, but worth noting.

          I wonder if we could do some sort of auto-detect

          I'll poke – getStream().mark()/reset() can probably work for XML/JSON but it may break things for javabin

          Show
          Ryan McKinley added a comment - What about stream.body? stream.body is used to create the ContentStream – so it does not change anything And how does this play with multi-part uploads? The one difference is that this would let you have different content types in each part – not really a feature, but worth noting. I wonder if we could do some sort of auto-detect I'll poke – getStream().mark()/reset() can probably work for XML/JSON but it may break things for javabin
          Hide
          Yonik Seeley added a comment -

          I'll poke – getStream().mark()/reset() can probably work for XML/JSON but it may break things for javabin

          Yeah, support doesn't need to be universal or anything... I think this would really be only for easier curl from the command line (i.e. not important for javabin)

          Show
          Yonik Seeley added a comment - I'll poke – getStream().mark()/reset() can probably work for XML/JSON but it may break things for javabin Yeah, support doesn't need to be universal or anything... I think this would really be only for easier curl from the command line (i.e. not important for javabin)
          Hide
          Ryan McKinley added a comment -

          Since solr 1.1, the content-type is required for post body

          curl http://localhost:8983/solr/update --data-binary @books.json
          

          returns 400 'missing content stream'

          We could try to auto detect the content type, but that is unrelated to this issue. To support this, we would want a different approach in SolrRequestParsers. see line 394 in SolrRequestParsers

          Yonik do you want to open a new issue for trying to auto-detect content-type?

          Show
          Ryan McKinley added a comment - Since solr 1.1, the content-type is required for post body curl http: //localhost:8983/solr/update --data-binary @books.json returns 400 'missing content stream' We could try to auto detect the content type, but that is unrelated to this issue. To support this, we would want a different approach in SolrRequestParsers. see line 394 in SolrRequestParsers Yonik do you want to open a new issue for trying to auto-detect content-type?
          Hide
          Jan Høydahl added a comment -

          There is already a JIRA for this, let's close this one

          Show
          Jan Høydahl added a comment - There is already a JIRA for this, let's close this one
          Hide
          Jan Høydahl added a comment -

          Closing. Please continue discussion in SOLR-2857

          Show
          Jan Høydahl added a comment - Closing. Please continue discussion in SOLR-2857

            People

            • Assignee:
              Unassigned
              Reporter:
              Ryan McKinley
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development