Nutch / NUTCH-1570

Add filtering capability to Datastore Queries

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.5
    • Component/s: REST_api, storage
    • Labels: None

      Description

      For some time this issue has been discussed on various lists.
      While upgrading the Gora dependencies in NUTCH-1569, I came across a comment in o.a.n.api.DbReader#iterator:

        public Iterator<Map<String,Object>> iterator(String[] fields, String startKey, String endKey,
            String batchId) throws Exception {
          Query<String,WebPage> q = store.newQuery();
          String[] qFields = fields;
          if (fields != null) {
            HashSet<String> flds = new HashSet<String>(Arrays.asList(fields));
            // remove "url"
            flds.remove("url");
            if (flds.size() > 0) {
              qFields = flds.toArray(new String[flds.size()]);
            } else {
              qFields = null;
            }
          }
          q.setFields(qFields);
          if (startKey != null) {
            q.setStartKey(startKey);
            if (endKey != null) {
              q.setEndKey(endKey);
            }
          }
          Result<String,WebPage> res = store.execute(q);
          // XXX we should add the filtering capability to Query
          return new DbIterator(res, fields, batchId);
        }
      

      I will link this issue to the corresponding Gora issue once we get around to the implementation.
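
      To illustrate the kind of filtering the XXX comment asks for, here is a minimal sketch of the semantics, applied on the client side to an already-fetched result stream. It does not use Gora's API: FilteringIterator and FilterSketch are hypothetical names, and plain strings stand in for WebPage rows. The point of this issue is to move such a predicate into the Query itself so the datastore can skip non-matching rows.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical wrapper: yields only the elements of a source iterator that match a predicate. */
final class FilteringIterator<T> implements Iterator<T> {
  private final Iterator<T> source;
  private final Predicate<? super T> filter;
  private T next;          // next matching element, if any
  private boolean hasNext; // whether 'next' holds a valid element

  FilteringIterator(Iterator<T> source, Predicate<? super T> filter) {
    this.source = source;
    this.filter = filter;
    advance();
  }

  /** Moves to the next element accepted by the filter, or marks the iterator exhausted. */
  private void advance() {
    hasNext = false;
    next = null;
    while (source.hasNext()) {
      T candidate = source.next();
      if (filter.test(candidate)) {
        next = candidate;
        hasNext = true;
        return;
      }
    }
  }

  @Override
  public boolean hasNext() {
    return hasNext;
  }

  @Override
  public T next() {
    if (!hasNext) {
      throw new NoSuchElementException();
    }
    T result = next;
    advance();
    return result;
  }
}

final class FilterSketch {
  public static void main(String[] args) {
    // Assumption: strings stand in for rows tagged with a batch id.
    List<String> batchIds = Arrays.asList("batch-1", "batch-2", "batch-1");
    Iterator<String> it =
        new FilteringIterator<String>(batchIds.iterator(), "batch-1"::equals);
    int matches = 0;
    while (it.hasNext()) {
      it.next();
      matches++;
    }
    System.out.println(matches); // prints 2
  }
}
```

      A wrapper like this still pulls every row from the store before discarding it, which is why a filter on the Query would be preferable for large tables.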

            People

            • Assignee: Unassigned
            • Reporter: Lewis John McGibbney (lewismc)