ManifoldCF / CONNECTORS-50

Proposal for initial two releases of LCF, including packaged product and full API

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: ManifoldCF 0.3
    • Component/s: None
    • Labels:
      None

      Description

      Currently, LCF has a relatively high bar for evaluation and use, requiring developer expertise. Also, although LCF has a comprehensive UI, it is not currently packaged for use as a crawling engine for advanced applications.

      A small set of individual feature requests is needed to address these issues. They are summarized briefly here to show how they fit together for two initial releases of LCF, but will be broken out into individual LCF Jira issues.

      Goals:

      1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as Solr is today)
      2. LCF as a toolkit for developers needing customized crawling and repository access
      3. An API-based crawling engine that can be integrated with applications (as Aperture is today)

      Larger goals:

      1. Make it very easy for users to evaluate LCF.
      2. Make it very easy for developers to customize LCF.
      3. Make it very easy for applications to fully manage and control LCF in operation.

      Two phases:

      1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it LCF 0.5.
      2) API-based crawling engine for applications for which the UI might not be appropriate. Call it LCF 1.0.

      Phase 1
      -------

      LCF 0.5 right out of the box would interface loosely with Solr 1.4 or later.
      It would contain roughly the features that are currently in place or currently underway, plus a little more.

      Specifically, LCF 0.5 would contain these additional capabilities:

      1. Plug-in architecture for connectors (CONNECTORS-40 - DONE)
      2. Packaged app ready to run with embedded Jetty app server (CONNECTORS-59)
      3. Bundled with database - PostgreSQL or Derby - ready to run without additional manual setup (CONNECTORS-55)
      4. Mini-API to initially configure default connections and "example" jobs for file system and web crawl (CONNECTORS-58)
      5. Agent process started automatically (CONNECTORS-60)
      6. Solr output connector option to commit at end of job, by default (CONNECTORS-57)

      Installation and basic evaluation of LCF would be essentially as simple as Solr is today. The example
      connections and jobs would permit the user to initiate example crawls of a file system example
      directory and an example web on the LCF web site with just a couple of clicks (as opposed to the
      detailed manual setup required today to create repository and output connections and jobs).

      It is worth considering whether the SharePoint connector could also be included as part of the default package.

      Users could then add additional connectors and repositories and jobs as desired.

      Timeframe for release? Level of effort?

      Phase 2
      -------

      The essence of Phase 2 is that LCF would be split to allow direct, full API access to LCF as a
      crawling "engine", in addition to the full LCF UI. Call this LCF 1.0.

      Specifically, LCF 1.0 would contain these additional capabilities:

      1. Full API for LCF as a crawling engine (CONNECTORS-56)
      2. LCF can be bundled within an app (CONNECTORS-61)
      3. LCF event and activity notification for full control by an application (CONNECTORS-41)

      Overall, LCF will offer roughly the same crawling capabilities as with LCF 0.5, plus whatever bug
      fixes and minor enhancements might also be added.

      Timeframe for release? Level of effort?

      -------------------------

      Issues:

      • Can we package PostgreSQL with LCF so LCF can set it up?
      • Or do we need Derby for that purpose?
      • Managing multiple processes (UI, database, agent, app processes)
      • What exactly would the API look like? (URL, XML, JSON, YAML?)

        Activity

        Jack Krupansky created issue -
        Jack Krupansky made changes -
        Field               Original Value   New Value
        Original Estimate   5m [ 300 ]       3,360h [ 12096000 ]
        Remaining Estimate  5m [ 300 ]       3,360h [ 12096000 ]
        Karl Wright added a comment -

        I don't think much of "umbrella tickets". Each ticket should describe a reasonably isolated feature or fix, not a wish list. Can you break this up into more specific work items, being careful to check first whether there are existing tickets covering the feature/service you are looking for?

        I'm also still looking for much greater specificity as to the use cases. One cannot design useful features without use cases. For example, the word "API" is so unspecific as to be essentially meaningless. If you describe in detail what your hoped-for interaction with this hypothetical "API" is, that would go a long way towards clarifying the need. I'm not just interested in the API format; I'm interested in how you intend to interact with it. This is crucial because, as I've pointed out in various posts, one key design goal of LCF is to make the connector developer provide the UI for their connector, and your proposal may well force a violation of that principle, unless you have something clever up your sleeve.

        There are still a number of points in your document we have discussed in the past which remain but whose controversy goes unacknowledged. It would be good, if you create tickets or add to tickets already created, to mention the associated issues and why you think they are unimportant or immaterial. For example, I've discussed the limitations of using Derby as the prime database for LCF - that should be captured somewhere.

        Jack Krupansky added a comment -

        I expect to be able to address all of Karl's points...

        > I don't think much of "umbrella tickets"... Can you break this up into more specific work items...

        I'll be doing that over the coming week or so. I'll keep this umbrella ticket not for details, but just to show how all of the individual tickets fit together. The discussion on this ticket is more for the overall proposal for two separate releases and roughly what they are.

        > I'm also still looking for much greater specificity as to the use cases.

        I'll provide some of that for each individual ticket. I'll try to keep the use cases as simple and minimalist as possible, but I'll address specific questions or issues that arise.

        > the word "API" is so unspecific as to be essentially meaningless... I'm interested in how you intend to interact with it.

        Initially I'll be relatively light on detail to permit others to have some input on what they expect from a full API, but eventually all of the API issues will need to be fleshed out and detailed to some extent.

        > There are still a number of points ... we have discussed in the past which remain but whose controversy goes unacknowledged.

        Yes, with the proposed commit feature as an example. The specific ticket for each feature should address such concerns.

        > I've discussed the limitations of using Derby as the prime database for LCF - that should be captured somewhere.

        Yes. There might be several database tickets. One for alternate databases. Another for bundling the database with LCF.

        Jack Krupansky added a comment -

        Identified the Jira issue number for every individual task, all nine of them.

        Jack Krupansky made changes -
        Original Estimate 3,360h [ 12096000 ]
        Remaining Estimate 3,360h [ 12096000 ]
        Karl Wright added a comment -

        Moving this out of core, since it's a planning ticket not a software issue.

        Karl Wright made changes -
        Component/s Framework core [ 12313440 ]
        Karl Wright added a comment -

        I'm going to resolve this ticket, since the planning part is now meaningless, and the only thing that remains is a scripting language, for which there is a separate ticket.

        Karl Wright added a comment -

        Resolved, leaving scripting language request open as a separate ticket.

        Karl Wright made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Karl Wright [ kwright@metacarta.com ]
        Fix Version/s ManifoldCF 0.3 [ 12316324 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee: Karl Wright
          • Reporter: Jack Krupansky
          • Votes: 1
          • Watchers: 1
