Solr / SOLR-3296

Explore alternatives to Commons CSV

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Build
    • Labels: None

      Description

      In LUCENE-3930 we're implementing some less than ideal solutions to make available the unreleased version of commons-csv. We could remove these solutions if we didn't rely on this lib. So I think we should explore alternatives.

      I think opencsv is an alternative to consider; I've used it in many commercial projects. Bizarrely, the Commons CSV website says that opencsv uses a BSD license, but this isn't the case: opencsv uses ASL2.

      1. SOLR-3296_noggit.patch (21 kB, Robert Muir)
      2. SOLR-3295-CSV-tests.patch (64 kB, Chris Male)
      3. pom.xml (1 kB, Chris Male)
      4. pom.xml (2 kB, Chris Male)

        Activity

        Dawid Weiss added a comment -

        BSD or ASL2 – either is fine with another ASL2 project.

        Chris Male added a comment -

        Yeah I know. I was just pointing out that it used ASL.

        Uwe Schindler added a comment -

        What about apache-noggit? There are lots of other JSON parsers/generators available!

        Dawid Weiss added a comment -

        I used GSON (http://code.google.com/p/google-gson/) and was happy with it. It even contains sanity checks which come in handy if you're emitting insane data...

        Chris Male added a comment -

        I had originally intended to add noggit to this issue, but there is some discussion about replacing it given how very efficient it is. Perhaps it's a good idea, as this issue says, to explore alternatives to see whether something else meets our performance needs.

        Uwe Schindler added a comment -

        I agree that noggit might be the most performant solution. The question is: why is there no release already? If it's maintained by Yonik at ASF and successfully used in Solr, why not release the version we currently have in Maven and use it? If Yonik thinks it's not ready for a release, we should not use it.

        Similar to what Dawid did, it took him a few hours to make the Carrot 1.5 stuff available via Maven repo.

        Dawid Weiss added a comment -

        I didn't know it's Yonik's actually. It even has a pom.xml file – http://svn.apache.org/repos/asf/labs/noggit/?

        Yonik, if you have an account at Sonatype, this takes no more than changing the version number to something without a SNAPSHOT suffix and running mvn deploy (plus acceptance from Nexus). Let me know if you need some guidance, but it should be a 10-minute effort if you have the Maven code ready.
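
The version change described above can be sketched in Java. This is only an illustration of the Maven convention that development versions end in "-SNAPSHOT" and release versions do not; the class name, method names, and the example version string are hypothetical and are not part of Maven, Nexus, or noggit.

```java
/** Hypothetical helper; not part of Maven or noggit. */
final class ReleaseVersion {
    // Maven marks development versions with this qualifier.
    private static final String SNAPSHOT_SUFFIX = "-SNAPSHOT";

    private ReleaseVersion() {}

    /** True if the version still carries the -SNAPSHOT qualifier. */
    static boolean isSnapshot(String version) {
        return version.endsWith(SNAPSHOT_SUFFIX);
    }

    /** Returns the version with a trailing -SNAPSHOT removed, or unchanged if there is none. */
    static String toRelease(String version) {
        if (!isSnapshot(version)) {
            return version;
        }
        return version.substring(0, version.length() - SNAPSHOT_SUFFIX.length());
    }

    public static void main(String[] args) {
        // "0.5-SNAPSHOT" is a made-up example, not noggit's actual version.
        String current = args.length > 0 ? args[0] : "0.5-SNAPSHOT";
        System.out.println(current + " -> " + toRelease(current));
    }
}
```

In practice the edit is made by hand in the version element of pom.xml before deploying; the sketch only shows which string has to change.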

        Steve Rowe added a comment -

        Minor nit about releasing noggit, which is hosted at Apache Labs. From http://labs.apache.org/bylaws.html:

        > Guidelines
        > [...]
        > Releases
        > Labs are prohibited from making releases.

        Dawid Weiss added a comment -

        I guess this means "official apache releases" but if the release is done in a private namespace then this isn't a problem? I mean – I could probably take the source right now, change the group id to something I have access to (com.carrotsearch.thirdparty) and release it, but so can Yonik (under his own domain or whatever namespace he wishes that is different than Apache's)?

        I admit this is kind of weird that Solr is using something that cannot be officially released. Why not make it part of Solr then? Just copy the source code over and publish as a separate artefact?

        Yonik Seeley added a comment -

        First steps:
        https://github.com/yonik/noggit
        https://github.com/yonik/noggit/downloads

        wrt commons-csv alternatives, it's too risky for little/no gain. I put a lot of effort into getting commons-csv up to snuff, and almost all of the tests for that reside in commons-csv itself, not in Solr. Switching implementations would most likely result in a lot of regressions that we don't have tests for.

        ps: Steve, you're absolutely correct about the reason why there was never a separate noggit release. If github had been around in 2006, I might have chosen differently.

        Robert Muir added a comment -

        > First steps:
        > https://github.com/yonik/noggit
        > https://github.com/yonik/noggit/downloads

        +1 !!!!!

        Is this safe to cutover to in trunk? I can do the ivy parts.

        Yonik Seeley added a comment -

        > Is this safe to cutover to in trunk?

        Yep, should be exactly the same code (just with different package names of course).

        Robert Muir added a comment -

        OK, I'll make a patch. Of course Maven is a separate issue, but Ivy can just download that release...

        Chris Male added a comment -

        > I put a lot of effort into getting commons-csv up to snuff, and almost all of the tests for that reside in commons-csv itself, not in Solr

        I'll bring the tests from commons-csv into Solr.

        Yonik Seeley added a comment -

        If the deal is about commons-csv not having a release yet, a much easier (and safer) path seems to just wait for them to do that and upgrade at that time.

        Michael McCandless added a comment -

        > wrt commons-csv alternatives, it's too risky for little/no gain.

        This confuses me: commons-csv is unreleased, while there are other
        license-friendly packages (eg opencsv) that have been released for
        some time (multiple releases), been tested in the field, had bugs
        found & fixed, etc.

        Why use an unreleased package when released alternatives are
        available?

        > I put a lot of effort into getting commons-csv up to snuff,

        Wait: a lot of effort doing what? Did you have to modify commons-csv
        sources? Or do you mean open issues w/ the commons devs to fix
        things/add test cases to commons-csv sources (great!)...?

        > Switching implementations would most likely result in a lot of regressions that we don't have tests for.

        I'd expect the reverse, ie, it's more likely there are bugs in
        commons-csv (it's not released and thus not heavily tested) than eg
        in opencsv.

        And if somehow that's really the case (eg we have particular/unusual
        CSV parsing requirements), we should have our own tests asserting so?
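
The idea of Solr carrying its own tests for its CSV requirements could look roughly like the Java sketch below. It pins down a few expected behaviours behind a small adapter interface, so the same cases could be run against commons-csv, opencsv, or any other parser. Everything here is an assumption made for illustration: the `LineParser` interface, the `naiveParse` stand-in, and the example cases are hypothetical, and they are not Solr's actual requirements or either library's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical harness: states CSV expectations independently of the library used. */
final class CsvContractSketch {

    /** Adapter that a real library (commons-csv, opencsv, ...) would be wrapped in. */
    interface LineParser {
        List<String> parse(String line);
    }

    /** Tiny stand-in parser so the sketch runs without external dependencies. */
    static List<String> naiveParse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Runs the example cases against any parser; throws on the first mismatch. */
    static void verify(LineParser parser) {
        expect(parser, "a,b,c", "a", "b", "c");
        expect(parser, "a,\"b,c\",d", "a", "b,c", "d");            // separator inside quotes
        expect(parser, "\"say \"\"hi\"\"\",x", "say \"hi\"", "x"); // escaped quotes
        expect(parser, "a,,c", "a", "", "c");                      // empty field
    }

    private static void expect(LineParser parser, String input, String... expected) {
        List<String> actual = parser.parse(input);
        if (!actual.equals(Arrays.asList(expected))) {
            throw new AssertionError(input + " -> " + actual);
        }
    }

    public static void main(String[] args) {
        verify(CsvContractSketch::naiveParse);
        System.out.println("all CSV contract cases passed");
    }
}
```

With cases like these living in Solr's own test suite, swapping the underlying library becomes a matter of writing a new adapter and re-running `verify`, rather than relying on tests that reside in another project.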

        Yonik Seeley added a comment -

        > Wait: a lot of effort doing what?

        http://commons.apache.org/csv/team-list.html
        I became a CSV committer to address all of the issues.

        Robert Muir added a comment -

        Patch for noggit: nuking the local copy of noggit (--no-diff-deleted) and using the download instead (changing package names to org.noggit where it's used). All tests and javadocs pass.

        Jan Høydahl added a comment -

        > http://commons.apache.org/csv/team-list.html
        > I became a CSV committer to address all of the issues.

        Great Yonik. As a CSV committer, could you not initiate a release? On the csv web site, it says:

        > There are currently no official downloads, and will not be until CSV moves out of the Sandbox

        CSV has moved out of the Sandbox, so what stops you (the team) from taking the code as is and releasing it, perhaps as a 0.x version?

        Chris Male added a comment -

        Patch for adding the commons-csv tests to trunk. Will commit shortly.

        Chris Male added a comment -

        While Robert's patch for getting Noggit from github does work with Ivy, it means we must also retrieve it with Maven. Can I be of help with getting a full Maven release of Noggit? Would it be preferred if I did it via a 3rd party release like I did with langdetect?

        Michael McCandless added a comment -

        +1 to pull Noggit from its official release, and stop using the source-copied version.

        Can someone who understands the Maven side do what's necessary here? Sonatype worked great for langdetect, I think?

        Robert Muir added a comment -

        I keep threatening to commit that patch, only because:

        • I think it's more legit to have this real release than code copied from Apache Labs. I think
          it undeniably makes our release cleaner.
        • I left the patch up for a month already for someone to go through whatever that process is
          to get it into Maven.

        I don't actually follow through on my threats YET because:

        • I worry someone will not do the right thing with Maven, and instead just revert to a
          fake release of other people's stuff, which I helped work to remove.
        • If someone does such a thing, I feel the Maven artifacts are unreleasable, i.e.
          we are actually back in the commons-csv state. So what would we do? Exclude the Maven artifacts
          from any release candidate in this case while everyone argues about it? Or does it fall
          back on the release manager to deal with?
        Chris Male added a comment - - edited

        This fell off my radar a little as I became distracted by other issues, but I'll prepare a release and submit it to Sonatype today.

        Robert Muir added a comment -

        Chris: thank you!

        Chris Male added a comment -

        Attaching the POM that I will be using for the noggit release.

        Chris Male added a comment -

        Improved version.

        Chris Male added a comment -

        I have submitted it for processing, we'll see how things go.

        Steve Rowe added a comment -

        Chris,

        I don't see org.noggit:noggit up on Maven Central yet, so I guess your request has hit a snag - do you know what's happening?

        Chris Male added a comment -

        Funny you ask,

        When I submitted the bundle I received the same 'Staging Completed' notification as I did when I submitted langdetect. A relevant snippet from the email:

        > The following artifacts have been staged to the Central Bundles-102 (u:MYUSERNAME, a:122.59.251.231) repository.

        with all the appropriate artifacts listed.

        Just today I received a 'Staging Repository Dropped' notification with only the following information:

        > The Central Bundles-102 (u:MYUSERNAME, a:122.59.251.231) staging repository has been dropped.

        When langdetect was accepted, I received a 'Promotion Completed' email, so I think this is a bad sign but I've received no information about why it was rejected and don't know how to proceed further.

        Chris Male added a comment -

        After some research (thanks Steven), it seems the likely cause of the failure is that their staging repositories time out after some period if they aren't synced to the central repository. Because I submitted the bundle on a Friday, it perhaps didn't get looked at until too late.

        So I've resubmitted the bundle (on a Monday now), fingers crossed.

        Steve Rowe added a comment -

        Noggit is now up on Maven Central.

        Robert Muir added a comment -

        Thanks for the ping. I'll work up a new patch in a few days if nobody else wants to take it (don't hesitate);
        I'm just working on some other issues right now.

        Ivy parts should be pretty easy either way.

        Chris Male added a comment -

        > Noggit is now up on Maven Central.

        Yup I received notification today. So all I need to remember in the future is not to submit near the weekend.


          People

          • Assignee: Unassigned
          • Reporter: Chris Male
          • Votes: 2
          • Watchers: 1