Lucene - Core / LUCENE-3977

generated/duplicated javadocs are wasteful and bloat the release

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: general/javadocs
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Some stats for the generated javadocs of 3.6:

      • 9,146 files
      • 161,872 KB uncompressed
      • 25MB compressed (this is responsible for nearly half of our binary release)

      The fact that we intentionally double the size of our javadocs with the
      'javadocs-all' thing is truly wasteful, and compression doesn't help at all.
      As a test, I nuked 'all' and found:

      • 4,944 files
      • 81,084 KB uncompressed
      • 12.8MB compressed

      We need to clean this up for 4.0. We only need to ship javadocs 'one way'.

      1. LUCENE-3977-triplication.patch
        1.0 kB
        Robert Muir
      2. LUCENE-3977.patch
        9 kB
        Robert Muir
      3. LUCENE-3977.patch
        8 kB
        Robert Muir
      4. LUCENE-3977.patch
        19 kB
        Robert Muir
      5. LUCENE-3977.patch
        48 kB
        Robert Muir
      6. LUCENE-3977.patch
        54 kB
        Robert Muir

        Activity

        Robert Muir added a comment -

        We can save 10MB with this patch, which nukes the 'index'.
        I guarantee you nobody will miss it. Just click this thing and see how
        useless it is (since it's every method etc. in all of Lucene).

        Index: common-build.xml
        ===================================================================
        --- common-build.xml	(revision 1310449)
        +++ common-build.xml	(working copy)
        @@ -996,6 +996,7 @@
                   encoding="${build.encoding}"
                   charset="${javadoc.charset}"
                   docencoding="${javadoc.charset}"
        +          noindex="true"
                   author="true"
                   version="true"
                   use="true"
        
        Robert Muir added a comment -

        Besides the trivial patch above (which I think we should do), looking at the big picture
        with the 2x duplication, I really think we should totally nuke these javadocs-all tasks.

        Really if we have different modules like contrib-analyzers, why can't they link to
        the things they depend on (e.g. lucene-core) just like the solr javadocs do?

        This is just a matter of fixing the build system and then working towards making
        our downloads reasonable right?
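
        A minimal sketch of how one module's javadocs could link to another module's docs in Ant instead of re-documenting it. This is an illustration only and is not taken from the attached patches: the project name, directory layout, and package pattern are assumptions. It also assumes the core javadocs have already been generated into a sibling directory, so that javadoc can read their package list.

        ```xml
        <project name="module-javadocs-sketch" default="javadocs" basedir=".">
          <!-- Hypothetical output root; each module gets its own subdirectory. -->
          <property name="javadoc.dir" location="build/docs/api"/>

          <target name="javadocs">
            <javadoc destdir="${javadoc.dir}/analyzers-common"
                     sourcepath="src/java"
                     packagenames="org.apache.lucene.analysis.*"
                     noindex="true"
                     author="true"
                     version="true"
                     use="true">
              <!-- Relative href, resolved from destdir: references to core
                   classes become links into ../core rather than being
                   regenerated inside this module's docs. -->
              <link href="../core"/>
            </javadoc>
          </target>
        </project>
        ```

        With a layout like this, each module is documented once and the cross-references are just relative links between sibling directories.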

        Hoss Man added a comment -

        "Really if we have different modules like contrib-analyzers, why can't they link to the things they depend on (e.g. lucene-core) just like the solr javadocs do?"

        i think the original argument in favor of having both styles was:

        • the all version makes it easy to see (in the left pane) all the classes that are available when people are working with the entire code base
        • the individual module versions, even when cross-linked with each other, make it easy to see exactly what is included in a single module (via the left pane)

        at this point in my life, i don't really have an opinion, as long as we include at least one copy in the bin release.

        "We can save 10MB with this patch, which nukes the 'index'"

        oh god yes, i didn't even realize we were building that useless pile of crap

        Robert Muir added a comment -

        "at this point in my life, i don't really have an opinion, as long as we include at least one copy in the bin release"

        Well, I was thinking: javadocs-all really tries to be what the 'individual modules' version is! It tries (and fails) to separate packages according to the contrib modules that "own" them, but this is all screwed up. Sure, o.a.l.index.pruning is listed under the pruning module, but PruningReader is actually in o.a.l.index.

        There are many other examples. So it seems to me that if the individual modules' javadocs actually linked to each other correctly, then the user really gets that same result, but without the duplication. Hell, if that still isn't good enough we could figure out some way to make a 'massive useless class list' that links to all the correct places, but I think that's not useful (thus the crazy logic in build.xml to try to separate contribs into packages).

        "oh god yes, i didn't even realize we were building that useless pile of crap"

        OK, I don't think this one is controversial: it's an easy win, and I'll commit it tonight or tomorrow (even though I still want to nuke the other 80MB from the duplication issue... that's harder).

        Robert Muir added a comment -

        OK, this issue is way worse than I originally thought: in the binary release
        we actually have 3x duplication for each module. Let's take 'grouping' as an example:

        1. the grouping javadocs in docs/api/all
        2. the grouping javadocs in docs/api/grouping
        3. the lucene-grouping-javadocs.jar

        This seems totally unnecessary and overkill to me. I don't think we need to include
        the lucene-grouping-javadocs.jar in the binary packaging, since we already provide
        it unzipped: it only needs to be in the maven artifacts. Providing it both unzipped
        and zipped bloats the release.

        As far as the javadocs-all stuff... I'm working on a patch

        Robert Muir added a comment -

        Patch to remove the triplication (so it's only duplication) by excluding the javadocs jars from the binary release (they only go into Maven).

        Before:
        -rw-rw-r-- 1 rmuir rmuir 82046115 2012-04-18 01:10 lucene-4.0-SNAPSHOT.zip

        After:
        -rw-rw-r-- 1 rmuir rmuir 69982949 2012-04-18 01:15 lucene-4.0-SNAPSHOT.zip
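
        An exclusion like this could be expressed as a fileset pattern when the binary archive is assembled. A rough sketch, not the attached patch: the target name, the build/ and dist/ directories, and the jar-name pattern are all assumptions for illustration, and build/ is assumed to exist already.

        ```xml
        <project name="binary-packaging-sketch" default="package-zip" basedir=".">
          <property name="version" value="4.0-SNAPSHOT"/>

          <target name="package-zip">
            <mkdir dir="dist"/>
            <zip destfile="dist/lucene-${version}.zip">
              <!-- Ship everything under build/ except the zipped javadocs jars.
                   The unzipped docs directories are not matched by the pattern,
                   so they still go into the archive. -->
              <zipfileset dir="build"
                          prefix="lucene-${version}"
                          excludes="**/*-javadoc*.jar"/>
            </zip>
          </target>
        </project>
        ```

        The Maven artifacts would be produced by a separate step, so the javadocs jars remain available there.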

        Robert Muir added a comment -

        here's a prototype (not ready for committing yet!)

        .zip binary release is reduced by 12MB again:
        -rw-rw-r-- 1 rmuir rmuir 56883313 2012-04-20 09:53 lucene-4.0.0.zip

        .tgz (for reference):
        -rw-rw-r-- 1 rmuir rmuir 47958933 2012-04-20 09:53 lucene-4.0.0.tgz

        basically all i did here is the core/ dependency: but i can easily fix the others, and then solr.

        The idea is to remove javadocs-all; instead, when you are in the queries/ javadocs and click IndexReader, it jumps you to core/.

        solr would keep its 'define-javadoc-url' stuff, except instead of it pointing to whatever/whatever/api/all/ it would just point to whatever/whatever/api and be used as the prefix.
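
        One way the prefix idea might look on the Solr side, as a sketch only: the property names, the URL, and the local docs path are hypothetical, and the 'define-javadoc-url' logic that would actually set the prefix is not shown. Each link appends a module directory to the shared prefix; offline="true" makes javadoc read the package list from the locally built docs instead of fetching it from the URL.

        ```xml
        <project name="solr-javadocs-sketch" default="javadocs" basedir=".">
          <!-- Hypothetical prefix: points at .../api, not .../api/all -->
          <property name="lucene.javadoc.url" value="http://example.invalid/lucene/api/"/>
          <!-- Hypothetical location of the already-built Lucene javadocs -->
          <property name="lucene.docs.dir" location="../lucene/build/docs/api"/>

          <target name="javadocs">
            <javadoc destdir="build/docs/api"
                     sourcepath="src/java"
                     packagenames="org.apache.solr.*">
              <link offline="true"
                    href="${lucene.javadoc.url}core/"
                    packagelistLoc="${lucene.docs.dir}/core"/>
              <link offline="true"
                    href="${lucene.javadoc.url}queries/"
                    packagelistLoc="${lucene.docs.dir}/queries"/>
            </javadoc>
          </target>
        </project>
        ```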

        Robert Muir added a comment -

        updated patch, that generates relative links (i wrongly had linkOffline=true set)

        Robert Muir added a comment -

        sorry, i had attached the wrong patch.

        Robert Muir added a comment -

        Updated patch that improves our docs/api/index.html (the javadocs-index.html task), to also include the project description (from the ant file).

        I also tried to clean up these descriptions and make them useful. I think the index.html looks much more useful now, and this is really a possible way we can nuke the manually-maintained forrest index under src/site in the future (but not here).

        DM Smith added a comment -

        Personally, I'd like them not to be in the binary release at all. I like it when projects have a separate download for javadocs. Saves me from having to delete them. Or download them.

        Robert Muir added a comment -

        DM: well, that's always an option; however, I think on this issue the goal is
        simply to reduce from 3 copies (binary release) or 2 copies (website) of javadocs to only 1 copy.

        We can then open up a followup issue if we really want to exclude them from the binary release,
        however as our primary (basically only) form of documentation, I'm not sure about that one...

        But this issue won't make it any worse, only better.

        Uwe Schindler added a comment -

        If I am not interested in any documentation, I prefer to download the jar files directly from http://repo1.maven.org! If I want a complete distro, I download the binary one and expect javadocs to be there. [move this comment to the not-yet-created new issue]

        Dawid Weiss added a comment -

        It's funny – I feel the same way Uwe does but at the same time I absolutely never looked into off-line javadocs that I downloaded with distributions of open source projects. It's usually faster to just find these online.

        Robert Muir added a comment -

        latest patch, found a few unrelated things:

        • join module links to package-private classes
        • MorfologikFilterFactory is in a solr/contrib, but the jar is added to lucene-libs (into the war) versus being plugged

        Lucene parts here should work. Solr is not ready yet, need to define a macro as before so these contribs/test-framework/etc can add their own links.

        Robert Muir added a comment -

        patch with solr->lucene links working too.

        I think this is ready to commit.

        Uwe Schindler added a comment -

        +1

        Ryan McKinley added a comment -

        +1 Thanks for taking on these thankless tasks!


          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir