  1. Apache Any23 (Retired)
  2. ANY23-447

Reduce Any23 dependency bloat

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.3
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      Prompted by an email conversation with Hans Brende, who wrote:

      David, unfortunately this move won't reduce the number of core dependencies
      we have: the plugins and service modules are not dependencies of the core
      module. However, it might be useful if you posted an issue about the
      dependency bloat, including the various exclusions you are using: we might
      be able to mitigate the problem.
      

      This came out of having to exclude dependencies in the pom.xml for a product. (Note that not much thought went into the exclusions; I was trying to get the code size down before a release.) The relevant section of the pom.xml:

          <dependency>
            <groupId>org.apache.any23</groupId>
            <artifactId>apache-any23-core</artifactId>
              <exclusions>
                <!-- Any23 brings in a lot of dependencies, which bloats the shaded jar.
                     This is an attempt to reduce the bloat by excluding packages
                     that we may not be using as part of Any23.
                     NOTE: If an excluded dependency is required at runtime, a
                     java.lang.NoClassDefFoundError is thrown. -->
                
                <exclusion>
                  <groupId>org.apache.tika</groupId>
                  <artifactId>tika-parsers</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.bouncycastle</groupId>
                  <artifactId>bcmail-jdk15on</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.bouncycastle</groupId>
                  <artifactId>bcprov-jdk15on</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>edu.ucar</groupId>
                  <artifactId>cdm</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>net.sf.trove4j</groupId>
                  <artifactId>trove4j</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.apache.cxf</groupId>
                  <artifactId>cxf-rt-rs-client</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>com.github.ben-manes.caffeine</groupId>
                  <artifactId>caffeine</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.opengis</groupId>
                  <artifactId>geoapi</artifactId>
                </exclusion>  
                <exclusion>
                  <groupId>com.drewnoakes</groupId>
                  <artifactId>metadata-extractor</artifactId>
                </exclusion> 
                <exclusion>
                  <groupId>org.eclipse.rdf4j</groupId>
                  <artifactId>rdf4j-repository-sail</artifactId>
                </exclusion> 
                <exclusion>
                  <groupId>org.eclipse.rdf4j</groupId>
                  <artifactId>rdf4j-sail-memory</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.tukaani</groupId>
                  <artifactId>xz</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.codelibs</groupId>
                  <artifactId>jhighlight</artifactId>
                </exclusion> 
                <exclusion>
                  <groupId>org.gagravarr</groupId>
                  <artifactId>vorbis-java-core</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.gagravarr</groupId>
                  <artifactId>vorbis-java-tika</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.apache.opennlp</groupId>
                  <artifactId>opennlp-tools</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.apache.pdfbox</groupId>
                  <artifactId>pdfbox</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.apache.pdfbox</groupId>
                  <artifactId>pdfbox-tools</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.apache.poi</groupId>
                  <artifactId>poi-scratchpad</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>edu.ucar</groupId>
                  <artifactId>grib</artifactId>
                </exclusion>  
                <exclusion>
                  <groupId>com.googlecode.mp4parser</groupId>
                  <artifactId>isoparser</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>com.healthmarketscience.jackcess</groupId>
                  <artifactId>jackcess</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>com.healthmarketscience.jackcess</groupId>
                  <artifactId>jackcess-encrypt</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.apache.sis.core</groupId>
                  <artifactId>sis-utility</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.apache.sis.storage</groupId>
                  <artifactId>sis-netcdf</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.apache.sis.core</groupId>
                  <artifactId>sis-metadata</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.eclipse.rdf4j</groupId>
                  <artifactId>rdf4j-rio-trix</artifactId>
                </exclusion>
                <exclusion>
                  <groupId>org.yaml</groupId>
                  <artifactId>snakeyaml</artifactId>
                </exclusion>        
                <exclusion>
                  <groupId>org.eclipse.rdf4j</groupId>
                  <artifactId>rdf4j-rio-turtle</artifactId>
                </exclusion>         
              </exclusions>
          </dependency>
      

      Some background that may be useful from my notes:

      Whilst adding Any23 to the product, the Any23 Core package was causing Lintian to fail.
      
      Lintian is a Debian package checker written in Perl. It uses Archive::Zip to unpack any .jar file in the Debian package, and Archive::Zip does not handle the Zip64 format, which caused the failure. The original zip format has various restrictions, one of which is the number of files in the archive. If the number of files in the jar for the product exceeds this limit (65535), a Zip64 file is produced instead of a standard zip file.
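
      To confirm whether a given jar was actually written as Zip64, one option is to look for the Zip64 end-of-central-directory locator that sits directly before the classic end-of-central-directory record. The following is a minimal sketch, not part of the product build; the function name `is_zip64` is hypothetical, and it assumes the archive comment does not itself contain the end-of-central-directory signature.

```python
# Signatures defined by the zip file format.
EOCD_SIG = b"PK\x05\x06"            # end-of-central-directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"   # Zip64 end-of-central-directory locator
ZIP64_LOCATOR_SIZE = 20             # the locator has a fixed size of 20 bytes
EOCD_MIN_SIZE = 22                  # EOCD record without an archive comment
MAX_COMMENT = 0xFFFF                # the comment length is a 16-bit field


def is_zip64(data: bytes) -> bool:
    """Return True if the archive bytes carry a Zip64 EOCD locator.

    Hypothetical helper for illustration only.
    """
    # The EOCD record can only live in the last 22 + 65535 bytes.
    tail_start = max(0, len(data) - (EOCD_MIN_SIZE + MAX_COMMENT))
    eocd = data.rfind(EOCD_SIG, tail_start)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # In a Zip64 archive the locator immediately precedes the EOCD record.
    locator = eocd - ZIP64_LOCATOR_SIZE
    return locator >= 0 and data[locator:locator + 4] == ZIP64_LOCATOR_SIG
```

      For a jar on disk this could be called as `is_zip64(open("product.jar", "rb").read())`; reading only the tail of the file would be kinder to memory for very large jars.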
      
      The Any23 Core library does seem excessive in what it pulls in. Running the following on the product jar, the entry count goes from 40490 to 78513 once Any23 is added:
      
      zipinfo -1 product.jar | wc -l
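
      Where `zipinfo` is not available, the same count can be sketched with Python's standard `zipfile` module and compared against the 65535-entry limit mentioned above. The helper names below are hypothetical and only illustrate the check.

```python
import zipfile

# Classic (non-Zip64) archives store the entry count in a 16-bit field.
ZIP_MAX_ENTRIES = 0xFFFF  # 65535


def count_entries(jar):
    """Count the entries in a jar/zip, similar to `zipinfo -1 jar | wc -l`.

    `jar` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as zf:
        return len(zf.namelist())


def exceeds_classic_limit(entry_count):
    """Return True when the count no longer fits the classic zip format."""
    return entry_count > ZIP_MAX_ENTRIES
```

      With the figures above, `exceeds_classic_limit(40490)` is false and `exceeds_classic_limit(78513)` is true, which matches the point at which Lintian started failing.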
      

      This Lintian failure on a Linux build was the original push for the exclusions; however, the size of the product .jar also increased in a similar fashion.
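
      Since the exclusions were picked without much analysis, a per-jar entry count may help show which transitive dependencies contribute most to the total. The sketch below assumes the dependency jars have already been copied into one directory (for example with `mvn dependency:copy-dependencies`, which writes to `target/dependency` by default); the function name `entries_per_jar` is hypothetical.

```python
import zipfile
from pathlib import Path


def entries_per_jar(directory):
    """List (jar name, entry count) for each .jar in `directory`, largest first."""
    counts = []
    for jar in sorted(Path(directory).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            counts.append((jar.name, len(zf.namelist())))
    # Largest contributors first, so exclusion candidates are easy to spot.
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

      The counts are only a rough guide: entries shared between jars (directories, META-INF files) are merged when a single jar is built, so the totals will not add up exactly to the product jar's count.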

      Attachments

        1. output.txt
          36 kB
          Lewis John McGibbney

            People

              Assignee: Unassigned
              Reporter: David Cockbill (davidcockbill)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 10m