Details

    • Type: Suitable Name Search
    • Status: Resolved
    • Resolution: Fixed
    • Podling:
      DRAT
    • Proposed Name:
      DRAT
    • Evidence Of Open Source Adoption:
      drat.dyndns.org was established in 2014. Per the project history detailed here:
      https://wiki.apache.org/incubator/DRATProposal
      DRAT originated as an open source project through DARPA, NASA and NSF funding. It was born open source and remains so.

      A GitHub search for DRAT returns over 190 results, with chrismattmann/drat on the first page (2nd hit). The results are broad, including the R CRAN template tool and many of its forks, a reading text analysis tool, rOpenSci tools, and other R-related software that has nothing to do with code analysis.
      https://github.com/search?utf8=%E2%9C%93&q=drat&type=

      Because DRAT is a common word, a Google search returns everything from Urban Dictionary to the R package for DRAT templates. The bottom link on the very first page points to our project DRAT and a NASA Tech Briefs article written by Chris Mattmann:
      https://www.google.com/search?q=drat&oq=drat+&aqs=chrome..69i57j69i60l3j69i61j69i65.1095j0j4&sourceid=chrome&ie=UTF-8
    • Evidence Of Registration:
      5 dead marks
      1 related to drinking water
      1 related to cleaning products
      1 related to card games
      1 related to musical services advertising
      1 related to rodenticides (killing RATs).
    • Evidence Of Use On World Wide Web:
      We have published DRAT results using the NSF supercomputer Wrangler for:

      1. All of the Apache SVN repositories - http://drat.dyndns.org:8080/dratviz/
      2. 300+ repositories in the NSF EarthCube / geosciences community - http://drat.dyndns.org:8080/dratontoviz/

      DRAT is currently a GitHub project: http://github.com/chrismattmann/drat
    • Docs Text:
      As part of the Apache Software Foundation (ASF) project Apache Creadur™, a Release Audit Tool (RAT) was developed in response to demand from the ASF and its hundreds of projects for a release auditing capability that could be integrated into projects. The primary function of RAT is automated code auditing and open source license analysis focusing on headers. RAT is a natural language processing tool written in Java so that it runs easily on any platform and audits code in many source languages (e.g., C, C++, Java, Python). RAT can also be used to add license headers to code that is not licensed.

      In the summer of 2013, our team ran Apache RAT on source code produced by the Defense Advanced Research Projects Agency (DARPA) XDATA national initiative, whose inception coincided with the 2012 U.S. Presidential Initiative in Big Data. XDATA brought together 24 performers across academia, private industry and the government to construct analytics, visualizations, and open source software mash-ups that were transitioned into government projects and to the defense sector. XDATA produced a large Git repository consisting of ~50,000 files and tens of millions of lines of code. DARPA XDATA was launched to build a useful infrastructure for many government agencies and is ultimately an effort to avoid the traditional government-contractor software pipeline, in which additional contracts are required to reuse and unlock software previously funded by the government in other programs. All XDATA software is open source and is ingested into DARPA’s Open Catalog, which points to the outputs of the program, including its source code and metrics on the repository. Because of this, one of the core products of XDATA is the internal Git repository.

      Since XDATA brought together open source software from multiple performers, understanding the licenses that the source code used, and their compatibilities and differences, was extremely important. Because the repository was so large, our strategy was to develop an automated process using Apache RAT. We ran RAT on a 24-core, 48 GB RAM Linux machine at the National Aeronautics and Space Administration (NASA)’s Jet Propulsion Laboratory (JPL) to produce a license evaluation of the XDATA Git repository and to provide recommendations on how the open source software products can be combined to adhere to the XDATA open source policy encouraging permissive licenses. Against our expectations, however, RAT failed to audit XDATA’s large Git repository successfully or quickly.

      Moreover, RAT provided no incremental output, producing only a final report when a task completed. RAT’s crawler did not automatically distinguish binary files from other file types. RAT seemed to perform better when similar files were collected together (e.g., all JavaScript, all C++, all Java) and RAT jobs were then run per file type on smaller increments of files (e.g., 100 Java files at a time). The lessons learned navigating these issues motivated us to create "DRAT", which stands for "Distributed Release Audit Tool". DRAT directly overcomes RAT's limitations and brings code auditing and open source license analysis into the realm of Big Data using scalable open source Apache technologies. DRAT is already being applied and transitioned into government agencies. DRAT currently exists on GitHub under the ALv2 in Chris Mattmann's GitHub account. Chris Mattmann was the PI of DARPA XDATA at JPL.

      Description

      Apache DRAT is a distributed, parallelized (Map Reduce) wrapper around Apache RAT™ (Release Audit Tool). RAT is used to check for proper licensing in software projects. However, RAT takes a prohibitively long time to analyze large repositories of code, since it can only run in a single JVM. Furthermore, RAT isn't customizable by file type or file size and provides no incremental output. This wrapper dramatically speeds up the process by leveraging Apache OODT™ to parallelize the following components and run them as a workflow:

      • Apache Solr™ based exploration of a CM repository (e.g., Git, SVN, etc.) and classification of that repository based on MIME type using Apache Tika™.
      • A MIME partitioner that uses Apache Tika™ to automatically deduce and classify by file type and then partition Apache RAT™ jobs based on sets of 100 files per type (configurable) -- the M/R "partitioner"
      • A throttle wrapper that runs Apache RAT™ on each MIME-targeted set of files -- the M/R "mapper"
      • A reducer to "combine" the produced RAT logs together into a global RAT report that can be used for stats generation -- the M/R "reducer"
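
      The partitioner / mapper / reducer split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DRAT's actual code: Python's standard `mimetypes` module stands in for Apache Tika™, the function names (`partition_by_mime`, `audit_chunk`, `combine_reports`) are hypothetical, and the mapper is a stub that counts files instead of invoking Apache RAT™ or Apache OODT™.

```python
import mimetypes
from collections import Counter, defaultdict

CHUNK_SIZE = 100  # files per job; the description says this value is configurable


def partition_by_mime(paths, chunk_size=CHUNK_SIZE):
    """Partitioner: group paths by guessed MIME type, then split into chunks."""
    groups = defaultdict(list)
    for path in paths:
        # mimetypes guesses from the file extension only (stand-in for Tika).
        mime, _ = mimetypes.guess_type(path)
        groups[mime or "application/octet-stream"].append(path)

    jobs = []
    for mime, files in sorted(groups.items()):
        for start in range(0, len(files), chunk_size):
            jobs.append((mime, files[start:start + chunk_size]))
    return jobs


def audit_chunk(mime, files):
    """Hypothetical mapper: stands in for one RAT run over one chunk."""
    # A real mapper would run the audit here; this stub only counts files.
    return Counter({mime: len(files)})


def combine_reports(reports):
    """Reducer: merge the per-chunk results into one global summary."""
    total = Counter()
    for report in reports:
        total.update(report)
    return total


if __name__ == "__main__":
    paths = [f"src/mod{i}.py" for i in range(250)] + ["a.html", "b.html"]
    jobs = partition_by_mime(paths)
    summary = combine_reports(audit_chunk(mime, files) for mime, files in jobs)
    print(len(jobs), "jobs;", sum(summary.values()), "files audited")
```

      Because each chunk is independent, the jobs could be handed to separate workers, and a partial summary could be reported as each one finishes, which is the incremental output the description says a single RAT run lacks.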

      Attachments

      1. DRAT searc.png (154 kB), Chris A. Mattmann

      People

    • Assignee: Unassigned
    • Reporter: Chris A. Mattmann (chrismattmann)
    • Votes: 0
    • Watchers: 2
