Groovy / GROOVY-9589

Parse source codes in parallel


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1, 3.0.5
    • Component/s: None
    • Labels: None

    Description

      The Parrot parser is much slower than the antlr2 parser on large source files, because of the error alternatives added to produce better error messages[1]. To reduce the overall parsing time, we could parse all the source files in parallel.
      Use the compiler option parallelParse or the JVM option groovy.parallel.parse to enable or disable the improvement. In Groovy 3 the improvement is disabled by default; in Groovy 4+ it is enabled by default.
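      A minimal sketch of the two ways to switch the option on, assuming a Groovy 3.0.5+ or 4.x runtime on the classpath. Only the option names (parallelParse, groovy.parallel.parse) come from this issue; the launcher command and the script name MyBuildScript.groovy in the comment are illustrative assumptions.

```groovy
import org.codehaus.groovy.control.CompilerConfiguration

// Option 1 (JVM option): define the system property when launching the JVM, e.g.
//   groovy -Dgroovy.parallel.parse=true MyBuildScript.groovy
// (MyBuildScript.groovy is a hypothetical file name.)

// Option 2 (compiler option): set the parallelParse optimization option.
// Merging into the existing map, rather than replacing it, keeps the other
// default optimization options intact.
def config = new CompilerConfiguration()
config.optimizationOptions = config.optimizationOptions + [parallelParse: true]

// A CompilationUnit created with this configuration is expected to pick the option up.
println "parallelParse enabled: ${config.optimizationOptions.parallelParse}"
```

      The measurement script below instead passes the whole map to the constructor, which is fine for a benchmark but drops any other default optimization options.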

      P.S.

      • Antlr2 can report a missing right parenthesis smartly without any error alternative, but antlr4 cannot.
      • Parsing all the Groovy source code of nextflow[2] sequentially takes 64s on my machine. With parallelParse enabled it takes just 43s, about 33% less time. Here is the script used to measure the time:
      import groovy.io.FileType
      import org.codehaus.groovy.ast.ModuleNode
      import org.codehaus.groovy.control.CompilationUnit
      import org.codehaus.groovy.control.CompilerConfiguration
      import org.codehaus.groovy.control.Phases
      
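      // Compile every .groovy/.gradle file under ./src up to the CONVERSION phase
      // (initialization, parsing, and CST-to-AST conversion) and return the elapsed milliseconds.
      def parse(boolean parallelParse) {
          // Collect the source files, skipping anything that is not Groovy or Gradle
          def sourceFileList = []
          new File('./src').eachFileRecurse(FileType.FILES) { file ->
              if (!file.name.endsWith('.groovy') && !file.name.endsWith('.gradle')) return
              sourceFileList << file
          }

          long elapsedTimeMillis
          // Toggle parallel parsing through the parallelParse optimization option
          new CompilationUnit(new CompilerConfiguration(optimizationOptions: [parallelParse: parallelParse])).tap {
              sourceFileList.each { f -> addSource f }

              // Time only the compilation itself, not the file collection above
              def b = System.currentTimeMillis()
              compile Phases.CONVERSION
              def e = System.currentTimeMillis()

              elapsedTimeMillis = e - b
          }
          return elapsedTimeMillis
      }

      // def t = parse(false) // sequential: costs 64s
      def t = parse(true)     // in parallel: costs 43s
      println "# ${t / 1000}s elapsed"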
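      As a quick check of the reported timings (64s sequential, 43s parallel), the relative reduction and the corresponding speedup are:

```latex
\frac{64\,\mathrm{s} - 43\,\mathrm{s}}{64\,\mathrm{s}} = \frac{21}{64} \approx 0.328 \approx 33\%,
\qquad
\text{speedup} = \frac{64\,\mathrm{s}}{43\,\mathrm{s}} \approx 1.49
```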
      

       

      [1] https://github.com/apache/groovy/blob/GROOVY_3_0_4/src/antlr/GroovyParser.g4#L1259-L1261
      [2] https://github.com/nextflow-io/nextflow

          People

            Assignee: Daniel Sun (daniel_sun)
            Reporter: Daniel Sun (daniel_sun)
            Votes: 0
            Watchers: 1


              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 20m