Calcite / CALCITE-5764

Puffin, an Awk for Java



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.35.0
    • Component/s: None
    • Labels: None


      Create Puffin, which allows a programming model similar to the awk scripting language.

      An awk program is a collection of rules, each of which is a pair: a predicate and an action. For each line in a file, the rules are applied in sequence, and if a rule's predicate evaluates to true, its action is executed. Then awk goes on to the next line, and when the file is exhausted, to the next file.
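      To make the rule model concrete, here is a minimal plain-Java sketch. It is not Puffin's API: the class RuleEngine, its nested Rule record, and the helper countNonCommentLines are hypothetical names chosen for illustration. Lines are supplied as in-memory lists rather than read from files, and awk's "next" statement is not modeled (the example predicate simply excludes comment lines instead).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch (not Calcite code) of the awk model: an ordered list of rules. */
final class RuleEngine {
  /** A rule pairs a predicate with an action. */
  record Rule(Predicate<String> predicate, Consumer<String> action) {}

  private final List<Rule> rules = new ArrayList<>();

  /** Appends a rule; rules are tried in the order they were added. */
  RuleEngine add(Predicate<String> predicate, Consumer<String> action) {
    rules.add(new Rule(predicate, action));
    return this;
  }

  /** For each line of each file, runs the action of every rule whose predicate accepts the line. */
  void run(List<List<String>> files) {
    for (List<String> file : files) {
      for (String line : file) {
        for (Rule rule : rules) {
          if (rule.predicate().test(line)) {
            rule.action().accept(line);
          }
        }
      }
    }
  }

  /** Counts non-comment lines across all files, like the awk script. */
  static int countNonCommentLines(List<List<String>> files) {
    int[] n = {0}; // one-element array so the lambda can update it
    new RuleEngine()
        .add(line -> !line.startsWith("#"), line -> n[0]++)
        .run(files);
    return n[0];
  }
}
```

      For example, countNonCommentLines over the files ["# header", "a", "b"] and ["c", "#x"] returns 3. Like awk's n, the counter here spans all files.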

      Here is a simple awk script that counts the number of non-comment lines in a file:

      /^#/ {
        next
      }
      {
        n++
      }
      END {
        printf("counter: %d\n", n);
      }

      Here is the equivalent Puffin program:

          Puffin.Program<Unit> program =
              Puffin.builder(() -> Unit.INSTANCE, u -> new AtomicInteger())
                  .add(line -> !line.startsWith("#"),
                      line -> line.state().incrementAndGet())
                  .after(context ->
                      context.println("counter: " + context.state().get()))
                  .build();

      In Puffin, each predicate is a Predicate<Line>, and each action is a Consumer<Line>. Line is a data structure that gives access to the text of the line, regular expression matching, and file-local and global state.
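      The sketch below illustrates what such a line object might look like: text, regular expression helpers, and references to global and file-local state, consumed through a Predicate and a Consumer. The record SketchLine, its methods matches and group1, and the helper SketchLineDemo.countMatching are hypothetical names for illustration only; Puffin's actual Line class may expose different methods.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not Calcite's Line): a line's text plus global (G) and file-local (F) state. */
record SketchLine<G, F>(String text, G globalState, F state) {
  boolean startsWith(String prefix) {
    return text.startsWith(prefix);
  }

  /** True if the whole line matches the regular expression. */
  boolean matches(String regex) {
    return Pattern.matches(regex, text);
  }

  /** Capture group 1 of the first match found anywhere in the line, or null if there is none. */
  String group1(String regex) {
    Matcher m = Pattern.compile(regex).matcher(text);
    return m.find() && m.groupCount() >= 1 ? m.group(1) : null;
  }
}

final class SketchLineDemo {
  /** Counts lines that fully match a regex, storing the count in file-local state. */
  static int countMatching(List<String> lines, String regex) {
    AtomicInteger fileState = new AtomicInteger();
    Predicate<SketchLine<Void, AtomicInteger>> predicate = line -> line.matches(regex);
    Consumer<SketchLine<Void, AtomicInteger>> action = line -> line.state().incrementAndGet();
    for (String text : lines) {
      // No global state is needed here, so it is null (type Void).
      SketchLine<Void, AtomicInteger> line = new SketchLine<>(text, null, fileState);
      if (predicate.test(line)) {
        action.accept(line);
      }
    }
    return fileState.get();
  }
}
```

      For instance, new SketchLine<>("version=1.35.0", "g", new int[1]).group1("version=(\\S+)") returns "1.35.0". Bundling state into the line object means a rule needs only one argument, which is what lets predicates and actions be plain Predicate and Consumer instances.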

      Puffin allows thread-safe parallel processing of multiple files (or, more generally, sources, including URLs). File-local state is allocated by a factory, and each file is processed in a single thread. Therefore, rules that use file-local state do not need to coordinate with rules processing other files.
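      The following sketch shows the per-file threading model in plain Java, under the assumption that each file is an in-memory list of lines. It is not Puffin's implementation: PerFileRunner, run, and countNonCommentLinesPerFile are hypothetical names, and the fixed pool of up to four threads is an arbitrary choice. Each file gets a fresh state object from the factory, and only the task handling that file touches it, so no locking is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

/** Hypothetical sketch (not Calcite code): one task per file, each with its own file-local state. */
final class PerFileRunner {
  /** Processes each file on a single pool thread; returns each file's final state in input order. */
  static <F> List<F> run(List<List<String>> files, Supplier<F> fileStateFactory,
      BiConsumer<F, String> perLine) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, Math.min(files.size(), 4)));
    try {
      List<Future<F>> futures = new ArrayList<>();
      for (List<String> file : files) {
        futures.add(pool.submit(() -> {
          // Fresh state per file; only this task ever touches it.
          F state = fileStateFactory.get();
          for (String line : file) {
            perLine.accept(state, line);
          }
          return state;
        }));
      }
      List<F> results = new ArrayList<>();
      for (Future<F> future : futures) {
        results.add(future.get()); // get() also makes the task's writes visible here
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  /** Counts non-comment lines separately for each file. */
  static List<Integer> countNonCommentLinesPerFile(List<List<String>> files) {
    List<int[]> states = run(files, () -> new int[1], (state, line) -> {
      if (!line.startsWith("#")) {
        state[0]++;
      }
    });
    List<Integer> counts = new ArrayList<>();
    for (int[] state : states) {
      counts.add(state[0]);
    }
    return counts;
  }
}
```

      Because the per-file counter is a plain int[] that never escapes its task, the code needs no atomic operations even though files run concurrently.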

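      Global state is also allocated by a factory, but it is shared, and rules must coordinate when they access it. In the above example, () -> Unit.INSTANCE is the factory that creates global state (a placeholder, since this program needs none), and u -> new AtomicInteger(), which receives the global state as its argument, is the factory that creates the file-local counter.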
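      By contrast, state shared by all files must be updated safely. This plain-Java sketch (hypothetical class SharedGlobalState and method countAcrossFiles, not part of Puffin) processes files with a parallel stream and keeps one global AtomicInteger; each increment is atomic, so concurrent files cannot lose updates.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch (not Calcite code): global state shared by files processed in parallel. */
final class SharedGlobalState {
  /** Counts non-comment lines over all files, processing the files in parallel. */
  static int countAcrossFiles(List<List<String>> files) {
    // Global state: a single object shared by every file, so updates must be thread-safe.
    // Incrementing a plain int field with ++ here could lose updates under contention.
    AtomicInteger global = new AtomicInteger();
    files.parallelStream().forEach(file -> {
      // Each file is handled by one thread within this lambda invocation.
      for (String line : file) {
        if (!line.startsWith("#")) {
          global.incrementAndGet(); // atomic read-modify-write
        }
      }
    });
    return global.get(); // forEach has completed, so all increments are visible
  }
}
```

      The trade-off is contention: every file hits the same counter. Accumulating into file-local state and merging once per file reduces that cost.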


              Assignee: Julian Hyde
              Reporter: Julian Hyde