Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-4199

Add nullability annotations to the methods and fields, ensure consistency with checkerframework

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.25.0
    • Fix Version/s: None
    • Component/s: None

      Description

      The current codebase uses jsr305 javax.annotation.Nullable / NonNull which helps to catch bugs while developing Calcite and libraries.

      Unfortunately, the current annotation is not enforced, and it lacks support for generics.
      jsr305.jar is probably abandoned (see https://github.com/google/guava/issues/2960), so we should probably migrate to something else.

      https://checkerframework.org/ is a solid framework for machine verification which is tailored to Java.

      The releases are quite frequent: https://github.com/typetools/checker-framework/releases
      Their annotations are recognized by major IDEs.

      So I see the following options:
      a) Leave the code as is
      b) Annotate (gradually?) the code with checkerframework annotations
      c) Migrate (gradually?) the code to Kotlin

      I've created a PR to verify which changes would be needed, verify if CI is able to type check in a reasonable time, and so on: https://github.com/apache/calcite/pull/1929

      My findings regarding checkerframework so far are:
      0) It does detect NPEs (which were hidden in the current code), and it does detect initialized files in the constructors
      1) Checkerframework takes ~2 minutes for :linq4j, and 9min for :core in GitHub Actions CI
      2) "non-nullable by default" is quite close to the current Calcite conventions.
      3) There are cases when javadoc comments says "or null", however, the code reads much easier if @Nullable nullable appears in the signature
      4) If a third-party library supplies invalid type annotations, there's a way to fix that. For instance, Guava's Function is annotated as "always nullable", and we can overrule that (so the nullability is taken from generic signature rather than "always nullable"). The override files are placed to src/main/config/checkerframework
      5) Generic-heavy code might be challenging (they are always like that), however, in the most obscure cases there's always a way to suppress the warning
      6) I've filed a Gradle improvement so it schedules recently modified files first for the compilation: https://github.com/gradle/gradle/issues/14332

      Option a: leave the code as is

      Typically, Calcite code does have decent documentation if null is permissible, however, I often need to check the existing calls in order to figure out if null is expected.

      Note: nullability does not mean just "null vs non-null" value, but it is related to other common pitfalls like "failing to initialize the value in constructor", "calling non-final method in the constructor"

      Option b: annotate (gradually?) the code with checkerframework annotations

      Pros:

      • Checkerframework verifies one method at a time, so the resulting code is easy to reason about. For instance, it requires that boolean equals(@Nullable Object other) overloads must explicitly declare @Nulable annotation. That might sound too verbose at first, however, it helps to read the code: you don't need to analyze class/interface declarations in order to figure out if nulls are expected or not.
      • Checkerframework supports nullability with generics. For instance, class Wrapper<T> means T can be used as either non-nullable or nullable. class Wrapper<T> means T is always nullable (all implementations can assume T is always nullable), and class Wrapper<T extends @NonNull Object> (nonnull is redundant in this case and can be omitted) means T can't be null.
      • Checkerframework supports nullability with arrays. @Nullable Object[] is non-nullable array with nullable elements, Object @Nullable [] is nullable array with non-nullable elements.
      • The verifier sees untested code
      • checkerframework has a coherent set of verifiers, and it is pluggable
      • checkerframework compiler understand lots of different nullability annotations, so we could use jsr305 nullability annotations instead of the ones from checkerframework
      • Nullability annotations might make it easier to contribute: one has to reason about nulls anyway, and the annotations make it simpler.

      Cons:

      • Nullability annotations add an extra burden on developers. Checkerframework documentation is good, however, it might be challenging to figure out the proper annotations especially when generics and/or inheritance is involved
      • Verification takes significant time, which is especially sad for local development. For instance, core verification takes ~10 minutes. Note: extra time is spent only in case checkerframework verification is activated, and the verification itself is NOT required for regular Calcite operation (e.g. development, testing, etc). The nullability verification is similar to code style checks like Checkstyle: it can be executed as a completely separate task.
      • IDEs do not have full support of checkerframework annotations. For instance, they can't infer nullability from generic signatures, so the code might look OK in IDE, and it would fail verification. At the same time, IDE might show false positives (warning in IDE while the verification passes)
      • Other languages might miss nullability inference from checkerframework annotations. For instance, Kotlin compiler can infer nullability from jsr305 annotations, however, it does not understand checkerframework yet
      • Checkerframework verifies one method at a time, so does not know if certain methods are used in a specific order only. One of the cases is Enumerator interface with current, moveNext, and close methods. Checker assumes the clients might call the methods in any order, so it enforces that null-checks must be placed in all the methods. It might sound like "adding verbosity to make checker happy"
      • Checkerframework is not suitable for test code, as it would take significant effort to write nullability annotations, and it would make test code much harder to write. For instance, checkerframework knows nothing on JUnit's @Before... annotations, so it assumes the fields are not initialized in the test methods
      • Nullability annotations might sound like adding a new language. For instance, the difference between <T> public T get(), <@Nullable T> public T get(), <T> public @Nullable T get(), <T extends @Nullable Object> public T get(), <@PolyNull T> public @PolyNull T get() might be not that easy to understand.
      • Java's type system is unsound, so it is hard to write code without suppression. For instance: Map.get(...) sometimes permits nulls (HashMap), and sometimes it does not (ImmutableMap). That is one of the reasons Map.get(...) and Map.remove(..) are annotated to require non-nullable argument.
      • Nullability annotations might make it harder to contribute: checkerframework might be unfamiliar to the contributors
      • Nullability annotations do not replace runtime checks, so we still need to use requireNonNull(...) to verify inputs in the runtime
      • There's a risk of human error while migrating the code: a wrong assertion might be added, so the code might start failing

      Option b: migrate (gradually?) the code to Kotlin

      Pros:

      • Better IDE support than the one of checkerframework
      • Standard library contains lots of collection manipulation code, so typical-for-Calcite Itarables.transform. Lists.transform, and so on would be easier to read and write. Streams in Java are still too verbose, especially for one-time mapping.
        For instance,
            return columns.stream()
                .map(columnOrdinal -> table.getRowType().getFieldNames()
                    .get(columnOrdinal))
                .collect(Collectors.toList());

        becomes

            return columns.map { columnOrdinal ->
                table.getRowType().getFieldNames().get(columnOrdinal)
            }
        

        or even

            return columns.map { table.getRowType().getFieldNames().get(it) }
        
      • Faster compilation times. Even though Kotlin is slower to compile than regular Java, Kotlin is way faster than checkerframework compiler. So edit-compile cycle would likely be much faster
      • Some type annotations would work for Kotlin as well. For instance, @CheckReturnValue would work in IDEs even for Kotlin code
      • It might attract contributors. StackOverflow 2020 survey suggests Kotlin is the 4th loved (and 6th wanted) programming language
      • Kotlin integrates with Java nicely: Java classes can extend Kotlin classes and vice versa
      • Kotlin is safe and pleasant to use for test code as well. For instance, lateinit var enables to have non-nullable reference and initialize it in @Before...
      • Migration to Kotlin can be fully backward compatible with Java (see OkHttp migration case)
      • Kotlin advanced deprecation features simplify migrations for the clients. For instance, @Deprecated annotation in Kotlin can suggest the replacements, so IDEs would suggest replacing the deprecations. With Java we have to guess the alternatives
      • Better "public API" support: Kotlin code fails to compile if a public class inherits from a private (or package-private) one.
      • Non-nullable arguments would automatically be verified, so hand-crafted Objects.requireNonNull(...) could be removed

      Cons:

      • Kotlin might make it harder to contribute: Kotlin might be unfamiliar to the contributors
      • There's a risk of human error while migrating the code: a wrong assertion might be added, so the code might start failing
      • Kotlin standard library becomes an extra 1MiB dependency

        Attachments

          Issue Links

          1.
          Unlock RelCrossType#getFieldCount() Sub-task Closed Unassigned

          100%

          Original Estimate - Not Specified Original Estimate - Not Specified
          Time Spent - 20m
          2.
          Make RelDataType.getSqlTypeName non-nullable Sub-task Closed Vladimir Sitnikov

          100%

          Original Estimate - Not Specified Original Estimate - Not Specified
          Time Spent - 1h 40m
          3.
          Ensure org.apache.calcite.schema.Statistic uses null vs emptyList appropriately Sub-task Open Unassigned

          100%

          Original Estimate - Not Specified Original Estimate - Not Specified
          Time Spent - 0.5h
          4.
          Make org.apache.calcite.rel.type.RelDataType#getFamily non-nullable Sub-task Closed Vladimir Sitnikov

          100%

          Original Estimate - Not Specified Original Estimate - Not Specified
          Time Spent - 20m
          5.
          Clarify RelMetadataQuery#getDistribution nullability Sub-task Open Unassigned  
          6.
          Clarify SqlCall#getOperandList() nullability Sub-task Open Unassigned  
          7.
          Clarify if null elements are allowed in RelTraitSet Sub-task Open Unassigned  
          8.
          Add Mappings#asListNonNull as a null-safe alternative for Mappings#asList Sub-task Closed Unassigned

          100%

          Original Estimate - Not Specified Original Estimate - Not Specified
          Time Spent - 20m
          9.
          ImmutableIntList#toArray(Integer[]) should support arguments larger than the collection itself Sub-task Closed Unassigned  
          10.
          FlatLists.Flat6List#append should not throw NPE if there are null elements in the list Sub-task Closed Unassigned  
          11.
          Add Util.throwAsRuntime and Util.causeOrSelf to simplify exception re-throwing Sub-task Closed Vladimir Sitnikov

          100%

          Original Estimate - Not Specified Original Estimate - Not Specified
          Time Spent - 10m
          12.
          org.apache.calcite.rex.RexProgram.Marshaller might need overriding visitSubQuery / visitTableInputRef / visitPatternFieldRef Sub-task Open Unassigned  
          13.
          RelBuilder.AggCallImpl#aggregateCall() throws NullPointerException Sub-task Open Julian Hyde  
          14.
          SqlValidatorImpl#validateGroupItem and validateGroupingSets are not using AggregatingSelectScope parameter Sub-task Open Unassigned  
          15.
          Clarify if nulls are permissible to return from .SqlOperator#inferReturnType Sub-task Open Unassigned  
          16.
          RelOptUtil#findAllTables should probably use RelMetadataQuery#getTableReferences Sub-task Open Unassigned  
          17.
          Clarify RelNode#estimateRowCount behavior with regards to nulls Sub-task Open Unassigned  
          18.
          org.apache.calcite.model.ModelHandler#checkRequiredAttributes should use Objects.requireNonNull Sub-task In Progress Unassigned  
          19.
          Replace jsr305 dependency with checkerframework and errorprone_annotations Sub-task In Progress Unassigned  
          20.
          Remove NonNull annotations Sub-task In Progress Unassigned  
          21.
          Refactor org.apache.calcite.rel.RelInput to provide null-safe getters Sub-task Open Unassigned  
          22.
          Make org.apache.calcite.jdbc.JavaTypeFactoryImpl#getJavaClass non-nullable Sub-task Open Unassigned

          100%

          Original Estimate - Not Specified Original Estimate - Not Specified
          Time Spent - 20m
          23.
          ViewTableMacro#schemaPath is always initialized to non-null value, however, it is treated like nullable Sub-task Open Unassigned  
          24.
          JointNode in Interpreter might fail with NPE for FULL join Sub-task Open Unassigned  
          25.
          Clarify org.apache.calcite.sql.type.SqlTypeName#getFamily nullability Sub-task Open Unassigned  
          26.
          Add non-nullable experimental accessors side by side with their nullable counterparts Sub-task Open Unassigned  
          27.
          Avoid NPE when nulls are used in divide call Sub-task Open Unassigned

          100%

          Original Estimate - Not Specified Original Estimate - Not Specified
          Time Spent - 0.5h
          28.
          NPE in TopDownRuleDriver#convert when convention is not enabled Sub-task Open Unassigned  
          29.
          JdbcRules JOIN_FACTORY, AGGREGATE_FACTORY, SET_OP_FACTORY, ... create relations with wrong convention Sub-task Closed Unassigned

          100%

          Original Estimate - Not Specified Original Estimate - Not Specified
          Time Spent - 40m
          30.
          NPE possible win EnumerableWindow when agg.call.name is null Sub-task Open Unassigned  
          31.
          RelBuilder#scan(...) should work when used in RelRule#onMatch as call.builder() Sub-task Open Unassigned  
          32.
          TableMacro and TableFunction should have List<? extends Objects> rather than List<Object> parameters Sub-task Open Unassigned  
          33.
          Correct specification getCumulativeCost and getNonCumulativeCost so it should always produce non-nullable value Sub-task Open Unassigned  

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                vladimirsitnikov Vladimir Sitnikov
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 8h 40m
                  8h 40m