Uploaded image for project: 'Apache Nemo'
  1. Apache Nemo
  2. NEMO-366

DirectByteArrayOutputStream bug in SerializedPartition

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not A Bug
    • None
    • None

    Description

      Currently, when fetching written data from bytesOutputStream to serializedData in SerializedPartition.java, there is a misuse of java.io.ByteArrayOutputStream. In Nemo, there is a DirectByteArrayOutputStream class which extends ByteArrayOutputStream, implemented to avoid copy overhead on serialization from Stream to byte array(https://github.com/snuspl/nemo/pull/638). The problem is that by doing this, we are violating the guidelines provided by ByteArrayOutputStream API to get correct results.

       

      Specifically, when fetching the current content of ByteArrayOutputStream, it is recommended to use toByteArray() because there may be invalid(blank) bytes in the byte array of the ByteArrayOutputStream. However in Nemo’s DirectByteArrayOutputStream, it uses getBufDirectly() to fetch the byte array directly, which may include invalid bytes. Consequentially, because it includes invalid bytes, when it tries to get the length of the data by using getCount() in DirectByteArrayOutputStream, the returned value does not match with the length of the fetched byte array. In summary, DirectByteArrayOutputStream’s getCount() returns the number of valid bytes in ByteArrayOutputStream, but current Nemo is using the data which also includes invalid bytes.

       

      So the solution may be:

      • Removing the misleading methods of DirectByteArrayOutputStream.java
      • replacing the usage of misleading methods in SerializedPartition.java to ByteArrayOutputStream’s API.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              haeyoon Haeyoon Cho
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m