Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Bug
Description
Currently, when SerializedPartition.java copies the written data from bytesOutputStream into serializedData, it misuses java.io.ByteArrayOutputStream. Nemo has a DirectByteArrayOutputStream class that extends ByteArrayOutputStream and was implemented to avoid the copy overhead when serializing from a stream to a byte array (https://github.com/snuspl/nemo/pull/638). The problem is that this approach bypasses the usage that the ByteArrayOutputStream API prescribes for getting correct results.
Specifically, to fetch the current content of a ByteArrayOutputStream, the recommended call is toByteArray(), because the stream's internal byte array may contain invalid (unused) bytes beyond the data written so far. Nemo’s DirectByteArrayOutputStream, however, uses getBufDirectly() to fetch the internal byte array directly, so the result may include those invalid bytes. Consequently, the value returned by getCount() in DirectByteArrayOutputStream does not match the length of the fetched byte array. In summary, getCount() returns the number of valid bytes in the ByteArrayOutputStream, but Nemo currently uses data that also includes invalid bytes.
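The mismatch described above can be reproduced with a small sketch. The DirectStream class below is only a stand-in written for this illustration, assuming that getBufDirectly() and getCount() simply expose the protected buf and count fields of ByteArrayOutputStream; it is not copied from Nemo's source.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class BufferMismatchDemo {
  /** Hypothetical stand-in for DirectByteArrayOutputStream (assumed shape, not Nemo's code). */
  static final class DirectStream extends ByteArrayOutputStream {
    // Returns the internal buffer itself, without copying; may be longer than the data.
    byte[] getBufDirectly() {
      return buf;
    }

    // Returns the number of valid bytes written so far.
    int getCount() {
      return count;
    }
  }

  public static void main(String[] args) {
    DirectStream out = new DirectStream();
    byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
    out.write(payload, 0, payload.length);

    // The raw buffer keeps its full capacity, so its length exceeds the valid byte count.
    System.out.println("getCount():              " + out.getCount());
    System.out.println("getBufDirectly().length: " + out.getBufDirectly().length);
    // toByteArray() copies only the valid bytes, so its length equals getCount().
    System.out.println("toByteArray().length:    " + out.toByteArray().length);
  }
}
```

Any consumer that relies on the length of the array returned by getBufDirectly() instead of getCount() would therefore read trailing unused bytes.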
So the solution may be:
- Removing the misleading methods from DirectByteArrayOutputStream.java
- Replacing the usage of those methods in SerializedPartition.java with ByteArrayOutputStream’s API.
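As a rough sketch of what the second item could look like, the helpers below use only standard ByteArrayOutputStream methods. The class and method names (StandardApiSketch, serializedData, copyTo) are hypothetical and chosen for illustration; they do not reflect the actual structure of SerializedPartition.java.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class StandardApiSketch {
  // Returns exactly the bytes written so far, at the cost of one array copy.
  static byte[] serializedData(ByteArrayOutputStream bytesOutputStream) {
    return bytesOutputStream.toByteArray();
  }

  // Writes only the valid bytes to another stream without building an intermediate array.
  static void copyTo(ByteArrayOutputStream bytesOutputStream, OutputStream sink)
      throws IOException {
    bytesOutputStream.writeTo(sink);
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(new byte[] {1, 2, 3}, 0, 3);

    byte[] data = serializedData(out);
    System.out.println("serialized length: " + data.length);

    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    copyTo(out, sink);
    System.out.println("bytes forwarded:   " + sink.size());
  }
}
```

toByteArray() reintroduces the copy that DirectByteArrayOutputStream was meant to avoid, so whether this trade-off is acceptable depends on how hot the serialization path is.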
Issue Links
- blocks NEMO-351 Empowering Nemo with fast I/O using Apache Crail (Resolved)