I think that the semantics in my patch are sufficient. You are correct that in some cases, they might be "closer together" than we might want, but what does that even mean? The semantics are not well specified. What if the optimizer in fact put C before B? What if the optimizer had them run at the same time? What if my cluster happens to be tuned to a certain workload...and so on and so on. I think as long as "now" is defined as "after the script runs," and as long as it is the same for every value in a given relation that uses it, that's the only guarantee that we can make. We can document this limitation (i.e. that "now" is a more or less arbitrary value in between the beginning of your script and when it is finished being parsed).
I suppose there would be some utility in a CurrentTime() where the time is with respect to the beginning of execution, but it could easily suffer from the same issue if it was in a foreach with a really time consuming value, where the "now" value quickly becomes stale. I think the incremental gain is minimal, and the incremental complexity is quite high. If you deeply disagree, though, we can discuss how to do it. I think the following would work: per each instantiation of the UDF, we create two unique files and put them in HDFS (I do not think the distributed cache will work in this specific case, but it may). Those files will be the constructor argument. On first execution, each mapper tries to delete the file. Since delete is atomic, only one should succeed. This is the leader. It will record the current time and serialize it to the second file. We would have to coordinate atomicity...perhaps it could write a magic value at the end of the serialized date time, so all of the mappers would read the file until they read the magic number, and then they'd know it was done.
This would be pretty complicated for what I see as a minimal gain, but it would probably be a "more correct" now() implementation. I do not know if Hadoop has a more convenient coordination mechanism between mappers (this sort of goes against the whole point).
I welcome more thoughts