Apart from the provenance events that the framework emits automatically, each processor is responsible for registering its own provenance events and for supplying the duration of the operation that produced them. Supplying the duration is optional, however; if it is omitted, a value of -1 is stored to indicate that the duration is unknown.
Unfortunately, many processors do not track the operation time and therefore do not store it in the provenance event. However, the time from when a flow file is fetched or created in a session to when the provenance event is recorded is a legitimate measure of how long the flow file was "active" in the session, which is effectively the duration of the provenance event.
This Jira proposes to record the "start" time of each flow file (more specifically, of its StandardRepositoryRecord) and, upon session.commit(), to write the time the flow file was active in the session into its provenance event(s) as the duration, but only for events whose duration has not already been supplied.
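
The proposed behaviour could be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual NiFi implementation: RepoRecord, Event, and backfillDurations are hypothetical stand-ins for StandardRepositoryRecord, the provenance event, and the logic that would run during session.commit(). The only detail taken from the description above is the -1 sentinel for an unknown duration.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionDurationSketch {
    /** Sentinel stored when a processor did not supply a duration. */
    static final long UNKNOWN_DURATION = -1L;

    /** Hypothetical stand-in for a provenance event. */
    static final class Event {
        final String type;
        long durationMillis;

        Event(String type, long durationMillis) {
            this.type = type;
            this.durationMillis = durationMillis;
        }
    }

    /** Hypothetical stand-in for StandardRepositoryRecord. */
    static final class RepoRecord {
        // Captured when the flow file is fetched or created in the session.
        final long startNanos;
        final List<Event> events = new ArrayList<>();

        RepoRecord(long startNanos) {
            this.startNanos = startNanos;
        }
    }

    /**
     * Assumed to be called at commit time: fills in the time the flow file
     * was active in the session, but only for events whose duration is
     * still unknown. Durations supplied by the processor are left alone.
     */
    static void backfillDurations(RepoRecord record, long commitNanos) {
        long activeMillis = Math.max(0L, (commitNanos - record.startNanos) / 1_000_000L);
        for (Event event : record.events) {
            if (event.durationMillis == UNKNOWN_DURATION) {
                event.durationMillis = activeMillis;
            }
        }
    }

    public static void main(String[] args) {
        RepoRecord record = new RepoRecord(System.nanoTime());
        record.events.add(new Event("CONTENT_MODIFIED", UNKNOWN_DURATION)); // no duration supplied
        record.events.add(new Event("SEND", 42L));                          // duration supplied
        backfillDurations(record, System.nanoTime());
        for (Event event : record.events) {
            System.out.println(event.type + " durationMillis=" + event.durationMillis);
        }
    }
}
```

In this sketch the start time is read from a monotonic clock (System.nanoTime()) so that wall-clock adjustments cannot produce a negative duration; whether the real change would use nanoseconds or milliseconds is an assumption here.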