To clarify: what would happen if Beeline uses the first 1000 rows to calculate the width, but row 1001 is then wider than that?
If row 1001 has a column value wider than the precomputed column width, that particular row is printed with a wider column to accommodate it. This means some rows have the "|" separator out of alignment with the previous rows. However, even if we recompute every 1000 rows, we could still see misalignment at every 1000-row boundary.
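To illustrate the misalignment, here is a minimal standalone sketch. It is not Beeline's code: the class WidthSketch and the helpers computeWidths and formatRow are hypothetical names, and the sample size is 2 instead of 1000 to keep the example short. It assumes, as described above, that a value wider than the precomputed width is printed in full rather than truncated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Beeline.
class WidthSketch {
    // Column widths are taken from the first sampleSize rows only.
    static int[] computeWidths(List<String[]> rows, int sampleSize) {
        int[] widths = new int[rows.get(0).length];
        int limit = Math.min(sampleSize, rows.size());
        for (int r = 0; r < limit; r++) {
            String[] row = rows.get(r);
            for (int c = 0; c < row.length; c++) {
                widths[c] = Math.max(widths[c], row[c].length());
            }
        }
        return widths;
    }

    // Pads each cell up to its column width. A cell wider than the width
    // is printed in full, which pushes the "|" separator to the right.
    static String formatRow(String[] row, int[] widths) {
        StringBuilder sb = new StringBuilder("|");
        for (int c = 0; c < row.length; c++) {
            sb.append(' ').append(row[c]);
            for (int pad = row[c].length(); pad < widths[c]; pad++) {
                sb.append(' ');
            }
            sb.append(" |");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "abc"});
        rows.add(new String[] {"2", "de"});
        rows.add(new String[] {"3", "a much longer value"}); // outside the sample
        int[] widths = computeWidths(rows, 2);
        for (String[] row : rows) {
            System.out.println(formatRow(row, widths));
        }
        // Prints:
        // | 1 | abc |
        // | 2 | de  |
        // | 3 | a much longer value |
    }
}
```

The first two rows line up because they were part of the sample; the third row's closing separator lands well to the right of the others.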
I looked at where the Row width gets used. It is used only with --outputformat=table (i.e. the TableOutputFormat class).
If someone is working with very large outputs, the output is likely to be processed by other applications rather than read by human eyes, so a *sv (e.g. csv) format is likely to be used. It doesn't make sense to waste CPU cycles computing the width in those cases. This is also the case where the performance impact of this computation would be most visible.
That is, if we can selectively enable buffering and width calculation only for TableOutputFormat, I don't think it matters whether we stick to the column width based on the first 1000 rows or recompute every 1000 rows.
It looks like the Row subclasses have access to the Beeline options and would be able to determine what the output format is.
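A rough sketch of what that check could look like follows. Everything here is an assumption for illustration: Options is a hypothetical stand-in for whatever options object the Row subclasses can reach, and needsColumnWidths and columnWidths are made-up helper names, not existing Beeline methods. The idea is only that the width computation is skipped unless the output format is "table".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Beeline.
class SelectiveWidthSketch {
    // Stand-in for the options object that knows the chosen output format.
    static final class Options {
        private final String outputFormat;

        Options(String outputFormat) {
            this.outputFormat = outputFormat;
        }

        String getOutputFormat() {
            return outputFormat;
        }
    }

    // Only the table format needs aligned columns.
    static boolean needsColumnWidths(Options opts) {
        return "table".equalsIgnoreCase(opts.getOutputFormat());
    }

    // Returns an empty array when the format does not need widths, so the
    // per-row length scan is skipped entirely for csv, tsv and similar.
    // Assumes every buffered row has the same number of columns.
    static int[] columnWidths(Options opts, List<String[]> bufferedRows) {
        if (!needsColumnWidths(opts) || bufferedRows.isEmpty()) {
            return new int[0];
        }
        int[] widths = new int[bufferedRows.get(0).length];
        for (String[] row : bufferedRows) {
            for (int c = 0; c < row.length; c++) {
                widths[c] = Math.max(widths[c], row[c].length());
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "abc"});
        rows.add(new String[] {"22", "de"});
        System.out.println(columnWidths(new Options("table"), rows).length); // 2
        System.out.println(columnWidths(new Options("csv"), rows).length);   // 0
    }
}
```

With a guard like this, the choice between computing widths once from the first 1000 rows and recomputing every 1000 rows only affects table output, where the row counts are usually small enough for the cost not to matter.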