By default, PDFTextStripper has its "shouldSeparateByBeads" attribute set to "true", which means that it tries to extract text flowing from one column to another as contiguous text. It therefore extracts the text from column 1 first, followed by the text from column 2.
If you set that flag to 'false' (via 'setShouldSeparateByBeads(false)'), the stripper will try to extract the beads in rendered order, placing the vertically aligned lines from each column side by side, i.e., on the same line.
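To make the difference between the two orders concrete, here is a small toy model. It is not PDFBox code: the class and method names are made up for illustration, and it assumes that line N of one column sits at the same height as line N of the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only, not PDFBox code. Each inner list is one column (bead),
// holding its lines from top to bottom.
class BeadOrderSketch {

    // Bead order: all lines of column 1, then all lines of column 2, and so on.
    static List<String> byBeads(List<List<String>> columns) {
        List<String> out = new ArrayList<>();
        for (List<String> column : columns) {
            out.addAll(column);
        }
        return out;
    }

    // Rendered order: lines in the same row are joined into one output line.
    // Assumption: the row index stands in for the vertical position on the page.
    static List<String> rendered(List<List<String>> columns, String wordSeparator) {
        int rows = 0;
        for (List<String> column : columns) {
            rows = Math.max(rows, column.size());
        }
        List<String> out = new ArrayList<>();
        for (int row = 0; row < rows; row++) {
            List<String> parts = new ArrayList<>();
            for (List<String> column : columns) {
                if (row < column.size()) {
                    parts.add(column.get(row));
                }
            }
            out.add(String.join(wordSeparator, parts));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> page = Arrays.asList(
                Arrays.asList("The quick", "brown fox"),
                Arrays.asList("jumps over", "the lazy dog"));
        System.out.println(byBeads(page));       // like shouldSeparateByBeads == true
        System.out.println(rendered(page, " ")); // like shouldSeparateByBeads == false
    }
}
```

In the second output, nothing in a joined line shows where column 1 ends and column 2 begins.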
However, text extraction does not currently mark the point where the text in a line stops coming from the first bead and starts coming from the second. So it is currently not possible to tell which words in a line came from which column.
The writePage() code detects a gap in a line of words and inserts the singleton WordSeparator object between words. When the text is 'rendered', that object is replaced with the return value of the 'getWordSeparator()' method (which can be modified using the 'setWordSeparator(String)' method). It may be possible to do something similar to detect a bead change.
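The marker-then-render pattern described here can be sketched roughly as follows. This is a simplified stand-in and not the actual writePage() implementation: the Run class, the markGaps() method, and the fixed gap threshold are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in, not the real PDFBox implementation.
class WordSeparatorSketch {

    // Singleton marker placed between words; it is compared by identity.
    static final Object WORD_SEPARATOR = new Object();

    private String wordSeparator = " ";

    void setWordSeparator(String separator) { wordSeparator = separator; }
    String getWordSeparator() { return wordSeparator; }

    // Hypothetical piece of text with its left and right x coordinates.
    static final class Run {
        final String text;
        final double left;
        final double right;
        Run(String text, double left, double right) {
            this.text = text;
            this.left = left;
            this.right = right;
        }
    }

    // Walk the runs left to right and insert the marker wherever the
    // horizontal gap to the previous run exceeds minGap (an assumed threshold).
    static List<Object> markGaps(List<Run> line, double minGap) {
        List<Object> tokens = new ArrayList<>();
        Run previous = null;
        for (Run run : line) {
            if (previous != null && run.left - previous.right > minGap) {
                tokens.add(WORD_SEPARATOR);
            }
            tokens.add(run);
            previous = run;
        }
        return tokens;
    }

    // Rendering step: only here is the marker turned into a string.
    String render(List<Object> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Object token : tokens) {
            if (token == WORD_SEPARATOR) {
                sb.append(getWordSeparator());
            } else {
                sb.append(((Run) token).text);
            }
        }
        return sb.toString();
    }
}
```

Because the marker is only resolved at render time, changing the separator string does not require re-running the gap detection.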
That is, if we detect that the bead count has been incremented since the last WordSeparator was inserted, we could also insert a 'BeadSeparator'. We could then similarly add the ability to customize the string used to render the BeadSeparator (it would default to an empty string to maintain the current behavior).
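A rough sketch of how that could look is below. Nothing here exists in PDFBox today: 'BeadSeparator', 'setBeadSeparator(String)', and the Word class with its bead index are all hypothetical, and putting the bead separator right after the word separator is an arbitrary choice for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposal; none of these names exist in PDFBox.
class BeadSeparatorSketch {

    // A word together with the index of the bead (column) it came from.
    static final class Word {
        final String text;
        final int bead;
        Word(String text, int bead) {
            this.text = text;
            this.bead = bead;
        }
    }

    private String wordSeparator = " ";
    private String beadSeparator = ""; // empty default keeps the current output

    void setBeadSeparator(String separator) { beadSeparator = separator; }
    String getBeadSeparator() { return beadSeparator; }

    // Join the words of one rendered line. Whenever the bead index changes
    // between two neighbouring words, the bead separator is emitted as well.
    String renderLine(List<Word> line) {
        StringBuilder sb = new StringBuilder();
        Word previous = null;
        for (Word word : line) {
            if (previous != null) {
                sb.append(wordSeparator);
                if (word.bead != previous.bead) {
                    sb.append(beadSeparator);
                }
            }
            sb.append(word.text);
            previous = word;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Word> line = Arrays.asList(
                new Word("brown", 0), new Word("fox", 0),
                new Word("the", 1), new Word("lazy", 1), new Word("dog", 1));
        BeadSeparatorSketch stripper = new BeadSeparatorSketch();
        System.out.println(stripper.renderLine(line)); // same as today's output
        stripper.setBeadSeparator("| ");
        System.out.println(stripper.renderLine(line)); // column boundary is visible
    }
}
```

A caller could then split each extracted line on the chosen bead separator to recover which words came from which column.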
Unfortunately, I do not have time to work on this myself right now. If someone else wants to run with this idea and try to implement it, that would be cool.
For most users, the default behavior of 'shouldSeparateByBeads==true' accomplishes what is needed because it tries to keep the text logically contiguous. Are you sure this isn't what you want?