We are reading a very simple CSV file (see below).
The file is only 245 bytes, which is far below the default block_size in ReadOptions, so we would expect the resulting table to contain only one batch. At least, that is our expectation if we understand correctly that a block refers to a slice of input of roughly that byte size?
The docs state: "This will determine multi-threading granularity as well as the size of individual chunks in the Table." To us, that implies it also determines the size of individual batches?
Previously, we assumed that by setting block_size to the total file size we could guarantee that even files larger than 1 MB yield a pa.Table with a single batch. This mini file seems to prove us wrong?
Additionally, if we convert the table to pandas and back, we end up with only one batch.