Users who convert Pandas string arrays to Arrow arrays may be surprised that the Arrow arrays use far more memory when the cardinality is low. The current workaround is to convert to a Pandas Categorical first, but it might save some headaches if we could detect, automatically or via an option, when a Dictionary type is more appropriate than a String type.
Here's an example of what I'm talking about:
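A rough back-of-the-envelope sketch of the size difference, using only the standard library. It assumes Arrow's variable-length string layout (one int32 offset per value plus one, followed by the UTF-8 bytes) and int32 dictionary indices. Validity bitmaps and buffer padding are ignored, and the helper names are hypothetical, not part of any Arrow API:

```python
def string_array_bytes(values):
    """Approximate size of an Arrow-style string array (offsets + data)."""
    offsets = 4 * (len(values) + 1)  # int32 offsets: one per value, plus one
    data = sum(len(v.encode("utf-8")) for v in values)
    return offsets + data


def dictionary_array_bytes(values):
    """Approximate size of a dictionary-encoded array (indices + dictionary)."""
    unique = list(dict.fromkeys(values))  # distinct values, first-seen order
    indices = 4 * len(values)  # assumption: int32 index per value
    return indices + string_array_bytes(unique)


# Low cardinality: three distinct strings repeated many times.
values = ["apple", "banana", "cherry"] * 100_000

plain = string_array_bytes(values)
encoded = dictionary_array_bytes(values)
print(f"string: {plain:,} bytes, dictionary: {encoded:,} bytes")
```

With only three distinct values, the string layout stores every copy of the text, while the dictionary layout stores each string once. A narrower index type would shrink the dictionary form further.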
One downside of inferring this automatically: when a sequence of Pandas DataFrames is converted, the results may end up with differing schemas, because the same column could be inferred as Dictionary in one DataFrame and String in another. For that reason, this behavior should probably be opt-in.
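To illustrate the schema concern, here is a minimal sketch of one possible detection heuristic. The function name `infer_string_type`, the `auto_dictionary` flag, and the `max_unique_ratio` threshold are all hypothetical choices made for illustration, not an existing or proposed Arrow API:

```python
def infer_string_type(values, auto_dictionary=False, max_unique_ratio=0.5):
    """Hypothetical heuristic: pick "dictionary" when few values are distinct.

    Inference is opt-in; with auto_dictionary=False the result is always
    "string", so every batch gets the same type.
    """
    if not auto_dictionary or not values:
        return "string"
    ratio = len(set(values)) / len(values)
    return "dictionary" if ratio <= max_unique_ratio else "string"


# Two batches of the same logical column with different cardinality.
batch_a = ["red", "green", "red", "green", "red", "red"]  # 2 distinct of 6
batch_b = ["u1", "u2", "u3", "u4", "u5", "u6"]            # 6 distinct of 6

inferred = [infer_string_type(b, auto_dictionary=True) for b in (batch_a, batch_b)]
default = [infer_string_type(b) for b in (batch_a, batch_b)]
print(inferred, default)
```

With inference enabled, the two batches get different types, which is the schema mismatch described above. With the flag left off, both stay as strings.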