Description
The method Column.substr() is 1-based, consistent with SQL's and Hive's SUBSTRING, but unlike Python's string slicing and Java's String.substring(), which are zero-based. Both PySpark users and Java API users often naturally expect a 0-based substr(). Adding to the confusion, substr() currently accepts a startPos value of 0, which returns the same result as startPos==1.
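To make the mismatch concrete, here is a minimal pure-Python sketch of the behavior described above. It is not Spark code: substr_one_based is a hypothetical helper that only models non-negative arguments and assumes, as stated above, that startPos 0 is treated like startPos 1.

```python
def substr_one_based(s: str, start_pos: int, length: int) -> str:
    """Hypothetical model of the 1-based behavior described above (not Spark code)."""
    if start_pos < 0 or length < 0:
        raise ValueError("this sketch only models non-negative arguments")
    # startPos 0 and startPos 1 both map to the first character.
    begin = max(start_pos - 1, 0)
    return s[begin:begin + length]


word = "hello"
zero_based = word[1:1 + 2]                # Python slicing: index 1 is the second character
one_based = substr_one_based(word, 1, 2)  # 1-based: position 1 is the first character
from_zero = substr_one_based(word, 0, 2)  # startPos == 0 silently behaves like startPos == 1
```

A user who expects zero-based indexing and passes 1 gets the first two characters rather than the second and third, and passing 0 gives no hint that anything is off.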
Since changing substr() to be 0-based would break existing code and is probably NOT a reasonable option here, I suggest making one or more of the following changes:
- Adding a method substr0(), which would be zero-based.
- Renaming substr() to substr1().
- Making the existing substr() throw an exception on startPos==0, which should catch and alert most users who expect zero-based behavior.
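As a rough illustration of the first and third suggestions, here is a pure-Python sketch. The names substr_strict and substr0 are hypothetical stand-ins rather than existing Spark APIs, and the sketch only models non-negative positions and lengths.

```python
def substr_strict(s: str, start_pos: int, length: int) -> str:
    """Hypothetical 1-based substr that rejects startPos == 0 instead of accepting it."""
    if start_pos == 0:
        raise ValueError("substr() is 1-based; startPos must be >= 1")
    if start_pos < 0 or length < 0:
        raise ValueError("this sketch only models positive startPos and non-negative length")
    begin = start_pos - 1  # convert the 1-based position to a Python index
    return s[begin:begin + length]


def substr0(s: str, start_pos: int, length: int) -> str:
    """Hypothetical zero-based variant, defined in terms of the strict 1-based one."""
    if start_pos < 0:
        raise ValueError("this sketch only models non-negative startPos")
    return substr_strict(s, start_pos + 1, length)
```

With this shape, substr0("hello", 1, 2) matches Python slicing, while substr_strict("hello", 0, 2) fails loudly, so a caller who assumed zero-based indexing finds out at the first call.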
This is my first discussion on this project; apologies for any faux pas.