Apache Arrow / ARROW-6805

Should the new field_ argument on Table.set_column() be optional? (Unnecessary breaking change)


    Details

    • Type: Bug
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 0.15.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels:
    • Environment: All

      Description

      The new field_ argument of Table.set_column(), introduced in version 0.15.0, makes it possible to attach extra information to the column metadata. However, it should not be mandatory. In some cases it is simply redundant, as in this example:

       

      import pandas as pd
      import pyarrow as pa
      
      
      df = pd.DataFrame({"foo": [1, 2, 3]})
      tbl = pa.Table.from_pandas(df=df, preserve_index=False)
      
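      # Version 0.15.0 requires a Field as the second argument, even though
      # here it only repeats the existing column name and the target type.
      if pa.__version__ == "0.15.0":
          field = pa.field(name="foo", type="double")
          tbl = tbl.set_column(0, field, tbl.column("foo").cast("double"))
      else:
          # Earlier versions take only the index and the column.
          tbl = tbl.set_column(0, tbl.column("foo").cast("double"))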
      

      I think this argument should be optional, both to avoid redundant code and to keep compatibility with versions earlier than 0.15.0.
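
Until the argument becomes optional, the version branch in the example above could be folded into a small helper. This is only a sketch: `requires_field_argument` and `set_column_compat` are hypothetical names, not part of pyarrow, and the sketch assumes that releases after 0.15.0 keep the three-argument signature. It compares the major and minor version numbers instead of testing for the exact string "0.15.0", so a later release such as 0.15.1 would not fall through to the old call.

```python
def requires_field_argument(version):
    """Return True if this pyarrow version string is 0.15.0 or newer.

    Assumption: every release from 0.15.0 onwards expects
    set_column(i, field_, column).
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (0, 15)


def set_column_compat(table, index, field, column, version):
    """Replace the column at `index`, passing `field` only where it is expected.

    Hypothetical helper: `table` is anything with a pyarrow-style
    set_column() method, and `version` is normally pa.__version__.
    """
    if requires_field_argument(version):
        # 0.15.0 and later: set_column(i, field_, column)
        return table.set_column(index, field, column)
    # Before 0.15.0: set_column(i, column)
    return table.set_column(index, column)
```

With a real table, the call would look like `set_column_compat(tbl, 0, pa.field("foo", pa.float64()), tbl.column("foo").cast("double"), pa.__version__)`.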

            People

            • Assignee: Unassigned
            • Reporter: igorborgest Igor Tavares
            • Votes: 0
            • Watchers: 1

                Time Tracking

                • Estimated: 4h
                • Remaining: 4h
                • Logged: Not Specified