Apache Arrow / ARROW-6625

[Python] Allow concat_tables to null or default fill missing columns


Details

    Description

      The concat_tables function currently requires the schemas of all tables being concatenated to be identical. However, tables occasionally agree on the type of every column they share, while one or more columns are absent from some of them.

      In this case, allowing for null filling (or default filling) would be ideal.

      I imagine this feature would be an optional parameter on the concat_tables function. The argument could be either a boolean, for blanket null filling, or a mapping type, for default filling. A user who wanted to default fill some columns but null fill others could use None as the value (a defaultdict would make it simple to provide a blanket null fill when only a few columns need default values).

      If a mapping is given but has no entry for a missing column, the function should probably raise an error.

      The default behavior would remain the current one, so the default value of the parameter should be False or None.


          People

            brillsp Zhuo Peng
            nugend Daniel Nugent
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 11h
