Apache Arrow / ARROW-1993

[Python] Add function for determining implied Arrow schema from pandas.DataFrame

Details

    Description

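      Currently the only way to determine the Arrow schema implied by a pandas.DataFrame is to call Table.from_pandas or Array.from_pandas, which performs the full conversion: significant unnecessary work plus memory allocation for the converted data. If only the schema is of interest, we could do less work and avoid that allocation.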

      We should provide a function pyarrow.Schema.from_pandas that takes a DataFrame as input and returns the corresponding Arrow schema. The functionality for determining the schema already exists in the Python code; at the moment it is just very tightly bound to the conversion infrastructure.

            People

              kszucs Krisztian Szucs
              wesm Wes McKinney
              Votes: 0
              Watchers: 5

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 20m