Apache Arrow / ARROW-6231

[C++][Python] Consider assigning default column names when reading CSV file and header_rows=0

Details

    Description

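      Reading a CSV file with header_rows=0 currently fails unless column names are given explicitly, which is a slight usability rough edge. Assigning default names (like "f0", "f1", ...) would probably be better, since then at least you can see how many columns there are and what is in them.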

      In [10]: parse_options = csv.ParseOptions(delimiter='|', header_rows=0)                                                                                         
      
      In [11]: %time table = csv.read_csv('Performance_2016Q4.txt', parse_options=parse_options)                                                                      
      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <timed exec> in <module>
      
      ~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/_csv.pyx in pyarrow._csv.read_csv()
      
      ~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowInvalid: header_rows == 0 needs explicit column names
      

      pandas uses integer column labels in this case. Arrow column names are strings, so some kind of default string name would have to be defined.

      In [18]: df = pd.read_csv('Performance_2016Q4.txt', sep='|', header=None, low_memory=False)                                                                     
      
      In [19]: df.columns                                                                                                                                             
      Out[19]: 
      Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
                  17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
                 dtype='int64')
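
The naming scheme suggested above can be sketched in plain Python. This is only an illustration of the idea, not Arrow's reader: the helper names `default_column_names` and `peek_column_count` are hypothetical, the "f" prefix is just the example from the description, and the sample rows are made up.

```python
import csv
import io


def default_column_names(num_columns, prefix="f"):
    """Return placeholder names f0, f1, ... (hypothetical helper, not an Arrow API)."""
    return [f"{prefix}{i}" for i in range(num_columns)]


def peek_column_count(text, delimiter="|"):
    """Count the fields in the first row of header-less delimited text."""
    first_row = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
    return len(first_row)


# Made-up sample: three pipe-delimited columns and no header row.
sample = "100|2016-10-01|3.5\n101|2016-11-01|3.25\n"
names = default_column_names(peek_column_count(sample))
print(names)  # ['f0', 'f1', 'f2']
```

Unlike the pandas integer labels, these names are strings, so they could be used directly as column names while still showing how many columns the file has.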
      

            People

              Assignee: apitrou Antoine Pitrou
              Reporter: wesm Wes McKinney

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 4h 40m