Apache Arrow / ARROW-8058

[C++][Python][Dataset] Provide an option to toggle validation and schema inference in FileSystemDatasetFactoryOptions

Details

    Description

      Validating files and inferring the schema while the dataset is constructed can be costly and is not always necessary.

      At the same time, we could move file validation into the scan tasks. Currently all files are inspected as the dataset is constructed, which can be expensive if the filesystem is slow. Validation would then run on every scan instead of once, but each check will be cheap because the file is read into memory at scan time anyway.
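
      The trade-off described above can be sketched in plain Python. This is only an illustration under stated assumptions: `FactoryOptions`, `validate_at_discovery`, `ToyDataset`, and `make_dataset` are hypothetical names invented for this sketch and are not part of the Arrow C++ or Python API. It shows a toggle that skips the eager per-file check at construction while keeping the check inside each scan task.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class FactoryOptions:
    # Hypothetical toggle (not an Arrow option): when True, every file is
    # checked while the dataset is being constructed.
    validate_at_discovery: bool = True


@dataclass
class ToyDataset:
    paths: List[str]
    is_valid: Callable[[str], bool]
    # Records every validation performed, purely for illustration.
    checks: List[str] = field(default_factory=list)

    def _check(self, path: str) -> None:
        self.checks.append(path)
        if not self.is_valid(path):
            raise ValueError(f"invalid file: {path}")

    def scan(self) -> Iterator[str]:
        # Validation lives in the scan task: it runs just before the file
        # would be read, so it repeats on every scan of the dataset.
        for path in self.paths:
            self._check(path)
            yield path


def make_dataset(
    paths: Iterable[str],
    is_valid: Callable[[str], bool],
    options: FactoryOptions,
) -> ToyDataset:
    dataset = ToyDataset(list(paths), is_valid)
    if options.validate_at_discovery:
        # Eager mode: touch every file up front, which is the costly step
        # on a slow filesystem.
        for path in dataset.paths:
            dataset._check(path)
    return dataset
```

      With `validate_at_discovery=False`, constructing the dataset touches no files, and an invalid file surfaces as an error only when a scan reaches it.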

            People

              fsaintjacques Francois Saint-Jacques
              bkietz Ben Kietzman

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3.5h