Apache Arrow / ARROW-7061

[C++][Dataset] FileSystemDiscovery with ParquetFileFormat should ignore files that aren't Parquet


Details

    Description

      I got the error "Invalid parquet file. Corrupt footer." while trying to read real data. It turned out that I had opened the directory in macOS Finder, which added junk .DS_Store files. Once I deleted them, the Dataset was created fine.

      If we're creating a DataSource with Parquet files, we should ignore any non-Parquet files we encounter when scanning.
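
One way to picture the requested behavior is a discovery pass that checks each candidate file for the Parquet magic bytes before accepting it. Parquet files begin and end with the 4-byte marker "PAR1", so a .DS_Store file fails the check and is skipped instead of raising a corrupt-footer error. The sketch below is only an illustration under stated assumptions: it uses std::filesystem on the local disk rather than Arrow's filesystem and dataset classes, and the names LooksLikeParquet and DiscoverParquetFiles are hypothetical helpers, not Arrow APIs.

```cpp
#include <cstdint>
#include <cstring>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not an Arrow API): returns true if the file starts
// and ends with the 4-byte Parquet magic "PAR1".
bool LooksLikeParquet(const fs::path& path) {
  constexpr char kMagic[4] = {'P', 'A', 'R', '1'};
  // Lower bound on size: leading magic + 4-byte footer length + trailing magic.
  constexpr std::uintmax_t kMinSize = 12;

  std::error_code ec;
  const std::uintmax_t size = fs::file_size(path, ec);
  if (ec || size < kMinSize) return false;

  std::ifstream in(path, std::ios::binary);
  if (!in) return false;

  char head[4];
  char tail[4];
  in.read(head, 4);             // first 4 bytes
  in.seekg(-4, std::ios::end);  // jump to the last 4 bytes
  in.read(tail, 4);
  if (!in) return false;

  return std::memcmp(head, kMagic, 4) == 0 &&
         std::memcmp(tail, kMagic, 4) == 0;
}

// Hypothetical helper: walks `root` recursively and keeps only regular files
// that pass the magic-byte check; everything else is silently ignored.
std::vector<fs::path> DiscoverParquetFiles(const fs::path& root) {
  std::vector<fs::path> files;
  for (const auto& entry : fs::recursive_directory_iterator(root)) {
    if (entry.is_regular_file() && LooksLikeParquet(entry.path())) {
      files.push_back(entry.path());
    }
  }
  return files;
}
```

Checking the magic bytes costs two small reads per file. Filtering on file name alone (for example, skipping names that start with a dot) would be cheaper but would not catch other stray files in the directory.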

          People

            fsaintjacques Francois Saint-Jacques
            npr Neal Richardson
            Votes: 0
            Watchers: 4

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 4h 50m
