Parquet / PARQUET-1526

[C++] parquet cpp - improve examples


    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-cpp
    • Labels:


      It would be great to have examples of using the Parquet Arrow high-level API for the following two cases:

      • Storing nested data types. Support for nested data is touted as a major merit of Parquet, so this case should be included as an example. Ideally, an example that uses an arrow::StructArray nesting several primitive types, list types and other nested types would cover every case of a nested hierarchy of complex data.
      • Buffered or batched writes to a Parquet file. Parquet is meant to be used for large amounts of data, but the current examples hold all of the data in Arrow data structures before writing the Parquet file, so the memory footprint is proportional to the amount of data being stored. An example of writing directly to row groups and columns could nicely demonstrate how to store data with a smaller memory footprint. The current example creates an arrow::Table, which has to be filled with arrow::Array(s) holding the entire data set, so its size is bounded by the amount of RAM. Ideally, an example would generate some data in several arrow::Array(s) and then append them as a new Row Group (or Column Chunk) through a parquet::arrow::FileWriter, using the NewRowGroup and WriteColumnChunk functions, thus demonstrating a lower memory footprint when writing a Parquet file with huge amounts of data.




            • Assignee:
              rajeshwaragrawal101@gmail.com Rajeshwar Agrawal
            • Votes:
              1
            • Watchers:
              2


              • Created: