
BEAM-12715: SnowflakeWrite fails in batch mode when the number of shards is > 1000

Details

    • Type: Bug
    • Status: Triage Needed
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.33.0
    • Component/s: io-java-snowflake
    • Labels: None

    Description

      When writing to Snowflake in batch mode, if the number of files to import exceeds 1,000, the load fails.

      From the Snowflake docs:

      Of the three options for identifying/specifying data files to load from a stage, providing a discrete list of files is generally the fastest; however, the FILES parameter supports a maximum of 1,000 files, meaning a COPY command executed with the FILES parameter can only load up to 1,000 files.

      I noticed that the Snowflake write in batch mode ignores the number of shards set by the user; I think the first step should be to honor the user-specified number of shards before writing.
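
      For illustration, a minimal sketch of the kind of batch write that can hit this. All connection settings, bucket, integration, and table names are placeholders, and withShardsNumber is assumed to be the shard setter in question (this issue reports it is ignored in batch mode):

        import org.apache.beam.sdk.Pipeline;
        import org.apache.beam.sdk.io.GenerateSequence;
        import org.apache.beam.sdk.io.snowflake.SnowflakeIO;
        import org.apache.beam.sdk.options.PipelineOptionsFactory;

        public class SnowflakeBatchWriteSketch {
          public static void main(String[] args) {
            Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

            // Placeholder credentials and locations, not values from this report.
            SnowflakeIO.DataSourceConfiguration config =
                SnowflakeIO.DataSourceConfiguration.create()
                    .withUsernamePasswordAuth("user", "password")
                    .withServerName("account.snowflakecomputing.com")
                    .withDatabase("DB")
                    .withSchema("PUBLIC");

            p.apply(GenerateSequence.from(0).to(100_000_000L))
                .apply(
                    SnowflakeIO.<Long>write()
                        .withDataSourceConfiguration(config)
                        .to("MY_TABLE")
                        .withStagingBucketName("gs://my-bucket/")
                        .withStorageIntegrationName("my_integration")
                        .withUserDataMapper((SnowflakeIO.UserDataMapper<Long>) x -> new Object[] {x})
                        // Reportedly ignored in batch mode: the runner decides how many
                        // files get staged, which can exceed the 1,000-file COPY limit.
                        .withShardsNumber(10));

            p.run();
          }
        }

      If batch mode honored that setting, capping the shard count at 1,000 would at least give users a workaround.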

      Longer term, should Beam issue multiple COPY statements, each with a distinct list of files, when the number of files exceeds 1,000? Perhaps inside the same transaction (a BEGIN; ... END; block), as sketched below.
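
      A rough sketch of that longer-term idea, using plain JDBC with a hypothetical table and stage; the 1,000-file chunk size comes from the Snowflake limit quoted above, and transaction handling via autocommit stands in for an explicit BEGIN; ... END; block:

        import java.sql.Connection;
        import java.sql.SQLException;
        import java.sql.Statement;
        import java.util.List;
        import java.util.stream.Collectors;

        public class ChunkedCopy {
          /** Issues one COPY per chunk of at most 1,000 files, all inside a single transaction. */
          static void copyInChunks(Connection conn, String table, String stage, List<String> files)
              throws SQLException {
            final int maxFilesPerCopy = 1000; // Snowflake's documented FILES limit.
            boolean previousAutoCommit = conn.getAutoCommit();
            conn.setAutoCommit(false); // emulates the BEGIN; ... END; block
            try (Statement stmt = conn.createStatement()) {
              for (int i = 0; i < files.size(); i += maxFilesPerCopy) {
                List<String> chunk = files.subList(i, Math.min(i + maxFilesPerCopy, files.size()));
                String fileList =
                    chunk.stream().map(f -> "'" + f + "'").collect(Collectors.joining(", "));
                stmt.execute("COPY INTO " + table + " FROM @" + stage + " FILES = (" + fileList + ")");
              }
              conn.commit(); // all chunks become visible together
            } catch (SQLException e) {
              conn.rollback(); // no partial load if any chunk fails
              throw e;
            } finally {
              conn.setAutoCommit(previousAutoCommit);
            }
          }
        }

      Grouping the COPY statements in one transaction keeps the all-or-nothing behavior of a single COPY while staying under the per-statement FILES cap.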

       

      Also, I wanted to set the Jira issue component to io-java-snowflake, but it does not exist.


People

    Assignee: Unassigned
    Reporter: Daniel Mateus Pires (dmateusp)
    Votes: 0
    Watchers: 2


Time Tracking

    Original Estimate: Not Specified
    Remaining Estimate: 0h
    Time Spent: 1h