Spark / SPARK-37980

Extend METADATA column to support row indices for file based data sources


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark recently added hidden metadata column support for file-based data sources as part of SPARK-37273.

      We should extend it to also support ROW_INDEX/ROW_POSITION.

       

      Meaning of ROW_POSITION:

      ROW_INDEX/ROW_POSITION is the index of a row within its file. For example, the 5th row in a file has ROW_INDEX 5.
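
      As a rough illustration of the definition above, the sketch below attaches a per-file row position to rows read from two in-memory "files". It is plain Python, not Spark code: the `with_row_index` helper, the file names, and the `start` parameter are all hypothetical, and the default of 1 merely follows the "5th row has ROW_INDEX 5" example rather than any confirmed numbering base.

```python
from typing import Dict, Iterator, List, Tuple


def with_row_index(
    files: Dict[str, List[str]], start: int = 1
) -> Iterator[Tuple[str, int, str]]:
    """Yield (file_name, row_index, row), restarting the index for each file."""
    for file_name, rows in files.items():
        # The counter restarts for every file: the index is a position
        # within one file, not within the whole table.
        for row_index, row in enumerate(rows, start=start):
            yield file_name, row_index, row


# Hypothetical table made of two files (names are illustrative only).
table = {
    "part-0.parquet": ["a", "b", "c", "d", "e"],
    "part-1.parquet": ["f", "g"],
}
rows = list(with_row_index(table))
```

      With these assumptions, the 5th row of the first file carries index 5, and the first row of the second file starts again at 1.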

       

      Use cases: 

      Row indexes can be used in a variety of ways. A (fileName, rowIndex) tuple uniquely identifies a row in a table. This information can be used to mark individual rows, for example by an indexer.
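
      The "mark rows" use case can be sketched as follows, assuming rows have already been tagged with their file name and per-file row index. This is a minimal plain-Python illustration, not how Spark or any particular indexer implements it; `RowKey`, `unmarked_rows`, and the sample data are hypothetical names introduced here.

```python
from typing import Iterable, List, Set, Tuple

RowKey = Tuple[str, int]  # (file_name, row_index)


def unmarked_rows(
    rows: Iterable[Tuple[str, int, str]], marked: Set[RowKey]
) -> List[Tuple[str, int, str]]:
    """Return the rows whose (file_name, row_index) key is not in `marked`."""
    return [r for r in rows if (r[0], r[1]) not in marked]


# Hypothetical rows, already tagged with file name and per-file row index.
rows = [
    ("part-0.parquet", 1, "a"),
    ("part-0.parquet", 2, "b"),
    ("part-1.parquet", 1, "c"),
]
# Mark a single row. The same row_index in another file is a different key,
# which is why the file name has to be part of the identifier.
marked = {("part-0.parquet", 1)}
remaining = unmarked_rows(rows, marked)
```

      Only the row keyed by ("part-0.parquet", 1) is dropped; the row with index 1 in the other file is untouched.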


            People

              Assignee: Ala Luszczak (ala.luszczak)
              Reporter: Prakhar Jain (prakharjain09)
