Details
New Feature
Status: Resolved
Major
Resolution: Fixed
3.4.0
None
Description
Implement the following two methods in DataFrameReader:
/**
 * Loads a `Dataset[String]` storing JSON objects (<a href="http://jsonlines.org/">JSON Lines
 * text format or newline-delimited JSON</a>) and returns the result as a `DataFrame`.
 *
 * Unless the schema is specified using `schema` function, this function goes through the
 * input once to determine the input schema.
 *
 * @param jsonDataset input Dataset with one JSON object per record
 * @since 3.4.0
 */
def json(jsonDataset: Dataset[String]): DataFrame

/**
 * Loads a `Dataset[String]` storing CSV rows and returns the result as a `DataFrame`.
 *
 * If the schema is not specified using `schema` function and `inferSchema` option is enabled,
 * this function goes through the input once to determine the input schema.
 *
 * If the schema is not specified using `schema` function and `inferSchema` option is disabled,
 * it determines the columns as string types and it reads only the first line to determine the
 * names and the number of fields.
 *
 * If the enforceSchema is set to `false`, only the CSV header in the first line is checked
 * to conform specified or inferred schema.
 *
 * @note if `header` option is set to `true` when calling this API, all lines same with
 * the header will be removed if exists.
 *
 * @param csvDataset input Dataset with one CSV row per record
 * @since 3.4.0
 */
def csv(csvDataset: Dataset[String]): DataFrame
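For illustration, here is a minimal sketch of how the two methods might be called on in-memory data. It assumes a local `SparkSession` (a Spark Connect client would obtain its session differently); the object name `DatasetReaderExample`, the helper `parse`, and the sample rows are hypothetical and not part of this ticket.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical example object; names and sample data are illustrative only.
object DatasetReaderExample {

  // Parses in-memory JSON Lines and CSV rows and returns (jsonDf, csvDf).
  def parse(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._

    // One JSON object per record; the schema is inferred in one pass over the input.
    val jsonLines: Dataset[String] = Seq(
      """{"name": "alice", "age": 30}""",
      """{"name": "bob", "age": 25}"""
    ).toDS()
    val jsonDf = spark.read.json(jsonLines)

    // One CSV row per record; the first row is treated as the header.
    val csvRows: Dataset[String] = Seq("name,age", "alice,30", "bob,25").toDS()
    val csvDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(csvRows)

    (jsonDf, csvDf)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is available for the demonstration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("dataset-reader-example")
      .getOrCreate()
    try {
      val (jsonDf, csvDf) = parse(spark)
      jsonDf.printSchema()
      csvDf.printSchema()
    } finally {
      spark.stop()
    }
  }
}
```

In both calls the column names and types come from the data itself, which is why the resulting schema cannot be stated before the input has been read.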
This requires a new relation message. The existing Project relation cannot be used because the output schema is not known upfront.
message Parse {
  // (Required) Input relation to Parse. The input is expected to have single text column.
  Relation input = 1;

  // (Required) The expected format of the text.
  ParseFormat format = 2;

  enum ParseFormat {
    PARSE_FORMAT_UNSPECIFIED = 0;
    PARSE_FORMAT_CSV = 1;
    PARSE_FORMAT_JSON = 2;
  }
}
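To make the shape of the message concrete, the following is a rough plain-Scala mirror of `Parse`, not the generated protobuf classes. The types `TextInput` and `ParsePlan`, and the rule that an unspecified format is rejected, are assumptions made for illustration; the proto definition only marks both fields as required.

```scala
// Hypothetical plain-Scala model of the Parse message (not generated protobuf code).
sealed trait ParseFormat
object ParseFormat {
  case object Unspecified extends ParseFormat
  case object Csv extends ParseFormat
  case object Json extends ParseFormat
}

sealed trait Relation
// Stand-in for any relation that produces a single text column.
final case class TextInput(lines: Seq[String]) extends Relation
// Wraps an input relation together with the expected text format.
final case class Parse(input: Relation, format: ParseFormat) extends Relation

object ParsePlan {
  // Assumption: a server would refuse a Parse whose format was left unspecified.
  def validate(p: Parse): Either[String, Parse] = p.format match {
    case ParseFormat.Unspecified => Left("format must be CSV or JSON")
    case _                       => Right(p)
  }
}
```

Note that `Parse` carries no column list or expressions: it names only the input and the format, leaving the schema to be determined when the text is parsed.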