I've recently hit many cases of regexp parsing where we need to extract a pattern that occurs an arbitrary number of times; for example, a text block that looks something like:
AAA:WORDS| BBB:TEXT| MSG:ASDF| MSG:QWER| ... MSG:ZXCV|
I need to pull out every value between "MSG:" and "|", which can occur anywhere from 1 to n times in each record. I cannot reliably use the existing regexp_extract method, since it returns only a single match and the number of occurrences is arbitrary. While I can write a UDF to handle this, it would be great if this were supported natively in Spark.
Perhaps we can implement something like regexp_extract_all as Presto and Pig have?
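For reference, here is a rough sketch of the semantics I have in mind, written in plain Python rather than against Spark itself. The helper name, its `group` parameter, and the concrete sample string are assumptions for illustration only, not an existing Spark API; a function like this could be wrapped with `pyspark.sql.functions.udf` as the workaround mentioned above.

```python
import re


# Hypothetical helper: the name and signature are illustrative assumptions,
# not an existing Spark function.
def regexp_extract_all(text, pattern, group=1):
    """Return the requested capture group from every non-overlapping match."""
    if text is None:
        # Pass nulls through, as SQL functions usually do.
        return None
    return [m.group(group) for m in re.finditer(pattern, text)]


# Made-up sample record in the shape described above.
sample = "AAA:WORDS| BBB:TEXT| MSG:ASDF| MSG:QWER| MSG:ZXCV|"

# Capture everything between "MSG:" and the next "|".
print(regexp_extract_all(sample, r"MSG:([^|]*)\|"))
```

Returning a list (an array column in Spark terms) means a record with no matches would yield an empty result rather than an error.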