pyspark.sql.functions.regexp_extract
- pyspark.sql.functions.regexp_extract(str, pattern, idx)
Extract a specific group matched by the Java regex pattern from the specified string column. If the regex does not match, or the specified group does not match, an empty string is returned.
New in version 1.5.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
  - str : Column or column name
    target column to work on.
  - pattern : str
    regex pattern to apply.
  - idx : int
    matched group id.
- Returns
Column
matched value specified by idx group id.
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('100-200',)], ['str'])
>>> df.select('*', sf.regexp_extract('str', r'(\d+)-(\d+)', 1)).show()
+-------+-----------------------------------+
|    str|regexp_extract(str, (\d+)-(\d+), 1)|
+-------+-----------------------------------+
|100-200|                                100|
+-------+-----------------------------------+
>>> df = spark.createDataFrame([('foo',)], ['str'])
>>> df.select('*', sf.regexp_extract('str', r'(\d+)', 1)).show()
+---+-----------------------------+
|str|regexp_extract(str, (\d+), 1)|
+---+-----------------------------+
|foo|                             |
+---+-----------------------------+
>>> df = spark.createDataFrame([('aaaac',)], ['str'])
>>> df.select('*', sf.regexp_extract(sf.col('str'), '(a+)(b)?(c)', 2)).show()
+-----+-----------------------------------+
|  str|regexp_extract(str, (a+)(b)?(c), 2)|
+-----+-----------------------------------+
|aaaac|                                   |
+-----+-----------------------------------+
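
The empty-string rule described above can be illustrated without a Spark session. The following is a minimal sketch in plain Python, not Spark's implementation: regexp_extract_sketch is a hypothetical helper, and it uses Python's re module in place of the Java regex engine, so it should only be read as an approximation for simple patterns like the ones in these examples.

```python
import re


def regexp_extract_sketch(s: str, pattern: str, idx: int) -> str:
    """Hypothetical plain-Python approximation of the documented behaviour.

    Assumption: Python's `re` stands in for the Java regex engine, which is
    only reasonable for simple patterns such as the ones used here.
    """
    match = re.search(pattern, s)  # look for the first match anywhere in the string
    if match is None:
        return ""  # the regex did not match at all
    group = match.group(idx)
    # An optional group that took no part in the match comes back as None.
    return "" if group is None else group


# The three documented cases, in order.
first = regexp_extract_sketch("100-200", r"(\d+)-(\d+)", 1)
no_match = regexp_extract_sketch("foo", r"(\d+)", 1)
unmatched_group = regexp_extract_sketch("aaaac", r"(a+)(b)?(c)", 2)

print(repr(first), repr(no_match), repr(unmatched_group))
```

Because both "no match" and "group did not participate" collapse to an empty string, the result alone cannot distinguish them from a group that genuinely matched zero characters.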