databricks.koalas.DataFrame.spark.apply
spark.apply(func, index_col: Union[str, List[str], None] = None) → ks.DataFrame
Applies a function that takes and returns a Spark DataFrame. It lets you apply Spark functions and column APIs natively, using the Spark columns that Series and Index use internally.
Note
Set index_col and keep a column with that name in the output Spark DataFrame. This avoids falling back to the default index and its performance penalty. If you omit index_col, the default index is used, which is potentially expensive in general.
Note
It will lose column labels. This is a synonym of func(kdf.to_spark(index_col)).to_koalas(index_col).
- Parameters
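- func : function
Function to apply to the data. It receives a Spark DataFrame and must return a Spark DataFrame.
- index_col : str or list of str, optional
Name(s) of the index column(s) passed to to_spark and to_koalas, as in the synonym above. If omitted, the default index is used.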
- Returns
- DataFrame
- Raises
- ValueError: If the output from the function is not a Spark DataFrame.
Examples
>>> kdf = ks.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, columns=["a", "b"])
>>> kdf
   a  b
0  1  4
1  2  5
2  3  6
>>> kdf.spark.apply(
...     lambda sdf: sdf.selectExpr("a + b as c", "index"), index_col="index")
       c
index
0      5
1      7
2      9
The case below ends up using the default index, which should be avoided if possible.
>>> kdf.spark.apply(lambda sdf: sdf.groupby("a").count().sort("a"))
   a  count
0  1      1
1  2      1
2  3      1