databricks.koalas.Series.factorize

Series.factorize(sort: bool = True, na_sentinel: Optional[int] = -1) → Tuple[databricks.koalas.series.Series, pandas.core.indexes.base.Index]

Encode the object as an enumerated type or categorical variable. This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values.

Parameters
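sort : bool, default True

Sort uniques and shuffle codes to maintain the relationship between them.

na_sentinel : int or None, default -1

Value used in codes to mark missing ("not found") entries. If None, missing values are not dropped from uniques; they are assigned their own code instead.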
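To make the interaction between sort and na_sentinel concrete, here is a minimal pure-Python model of sorted factorization. It is a sketch, not the Koalas implementation (which runs on Spark); the helper name factorize_sketch is hypothetical, and it assumes that only None counts as a missing value and that the non-missing values are mutually comparable so they can be sorted.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def factorize_sketch(
    values: Sequence[Any], na_sentinel: Optional[int] = -1
) -> Tuple[List[int], List[Any]]:
    """Hypothetical model of sorted factorization (not Koalas code).

    Assumption: only None is treated as missing.
    """
    # Distinct non-missing values in sorted order (mirrors sort=True).
    uniques: List[Any] = sorted({v for v in values if v is not None})
    position: Dict[Any, int] = {v: i for i, v in enumerate(uniques)}

    if na_sentinel is None and any(v is None for v in values):
        # Missing values are kept: they get their own code after the valid ones.
        position[None] = len(uniques)
        uniques.append(None)

    codes: List[int] = []
    for v in values:
        if v is None and na_sentinel is not None:
            codes.append(na_sentinel)  # missing value marked "not found"
        else:
            codes.append(position[v])
    return codes, uniques


codes, uniques = factorize_sketch(['b', None, 'a', 'c', 'b'])
print(codes)    # [1, -1, 0, 2, 1]
print(uniques)  # ['a', 'b', 'c']
```

Calling it with na_sentinel=None yields codes [1, 3, 0, 2, 1] and uniques ['a', 'b', 'c', None], matching the shape of the Koalas examples on this page.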

Returns
codes : Series

A Series of integer codes that index into uniques. Apart from missing positions marked with na_sentinel, uniques.take(codes) has the same values as the original Series.

uniques : pd.Index

The unique valid values.
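The round trip between codes and uniques can be sketched in plain Python. The helper take_with_sentinel below is hypothetical and only illustrates the relationship; the codes and uniques are copied from the first example on this page. Because Python lists wrap negative indices (uniques[-1] is the last element), the sketch maps the sentinel back to None explicitly rather than indexing with it.

```python
from typing import Any, List, Optional, Sequence


def take_with_sentinel(
    uniques: Sequence[Any], codes: Sequence[int], na_sentinel: Optional[int] = -1
) -> List[Any]:
    """Rebuild the original values from (codes, uniques); hypothetical helper."""
    # A code equal to na_sentinel marks a missing value, so emit None for it
    # instead of letting a negative code wrap around to the end of uniques.
    return [None if c == na_sentinel else uniques[c] for c in codes]


# codes and uniques taken from the first example of this page.
codes = [1, -1, 0, 2, 1]
uniques = ['a', 'b', 'c']
print(take_with_sentinel(uniques, codes))  # ['b', None, 'a', 'c', 'b']
```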

Note

Unless na_sentinel is None, uniques does not contain an entry for missing values, even if the Series has them.

Examples

>>> import databricks.koalas as ks
>>> kser = ks.Series(['b', None, 'a', 'c', 'b'])
>>> codes, uniques = kser.factorize()
>>> codes
0    1
1   -1
2    0
3    2
4    1
dtype: int32
>>> uniques
Index(['a', 'b', 'c'], dtype='object')
>>> codes, uniques = kser.factorize(na_sentinel=None)
>>> codes
0    1
1    3
2    0
3    2
4    1
dtype: int32
>>> uniques
Index(['a', 'b', 'c', None], dtype='object')
>>> codes, uniques = kser.factorize(na_sentinel=-2)
>>> codes
0    1
1   -2
2    0
3    2
4    1
dtype: int32
>>> uniques
Index(['a', 'b', 'c'], dtype='object')