pyspark.sql.functions.width_bucket#

pyspark.sql.functions.width_bucket(v, min, max, numBucket)[source]#

Returns the bucket number into which the value of this expression would fall in an equi-width histogram with numBucket buckets spanning min to max. If the input arguments do not satisfy the required conditions (for example, numBucket must be positive and min must differ from max), the function returns NULL.

New in version 3.5.0.

Parameters
v : Column or column name

value to compute a bucket number in the histogram

min : Column or column name

minimum value of the histogram

max : Column or column name

maximum value of the histogram

numBucket : Column, column name or int

the number of buckets

Returns
Column

the bucket number into which the value would fall after being evaluated
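The bucket computation can be sketched in plain Python. This is a simplified model of the SQL width_bucket semantics (not Spark's implementation), covering ascending and descending histograms and returning None where Spark would return NULL:

```python
import math

def width_bucket(v, min_val, max_val, num_bucket):
    """Simplified model of SQL width_bucket; None stands in for NULL."""
    if num_bucket <= 0 or min_val == max_val:
        return None
    if min_val < max_val:                 # ascending histogram
        if v < min_val:
            return 0                      # below the range
        if v >= max_val:
            return num_bucket + 1        # at or above the range
        width = (max_val - min_val) / num_bucket
        return int(math.floor((v - min_val) / width)) + 1
    else:                                 # descending histogram (min > max)
        if v > min_val:
            return 0
        if v <= max_val:
            return num_bucket + 1
        width = (min_val - max_val) / num_bucket
        return int(math.floor((min_val - v) / width)) + 1
```

Applied to the rows in the example below, this sketch reproduces the same bucket numbers: 5.3 lands in bucket 3 of five 2.08-wide buckets, -2.1 falls below the range (0), 8.1 falls above it (n + 1 = 5), and the last row uses a descending histogram.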

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([
...     (5.3, 0.2, 10.6, 5),
...     (-2.1, 1.3, 3.4, 3),
...     (8.1, 0.0, 5.7, 4),
...     (-0.9, 5.2, 0.5, 2)],
...     ['v', 'min', 'max', 'n'])
>>> df.select("*", sf.width_bucket('v', 'min', 'max', 'n')).show()
+----+---+----+---+----------------------------+
|   v|min| max|  n|width_bucket(v, min, max, n)|
+----+---+----+---+----------------------------+
| 5.3|0.2|10.6|  5|                           3|
|-2.1|1.3| 3.4|  3|                           0|
| 8.1|0.0| 5.7|  4|                           5|
|-0.9|5.2| 0.5|  2|                           3|
+----+---+----+---+----------------------------+