pyspark.sql.plot.core.PySparkPlotAccessor.scatter
- PySparkPlotAccessor.scatter(x, y, **kwargs)
Create a scatter plot with varying marker point size and color.
The coordinates of each point are defined by two dataframe columns, and each point is drawn as a filled circle. This kind of plot is useful for revealing complex correlations between two variables. The points can be, for instance, natural 2D coordinates such as longitude and latitude on a map or, in general, any pair of metrics that can be plotted against each other.
- Parameters
- x : str
Name of column to use as horizontal coordinates for each point.
- y : str or list of str
Name of column to use as vertical coordinates for each point.
- **kwargs : optional
Additional keyword arguments.
- Returns
plotly.graph_objs.Figure
Examples
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
>>> columns = ['length', 'width', 'species']
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.scatter(x='length', y='width')
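Because the method returns a plotly.graph_objs.Figure, the result can be adjusted further before it is displayed. The following is a minimal sketch, not part of the PySpark API: `styled_scatter` is a hypothetical helper name, and it assumes that extra keyword arguments such as `color` are forwarded unchanged to the Plotly backend, which this page does not confirm.

```python
def styled_scatter(df, x, y, title, **kwargs):
    """Draw a scatter plot from a Spark DataFrame and set its title.

    Hypothetical helper for illustration only. Assumption: extra keyword
    arguments (for example color="species") are passed through to the
    Plotly backend by DataFrame.plot.scatter.
    """
    # Delegate to the documented accessor; x and y are column names.
    fig = df.plot.scatter(x=x, y=y, **kwargs)
    # The documented return type is plotly.graph_objs.Figure, so standard
    # Figure methods such as update_layout can be applied to the result.
    fig.update_layout(title=title)
    return fig
```

With the `df` from the example above, a call such as `styled_scatter(df, 'length', 'width', 'Length vs. width', color='species')` would, under the stated assumption, color the points by the `species` column.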