## 2D Scatter Plot

A 2D scatter plot is one of the simplest and most useful plotting tools in Exploratory Data Analysis. It takes two variables from our dataset, one per axis, and draws each data point on a chart. The position of a point is determined by its pair of values: one value gives its position along the horizontal axis and the other along the vertical axis. It is called a scatter plot because the points of the dataset are scattered across the chart.

## A 2D Scatter Plot Primarily Consists of Three Components:

• X axis
• Y axis
• Scale of the plot

Beyond these, we can color the markers to distinguish classes, or vary the marker size to encode an additional variable and reveal more structure in our data.

All three basic components are needed to read a plot properly. After looking at the X and Y axes, check the scale as well, because the origin of the plot might not be (0,0). If different colours are used for classification, add a legend so that it is clear which colour represents which label.
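The components described above can be sketched with matplotlib. This is a minimal illustration, not a definitive recipe: the two classes, their values, the marker sizes, and the axis limits are all made up for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical data: two made-up classes, for illustration only.
class_a = {"x": [1.0, 1.5, 2.0, 2.5], "y": [2.0, 2.4, 3.1, 3.3], "size": [20, 40, 60, 80]}
class_b = {"x": [3.5, 4.0, 4.5, 5.0], "y": [5.0, 5.8, 6.1, 7.0], "size": [30, 50, 70, 90]}

fig, ax = plt.subplots()

# One scatter call per class, so each class gets its own color and legend entry.
# The `s` argument sets the marker sizes.
ax.scatter(class_a["x"], class_a["y"], s=class_a["size"], color="tab:blue", label="Class A")
ax.scatter(class_b["x"], class_b["y"], s=class_b["size"], color="tab:orange", label="Class B")

# Component 1 and 2: labeled X and Y axes.
ax.set_xlabel("X variable")
ax.set_ylabel("Y variable")

# Component 3: an explicit scale. Here both axes start at 0, which is a choice,
# not a default; matplotlib would otherwise fit the limits to the data.
ax.set_xlim(0, 6)
ax.set_ylim(0, 8)

# Legend mapping each color to its class label.
ax.legend(title="Class")

# Call plt.show() or fig.savefig("scatter.png") to view the result.
```

Setting the limits explicitly makes the scale obvious to the reader; leaving them out lets matplotlib pick limits that fit the data, in which case the origin is usually not (0,0).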

Scatter plots are used to understand the relationship between two variables. For example, we can check whether an increase or decrease in the value of one variable is accompanied by an increase or decrease in the other.

Take the Maths results of a particular class as an example. If we plot the number of hours each student has studied (average or cumulative) on the Y-axis and the marks they received on the X-axis, the plot helps us see the trend as scores move from the minimum to the maximum. Correlation does not imply causation, but combined with good domain knowledge it can point us in the right direction.
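As a rough sketch of the Maths example, the snippet below plots marks on the X-axis against hours studied on the Y-axis and computes the Pearson correlation coefficient with NumPy. The marks and hours are invented numbers for eight hypothetical students, not real results.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data for eight students (made up for illustration).
marks = np.array([35, 42, 50, 58, 63, 71, 80, 88])          # marks in Maths
hours = np.array([2.0, 3.0, 3.5, 5.0, 5.5, 6.0, 8.0, 9.0])  # hours studied

# Pearson correlation between the two variables; a value near +1 means
# the points lie close to an upward-sloping straight line.
r = np.corrcoef(marks, hours)[0, 1]

fig, ax = plt.subplots()
ax.scatter(marks, hours)
ax.set_xlabel("Marks in Maths")
ax.set_ylabel("Hours studied")
ax.set_title(f"Pearson r = {r:.2f}")

# Call plt.show() or fig.savefig("marks_vs_hours.png") to view the result.
```

A high coefficient here only says that the two variables move together in this sample; deciding whether studying more causes higher marks still needs domain knowledge.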

A simple 2D scatter plot shows the range and the concentration of the points for the two variables under consideration. If we also have a class label and want to see how it relates to these two variables, we can color-code the points by label to check whether the class labels or dependent variable follow a particular pattern. In such a case, the plot should include a legend showing which color represents which label. A widely used example is the iris dataset: when the three species of flower are color-coded, the ‘Setosa’ points form a cluster that is clearly separated from the other two species.
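The iris example can be sketched as follows. This assumes scikit-learn is installed and uses its bundled copy of the dataset via `load_iris`; the choice of petal length and petal width as the two axes is an assumption made for this illustration, since the text does not say which pair of features to plot.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()

# Assumed feature pair: columns 2 and 3 are petal length and petal width (cm).
petal_length = iris.data[:, 2]
petal_width = iris.data[:, 3]

fig, ax = plt.subplots()

# One scatter call per species, so each species gets its own color
# (from matplotlib's default color cycle) and its own legend entry.
for class_index, species in enumerate(iris.target_names):
    mask = iris.target == class_index
    ax.scatter(petal_length[mask], petal_width[mask], label=species)

ax.set_xlabel("Petal length (cm)")
ax.set_ylabel("Petal width (cm)")
ax.legend(title="Species")

# Call plt.show() or fig.savefig("iris_scatter.png") to view the result.
```

With this feature pair the setosa points sit in their own cluster in the lower-left corner, while versicolor and virginica lie closer to each other.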