In a nutshell, building a visualisation with the Grammar of Graphics boils down to the following:
A data set is visualised by mapping dimensions of the data to the visual properties of geometrical objects.
In this definition, the dimensions of the data are the columns in a data set. The geometrical objects are points, lines and shapes, and their visual properties (often called aesthetics in the language of the Grammar of Graphics) are things like position (in both the x and y direction), size and colour.
In this example, the country variable is mapped to colour, the year variable is mapped to the x position, and the cases variable is mapped to the y position. Source: Maarten Lambrechts, CC BY SA 4.0
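The mapping described in the caption can be written down as plain data. The following is a minimal sketch in Python, not the API of any particular plotting tool: the records are made-up values, and the names `mapping` and `apply_mapping` are hypothetical and chosen only for illustration.

```python
# Made-up records; only the column names follow the example in the text.
records = [
    {"country": "A", "year": 1999, "cases": 10},
    {"country": "A", "year": 2000, "cases": 30},
    {"country": "B", "year": 1999, "cases": 20},
    {"country": "B", "year": 2000, "cases": 50},
]

# The mapping itself: each aesthetic points at one column of the data.
mapping = {"x": "year", "y": "cases", "colour": "country"}


def apply_mapping(records, mapping):
    """Return one dict per record, keyed by aesthetic instead of column name."""
    return [
        {aesthetic: record[column] for aesthetic, column in mapping.items()}
        for record in records
    ]


mapped = apply_mapping(records, mapping)
print(mapped[0])
```

At this stage the values are still in data units (years, case counts, country names). Turning them into pixel positions and actual colours is a separate step.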
The rules for encoding data as visual properties of the geometrical objects are called scales. For example, the scales for the x and y position place geometries in the x and y direction, and a colour scale assigns colours to geometries.
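To make the idea of a scale concrete, here is a small sketch of two scales as Python functions: a linear scale for continuous values and an ordinal scale for categories. The function names, the pixel ranges and the colour names are assumptions made for this illustration and do not come from a specific library.

```python
def linear_scale(domain, range_):
    """Map a continuous data value from `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        t = (value - d0) / (d1 - d0)  # relative position within the domain
        return r0 + t * (r1 - r0)

    return scale


def ordinal_scale(categories, outputs):
    """Map each category to one output value, cycling if outputs run out."""
    lookup = {c: outputs[i % len(outputs)] for i, c in enumerate(categories)}
    return lambda value: lookup[value]


# Hypothetical 400 x 300 pixel plotting area.
x_scale = linear_scale((1999, 2000), (0, 400))
y_scale = linear_scale((0, 100), (300, 0))  # inverted: pixel y grows downwards
colour_scale = ordinal_scale(["A", "B"], ["steelblue", "orange"])

print(x_scale(2000), y_scale(50), colour_scale("B"))
```

The y scale uses an inverted range because screen coordinates usually start at the top, while charts conventionally place larger values higher up.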
Guides are chart elements that help viewers read values from a visualisation. For position scales, the guides are the axes; for all other scales, guides take the form of a legend (for example, colour and size legends).
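Because a guide is derived from a scale, it can be computed from the same domain and range. The sketch below assumes a linear position scale and a categorical colour scale; the helper names `axis_ticks` and `legend_entries` are hypothetical.

```python
def axis_ticks(domain, range_, count):
    """Evenly spaced (data value, pixel position) pairs for a linear scale."""
    d0, d1 = domain
    r0, r1 = range_
    ticks = []
    for i in range(count):
        t = i / (count - 1)  # 0.0 at the first tick, 1.0 at the last
        ticks.append((d0 + t * (d1 - d0), r0 + t * (r1 - r0)))
    return ticks


def legend_entries(categories, colours):
    """Pair each category label with the colour that represents it."""
    return list(zip(categories, colours))


# Axis guide for a y scale mapping 0..100 onto pixels 300..0.
y_ticks = axis_ticks((0, 100), (300, 0), 3)

# Legend guide for a colour scale with two categories.
legend = legend_entries(["A", "B"], ["steelblue", "orange"])
```

A real tool would also pick "nice" tick values and draw the labels. This sketch only shows that both kinds of guide carry the same information as the scale they belong to.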
As with any other visualisation tool, making a visualisation with the Grammar of Graphics starts with the data. The input data for a visualisation is almost always tabular, with rows representing the records and columns representing the dimensions (sometimes also called measures or fields) of each record. In the language of tidy data (see the Tidy data module), rows are called observations and columns are called variables.
Example of a data table, with each row representing a type of car and each column representing a variable measured on each car. Source: Maarten Lambrechts, CC BY SA 4.0
Most tools have functionality to load data from different sources and with different file formats, and to convert data into the format required for producing visualisations.
Variables can be of different types: integers, continuous numerical values, categorical values and date/time stamps. The type of a variable determines how it can be mapped to the aesthetics of geometrical objects. For example, it is not meaningful to map a categorical variable to the height of a bar, or to use a continuous numerical variable to encode the shape of symbols.
Because of this, you need to make sure the tool you are using recognises the type of each variable in the data and parses its values correctly. If not, errors will be generated, or the visualisation process will lead to unexpected results.
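The following sketch shows why type recognition matters, using only the Python standard library. It assumes the data arrive as text, as they would from a CSV file; the rows are made-up values, and the `reported` column and the `parse_rows` helper are hypothetical.

```python
from datetime import date

# Made-up rows as they might arrive from a CSV file: every value is text.
raw_rows = [
    {"country": "A", "year": "1999", "cases": "9", "reported": "1999-12-31"},
    {"country": "B", "year": "2000", "cases": "10", "reported": "2000-12-31"},
]

# One parser per column, stating the intended type of each variable.
parsers = {
    "country": str,                    # categorical
    "year": int,                       # integer
    "cases": float,                    # continuous numerical
    "reported": date.fromisoformat,    # date stamp
}


def parse_rows(rows, parsers):
    """Convert every value with the parser registered for its column."""
    return [
        {column: parsers[column](value) for column, value in row.items()}
        for row in rows
    ]


rows = parse_rows(raw_rows, parsers)

# Unparsed text compares character by character, so "9" ranks above "10".
largest_as_text = max(row["cases"] for row in raw_rows)
largest_as_number = max(row["cases"] for row in rows)
print(largest_as_text, largest_as_number)
```

Comparing the two maxima shows the kind of unexpected result mentioned above: while the values are still text, the largest number of cases appears to be 9 rather than 10.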
Geometric objects (sometimes also called “geoms” or “marks”) are the elements in a chart that carry the encoded data as visual properties. The geometries determine the resulting type of plot. For example, with points as geometric objects you create a scatter plot, while with a line geometry you produce a line chart.
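The effect of the geometry on the chart type can be sketched by drawing the same positions with two different geometries. The example below writes SVG markup as plain strings; the pixel positions are made up, and the function names `point_geom` and `line_geom` are hypothetical.

```python
# Made-up pixel positions, as they might come out of the position scales.
points = [(0, 300), (200, 150), (400, 0)]


def point_geom(points, radius=3):
    """One SVG circle per data point: the basis of a scatter plot."""
    return "".join(
        f'<circle cx="{x}" cy="{y}" r="{radius}"/>' for x, y in points
    )


def line_geom(points):
    """A single SVG polyline through all data points: a line chart."""
    coords = " ".join(f"{x},{y}" for x, y in points)
    return f'<polyline points="{coords}" fill="none" stroke="black"/>'


scatter_svg = point_geom(points)
line_svg = line_geom(points)
```

Both functions receive exactly the same data and positions. Only the choice of geometry differs, and that choice alone turns the chart into a scatter plot or a line chart.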
Geometric objects can be divided into 3 main categories, based on their dimensionality:
The geometric objects available in Observable Plot. Source: observablehq.com/@observablehq/plot