Building blocks of the Grammar of Graphics

Overview

In a nutshell, building a visualisation with the Grammar of Graphics boils down to the following:

A data set is visualised by mapping dimensions of the data to the visual properties of geometrical objects.

In this definition, the dimensions of the data are the columns in a data set. The geometrical objects are points, lines and shapes, and their visual properties (often called aesthetics in the language of the Grammar of Graphics) are things like position (in both the x and y direction), size and colour.

In this example, the country variable is mapped to colour, the year variable is mapped to the x position, and the cases variable is mapped to the y position. Source: Maarten Lambrechts, CC BY SA 4.0
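The mapping in the figure can also be written down as a small data structure. The following is a minimal Python sketch and not the API of any particular plotting library: the records are made-up values, and the `apply_mapping` helper is a hypothetical name introduced only for illustration.

```python
# Illustrative records: the column names follow the example above,
# the values are invented for this sketch.
data = [
    {"country": "A", "year": 1999, "cases": 10},
    {"country": "A", "year": 2000, "cases": 25},
    {"country": "B", "year": 1999, "cases": 40},
    {"country": "B", "year": 2000, "cases": 30},
]

# The mapping itself: each aesthetic is paired with one column of the data.
mapping = {"x": "year", "y": "cases", "colour": "country"}


def apply_mapping(rows, mapping):
    """Return, for every row, the data value assigned to each aesthetic."""
    return [{aes: row[col] for aes, col in mapping.items()} for row in rows]


points = apply_mapping(data, mapping)
print(points[0])  # {'x': 1999, 'y': 10, 'colour': 'A'}
```

Note that the result still holds data values, not pixels or colour codes. Turning these values into visual properties is a separate step.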

The rules that encode data into the visual properties of the geometrical objects are called scales. For example, the x and y scales position geometries in the x and y direction, and a colour scale assigns colours to geometries.
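A scale can be thought of as a function from data values to visual values. The sketch below assumes a linear position scale and a lookup-based colour scale. The function names, the pixel ranges and the colour names are all assumptions chosen for illustration, and real tools offer many more scale types.

```python
def linear_scale(domain, range_):
    """Return a function mapping values in `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


def ordinal_scale(categories, outputs):
    """Return a function mapping each category to its paired output value."""
    lookup = dict(zip(categories, outputs))

    def scale(value):
        return lookup[value]  # raises KeyError for an unknown category

    return scale


# Position scales: data values to pixel positions in a 400 x 300 plot area.
x_scale = linear_scale((1999, 2000), (0, 400))
# The y range is reversed because pixel coordinates usually grow downwards.
y_scale = linear_scale((0, 50), (300, 0))
# Colour scale: categories to colour names.
colour_scale = ordinal_scale(["A", "B"], ["steelblue", "orange"])

print(x_scale(2000), y_scale(25), colour_scale("B"))  # 400.0 150.0 orange
```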

Guides are chart elements that help viewers read values from a visualisation. In the case of position scales, the guides are the axes; in all other cases, guides take the form of a legend (for example, colour and size legends).
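Because guides describe scales, their content can be derived from the same information a scale uses. The following Python sketch assumes evenly spaced axis ticks and a legend that simply pairs categories with colours. Both helper names are hypothetical, and real tools typically choose "nice" rounded tick values rather than a fixed count.

```python
def axis_ticks(domain, count):
    """Evenly spaced tick values across the domain, endpoints included."""
    lo, hi = domain
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


def legend_entries(categories, colours):
    """One (label, colour) pair per category, in the order given."""
    return list(zip(categories, colours))


print(axis_ticks((0, 100), 5))
# [0.0, 25.0, 50.0, 75.0, 100.0]
print(legend_entries(["A", "B"], ["steelblue", "orange"]))
# [('A', 'steelblue'), ('B', 'orange')]
```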

Data

As with any other visualisation tool, making a visualisation with the Grammar of Graphics starts with the data. The input data for a visualisation is almost always tabular data, with rows representing the records and columns representing the dimensions (sometimes also called measures or fields) for each record. In the language of tidy data (see the Tidy data module), rows are called observations, and columns are called variables.

Example of a data table, with each row representing a type of car and each column representing a variable measured on each car. Source: Maarten Lambrechts, CC BY SA 4.0

Most tools have functionality to load data from different sources and with different file formats, and to convert data into the format required for producing visualisations.

Variables can be of different types: they can be integers, continuous numerical values, categorical values and date/time stamps. The type of a variable determines how it can be mapped to the aesthetics of geometrical objects. For example, it is not meaningful to map a categorical variable to the height of a bar, or to use a continuous numerical variable to encode the shape of symbols.
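The idea that variable types constrain mappings can be sketched as a lookup table. The table below is a deliberately simplified assumption that encodes only the two rules mentioned above plus colour. The aesthetic names, the type labels and the `check_mapping` helper are hypothetical, and actual tools apply their own, more detailed rules.

```python
# Assumed, simplified table: which variable types each aesthetic accepts.
COMPATIBLE = {
    "bar_height": {"integer", "continuous"},
    "shape": {"categorical"},
    "colour": {"integer", "continuous", "categorical"},
}


def check_mapping(aesthetic, variable_type):
    """Return True if a variable of this type can be mapped to the aesthetic."""
    return variable_type in COMPATIBLE[aesthetic]


print(check_mapping("bar_height", "categorical"))  # False
print(check_mapping("shape", "continuous"))        # False
print(check_mapping("colour", "categorical"))      # True
```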

Because of this, you need to make sure the tool you are using recognises the type of each variable in the data and parses its values correctly. If not, errors will be generated, or the visualisation process will lead to unexpected results.
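As an illustration of what can go wrong, the sketch below parses one record whose values all arrive as text, as they would from a CSV file. The column names and values are assumptions made up for this example (the `registered` column in particular is hypothetical). The last lines show one kind of unexpected result: numbers stored as text sort alphabetically instead of numerically.

```python
from datetime import date

# One raw record as it might be read from a CSV file: every value is a string.
raw = {"model": "Mazda RX4", "cyl": "6", "mpg": "21.0", "registered": "2020-03-15"}

# Assumed target type for each column.
parsers = {
    "model": str,                      # categorical
    "cyl": int,                        # integer
    "mpg": float,                      # continuous numerical
    "registered": date.fromisoformat,  # date
}

parsed = {column: parsers[column](value) for column, value in raw.items()}
print(parsed["cyl"] + 2, parsed["registered"].year)  # 8 2020

# Unparsed numbers are compared as text, which gives a misleading order.
print(sorted(["9", "10", "8"]))                   # ['10', '8', '9']
print(sorted(int(v) for v in ["9", "10", "8"]))   # [8, 9, 10]
```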

Geometric objects

Geometric objects (sometimes also called “geoms”, or “marks”) are the elements in a chart that carry the encoded data as visual properties. The geometries determine the resulting type of plot. For example, with points as geometric objects you can create a scatter plot, while with a line geometry you produce a line chart.

Geometric objects can be divided into 3 main categories, based on their dimensionality:

point geometries have zero dimensions, and are tied to a single location in a plot. Text geometries, used for placing text on a plot, are also considered point geometries, because they are also tied to a single location
path and line geometries are 1-dimensional. They can be used to connect points belonging to the same group, and they can create straight lines as well as curves
rectangles, other polygons and other shapes are 2-dimensional geoms
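The difference between zero-dimensional and one-dimensional geometries can be sketched in a few lines of Python. The example assumes made-up records that have already been mapped to `x`, `y` and `group`, and a hypothetical `build_marks` helper that supports only point and line geometries. With points, every row becomes its own mark (a scatter plot); with lines, the rows of each group are connected into a single mark (a line chart).

```python
# Illustrative records with invented values, already mapped to aesthetics.
points = [
    {"x": 1999, "y": 10, "group": "A"},
    {"x": 2000, "y": 25, "group": "A"},
    {"x": 1999, "y": 40, "group": "B"},
    {"x": 2000, "y": 30, "group": "B"},
]


def build_marks(rows, geom):
    """Turn mapped rows into marks for a 'point' or 'line' geometry."""
    if geom == "point":
        # Zero-dimensional: one mark per row, tied to a single location.
        return [{"geom": "point", "at": (r["x"], r["y"])} for r in rows]
    if geom == "line":
        # One-dimensional: one mark per group, connecting its points in x order.
        groups = {}
        for r in rows:
            groups.setdefault(r["group"], []).append((r["x"], r["y"]))
        return [
            {"geom": "line", "group": g, "through": sorted(coords)}
            for g, coords in groups.items()
        ]
    raise ValueError(f"unsupported geometry in this sketch: {geom}")


print(len(build_marks(points, "point")))  # 4
print(len(build_marks(points, "line")))   # 2
```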

The geometric objects available in Observable Plot. Source: observablehq.com/@observablehq/plot