Easy ways to visualise data when using Polars

Robin van den Brink
4 min read · Mar 11, 2023

SEPTEMBER 2023 UPDATE: A lot has changed since I wrote the original article on data visualisation when using Polars six months ago. So I have updated all the code snippets, visualisations and notebook.

One criticism I frequently read is that Polars does not have the same level of integration with other tools as Pandas, which can make it less useful. This is partly correct. When you have been the industry norm for some time, as Pandas has, there are some major advantages. The biggest benefit is likely having other projects that are built on top of your solution. A frequently mentioned example is the capacity to visualise data. Should that prevent you from using Polars as a data analyst or scientist? No, because there are numerous ways to display your data.

A repository with a notebook including all the examples can be found here: https://github.com/r-brink/polars-visualisation-tutorial

Data visualisation packages

There are many solutions. My initial focus is on three of them. I have probably missed some, but by the end you will see how to use those with Polars as well. Our target packages:

  • Seaborn — ✅ works with Polars Dataframes
  • Matplotlib — ✅ works with Polars Dataframes
  • Altair — ✅ works with Polars Dataframes

Generating our dataset and setting up our environment

I’ll pick the TPCH dataset. The file lineitem.parquet has 60 million rows and is 2GB. In one of my past articles, I explained how you can create the file yourself. A smaller version of this dataset exists on Kaggle as well, though for unknown reasons, it may not be identical to the edition I’m using here.

This example is mainly about comparing tools, so I’ll use a straightforward scenario to show how they work. TPCH is not a very interesting dataset to visualise because it contains almost no outliers, as you can see.

Below we create two dataframes to visualise.

Seaborn

Seaborn is a commonly used Python tool for visualising data. It is built on Matplotlib and provides a user-friendly interface for creating attractive plots with minimal effort. Moreover, Seaborn and Polars work seamlessly together: Seaborn is not restricted to Pandas Dataframes, meaning that you can use your Polars Dataframes directly. Let us examine this.

The visualisation process is simple. You pass the x and y columns from the Polars Dataframe to sns.lineplot().

Here, we create a simple histogram from our data set:

Matplotlib

Matplotlib describes itself as a comprehensive library for creating static, animated, and interactive visualisations in Python that makes easy things easy and hard things possible.

Matplotlib, like Seaborn, can operate with Polars Dataframes out of the box.

Altair

In a recent Altair update (v5+), it’s now possible to work with Polars Dataframes out of the box as well.

Below is a code example that plots both the maximum and median price in a single visualisation.

Other visualisation packages

In a recent article (2023-02-17), Marc Garcia, a Pandas core developer, suggested the following way of working.

From Marc Garcia’s article on Pandas 2.0 and the Arrow revolution

Marc suggests using Polars to do the heavy lifting and Pandas as a kind of compatibility layer. That works perfectly with Pandas 1.x and becomes even faster now that Pandas 2.0 is available. This means that any visualisation package that you use with Pandas can also be used with Polars. We will look at an example in the next section.

Creating a Pandas Dataframe from a Polars Dataframe is fast. From there we can build upon the years of industry leadership of Pandas and its lively ecosystem. Another option is to use the Pandas plotting function.

More complex example from the TPCH benchmark

This example uses query 4 of the TPCH benchmark.

Note: if you are coding along, pay attention to the dataset we use in the examples below. We use the original parquet file instead of the smaller Dataframe that we created.

Utilising Pandas

Plotting is as simple as chaining .to_pandas().plot() behind collect(), as we can see below. This closely resembles the example of Marc Garcia.

Matplotlib

Altair

Conclusion

This short post shows that the lack of dedicated data visualisation libraries should not stop you from using Polars. The commonly used libraries, Seaborn, Matplotlib and Altair, can all be used with Polars Dataframes without any additional setup. Any other library becomes available by converting the results to a Pandas Dataframe with a single function call.

Polars and Pandas can become a very powerful combination of tools. Pandas can help your IO efforts for less popular formats and Polars can do all the heavy lifting, as one of the core developers of Pandas also suggested recently. You can get the best of both worlds.
