gnuplot - My New Favorite Tool
I recently had the need to quickly visualize some data and none of the systems I usually work with had the data. Initially I dumped the data in Google Sheets and created a chart there, but that was slow and didn’t really scale well. The data had to be cleaned, brought into the right format, columns had to be selected and charts created. At this point I faintly recalled reading about gnuplot which, despite its name, has no affiliation with the GNU project.
After some hacking I got some graphs working, but it took me a while to get what I wanted. The syntax was not initially intuitive, partially because I decided to go straight to the tool without reading any docs first. I liked what I had at my hands, but I wanted to learn more so I picked up Gnuplot in Action, Second Edition and started reading. The book is fantastic so far and I’m condensing some notes here about this magnificent tool, so I can quickly go back and find them.
The first thing I got realized is that gnuplot is actually meant to be used interactively. I’ve only used it by calling it from the command line and feeding data straight into it, which is good if you’re scripting something, but it made getting the graphs a really cumbersome trial and error. Using it interactively is pretty great as every plot you issue is automatically rendered on the current terminal.
To get you started, install gnuplot (you know how to do that) and start it. You’ll get an interactive terminal:
gnuplot> plot sin(x)
Depending on you version it might help to specify a range:
gnuplot> plot [-6:6] sin(x)
You’ll get a little window with your plotted function:
Plotting Data From Files
Plotting functions is fun and the nerd in me enjoys it, but there’s only so much use in plotting random polynomials. The real power of gnuplot is by reading data from files. Reading input from files is trivially easy:
gnuplot> plot "file-white-space-separated-content" using 1:2
This will read the file
file-white-space-separated-content and plot the first column on the x-axis and the second
column on the y-axis. Note that the filename has to be in quotes.
Anatomy of the Plot Command
This is my understanding of the plot command using semi-regular expressions:
plot [[X-RANGE]:[Y-RANGE]] "<DATAFILE>" using <X-DATA>:<Y-DATA> title <TITLE> [with <DRAW-METHOD>] [, using <X-DATA>:<Y-DATA> [with <DRAW-METHOD>]]*
Let me break down the pieces:
- X-RANGE and Y-RANGE
- Optional, specifies the x and y range to be plotted. If not specified gnuplot will pick based on the data. You can use
this to filter by specifying only one part of the range, for example
plot [1000:]will only plot y values larger than
- The source from which gnuplot reads data. Can be a filename or a
-(dash) for stdin or an empty string
""to use the previously used file, very useful when plotting more than one series.
- Label for this series. gnuplot will always label a series, but it might not pick the best title.
- Specify the columns that we want gnuplot to render, with
1being the first column and
0and implicit, increasing column. If an argument is enclosed in parentheses it’s evaluated as an expression with
$1being the first column, and so on.
- Tells gnuplot how you would like it to visualize your series. Among the available options are
dots, and many more. See
help plot withfor a full list of available styles.
Useful gnuplot Snippets
Here’s a bunch of useful snippets I regularly use to quickly crunch some numbers and visualize them:
Reading CSV Files
gnuplot> set datafile separator ","
gnuplot> set datafile separator comma
Reading Time Series Data
The data I was dealing with as pulled from a running system and had memory consumption stats and a Unix timestamp. Reading time series data with gnuplot is straightforward: you set the data format for the x-axis and the time format.
gnuplot> set xdata time gnuplot> set timefmt "%s" # %s is for Unix timestamps, see all formats with help set timefmt gnuplot> plot "job1.csv" using 1:4 with lines
Human Readable Sizes
I deal with byte counts and I find the scientific notation not as intuitive. I prefer the traditionally used prefixes (from this helpful StackOverflow answer):
gnuplot> set format y "%.0s%cB"
gnuplot> set terminal pngcairo size <WIDTH>,<HEIGHT> # or just png gnuplot> set output "filename.png" gnuplot> plot ...
pngcairo terminal supports more options, see
help set terminal pngcairo for more details.
Here’s a graph that I generated from some JVM memory statistics from an app that I work with. I know, one could use visualvm, but in a distributed system I find it easier to just scrape the stats from JSON and plot them because it is so fast and gives me a really quick overview of what’s happening:
Here are the necessary gnuplot commands:
set xdata times set timeformat "%s" set terminal pngcairo size 967,500 set output "mem-graph.png" set format y "%.0s%cB" plot "job1.csv" using 1:4 title "used" with lines, "" using 1:5 title "committed" with lines, "" using 1:3 title "max" with lines