Problem :
I have a log file (auth.log) from which the non-relevant lines have been removed.
I want to aggregate lines per hour or day in the plot, so that all lines falling within the same hour or day are combined into a single tic.
I have been looking into functions, but I keep getting stuck.
This is what I have so far, but it will only work if I have a “variable” for each line in the log file.
#!/usr/bin/env gnuplot
set terminal png size 1200,800
set output "graph.png"
set title "Breakin Attempts"
set key top right box
set style data lines
set border 3
set grid
set pointsize 3
set xlabel "Time"
set xtics nomirror
set xdata time
set timefmt "%b %d %H:%M:%S"
set format x "%m/%d"
set ylabel "Number of breakin attempts"
set ytics nomirror
plot "pc1.log" using 1:4 title "PC1" linecolor rgb "red", \
     "pc2.log" using 1:4 title "PC2" linecolor rgb "blue", \
     "pc3.log" using 1:4 title "PC3" linecolor rgb "green"
Here is an example of the data:
Sep 18 11:26:30 root 60.191.36.196
Sep 18 11:26:34 root 60.191.36.196
Sep 18 11:26:37 root 60.191.36.196
Sep 18 19:21:31 root 198.56.193.74
Sep 18 19:21:33 root 198.56.193.74
In this case the two entries at 19:21:xx will be one tic of 2 and the three at 11:26:xx will be a tic of 3.
Solution :
I assume you want the number of entries per time unit (minutes in your example). I do not know whether gnuplot can count lines in this manner, so I would use awk (or any language convenient for you) to aggregate the data instead. Something like this would do:
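# awk: strip the seconds, then print one "date count" line per run of identical minutes
script = '{time = $3; sub(/:[0-9][0-9]$/, "", time); date = sprintf("%s %s %s", $1, $2, time)} NR > 1 && date != last {print last, count; count = 0} {count++; last = date} END {if (NR > 0) print last, count}'
pipe(file) = sprintf("< awk '%s' %s", script, file)
set timefmt "%b %d %H:%M"   # the piped data no longer contains seconds
plot pipe("pc1.log") using 1:4 title "PC1"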
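The question asks for hours or days rather than minutes. A minimal sketch of the same awk idea with a selectable bucket size follows; the function name count_per_unit and its hour/day argument are made up for illustration, and the input is assumed to be sorted and to have the "Mon DD HH:MM:SS ..." layout of the sample data.

```shell
# Hypothetical helper (not part of gnuplot or awk): count log lines per hour or per day.
# Usage: count_per_unit hour|day FILE
# Assumes FILE is sorted and each line starts with "Mon DD HH:MM:SS".
count_per_unit() {
    awk -v unit="$1" '
        {
            # $1 = month, $2 = day of month, $3 = HH:MM:SS
            key = $1 " " $2
            if (unit == "hour") key = key " " substr($3, 1, 2) ":00"
        }
        # a new bucket starts: emit the finished one
        NR > 1 && key != last { print last, count; count = 0 }
        { count++; last = key }
        # emit the final bucket
        END { if (NR > 0) print last, count }
    ' "$2"
}
```

With the hourly output the time still occupies three columns, so on the gnuplot side something like set timefmt "%b %d %H:%M" with using 1:4 should fit; for the daily output it would be set timefmt "%b %d" with using 1:3.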
Your question is not very explicit. Like Hannes, I assume you want to plot the number of lines corresponding to a certain date.
Gnuplot is not well suited for this; pre-processing the file is recommended.
However, with gnuplot 4.4 or later you can program counters (as global variables), so you could have something like this:
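currentx=1    # must hold the first x value of the file (1 for the test file below)
currentn=0    # number of entries counted so far for currentx
# same x as before: count it and return an undefined value, so no point is drawn
increaseandreturn(returnvalue)=(currentn=currentn+1,returnvalue)
# new x: return the finished count and restart the counter at 1 for the current entry
startnewxandreturn(x,returnvalue)=(currentx=x,currentn=1,returnvalue)
count(x)=((x==currentx)?increaseandreturn(1/0):startnewxandreturn(x,currentn))
# the finished count is drawn at $1-1, which assumes x grows in steps of 1;
# the last group is never followed by a new x, so it is not drawn
plot "file.gdat" using ($1-1):(count($1)) with points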
It works only for sorted files (it adds up consecutive entries, not nonconsecutive ones), and currentx has to contain the first value (or you need to insert more tests). For dates you will need to adapt the script a little.
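If the file is not sorted, pre-processing avoids that limitation. The following is only a sketch of one way to do it with an awk associative array; the function name count_per_hour_any_order is hypothetical, and the "Mon DD HH:MM:SS ..." line layout is assumed from the sample data in the question.

```shell
# Hypothetical helper: per-hour counts that also work when lines
# belonging to the same hour are not adjacent in the file.
# Usage: count_per_hour_any_order FILE
count_per_hour_any_order() {
    awk '
        {
            # bucket key: month, day of month and the hour of HH:MM:SS
            key = $1 " " $2 " " substr($3, 1, 2) ":00"
            # remember the order in which buckets first appear
            if (!(key in count)) order[++n] = key
            count[key]++
        }
        # print the buckets in order of first appearance
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    ' "$1"
}
```

Buckets come out in order of first appearance, not in chronological order, so a line plot may still need the data sorted afterwards; for points or impulses the order should not matter.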
You can test it, e.g., with a file generated by gnuplot like this:
set table "file.gdat"
set parametric
plot [0:20] floor(exp(t/10)),t
unset table