The errorbot now converts timestamps

I've pretty much completely rewritten my errorbot, a Bash script we use at the hosting company I work for. It looks for recent entries in every error log file under a user's account. So, if the user example reports an error on their website, a quick errorbot example should find the issue.

Out with the old: grepping recent entries

Error logs can be huge, and you're normally only interested in errors that occurred in the last few minutes. Grepping recent log entries isn't straightforward, though. Initially, my solution was to store the current date, the current hour and the previous hour in variables. You can then grep entries that occurred since the start of the previous hour:

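# Old approach: match entries from the previous and the current hour
from_date=$(date +"%d-%b-%Y")
from_hour_1=$(date +"%H")
# Zero-pad the previous hour so that 09 isn't turned into 9
# (this still breaks at midnight, where it yields -1)
from_hour_2=$(printf "%02d" $((10#$from_hour_1 - 1)))

# $f is the path of the log file being searched
grep -E "$from_date ($from_hour_2|$from_hour_1)" "$f"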

An issue with this approach is that timestamps in log files may use a different timezone than the server itself. For instance, many logs use UTC timestamps. We're currently in British Summer Time, which is an hour ahead of UTC. The script would still retrieve the most recent entries, as it looks for entries since the start of the previous hour. Still, it's not a proper way of finding recent log entries.
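If you stick with grepping, one partial fix is to build the pattern in the log's timezone rather than the server's. The sketch below assumes the log writes UTC timestamps and uses date -u. The function names utc_prefix and local_prefix are made up for illustration. The sketch also shows that a Unix timestamp is the same whatever TZ is set to, which is what makes converting timestamps attractive.

```shell
# Sketch: build the old "date + hour" grep prefix in UTC instead of local
# time, assuming the log being searched writes its timestamps in UTC.
# LC_ALL=C keeps %b as an English month abbreviation (Jun), matching
# entries such as "[20-Jun-2020 12:00:00 UTC]".
utc_prefix() {
    LC_ALL=C date -u +"%d-%b-%Y %H"
}

local_prefix() {
    LC_ALL=C date +"%d-%b-%Y %H"
}

# During British Summer Time these two differ by one hour
echo "local: $(local_prefix)"
echo "UTC:   $(utc_prefix)"

# Unix time, by contrast, does not depend on the TZ setting
echo "epoch (UTC):           $(TZ=UTC date +%s)"
echo "epoch (Europe/London): $(TZ=Europe/London date +%s)"
```

This only helps if you know which timezone each log uses, and that is exactly what you can't rely on.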

In with the new: converting timestamps with awk

The only solution I could think of was converting the timestamps to Unix timestamps. That also opened the door for the new --since option. Once you have Unix timestamps, you can simply subtract a given number of seconds from the current time. That's very useful, as you can view a page that's triggering an error and then run errorbot --since=1 example to get only the log entries from the last minute (60 seconds).
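Here is a minimal sketch of how a --since option could turn minutes into a cutoff. It is not errorbot's actual code: since_cutoff, parse_args and the default of 60 minutes are hypothetical. It relies on date +%s, which GNU, BSD and BusyBox date all support.

```shell
# Hypothetical sketch of a --since option (not errorbot's actual code).
# The value is taken as minutes, so --since=1 means "the last 60 seconds".
since_cutoff() {
    case $1 in
        ''|*[!0-9]*|0[0-9]*)
            echo "since_cutoff: expected a whole number of minutes, got '$1'" >&2
            return 1 ;;
    esac
    # date +%s prints Unix time (seconds since 1970-01-01 UTC)
    echo $(( $(date +%s) - $1 * 60 ))
}

parse_args() {
    since=60   # assumed default: the last hour
    user=
    for arg in "$@"; do
        case $arg in
            --since=*) since=${arg#--since=} ;;
            *) user=$arg ;;
        esac
    done
}

# Equivalent of: errorbot --since=1 example
parse_args --since=1 example
cutoff=$(since_cutoff "$since") || exit 1
echo "user=$user since=${since}min cutoff=$cutoff"
```

Any log entry whose converted timestamp is greater than or equal to $cutoff counts as recent.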

Converting timestamps turned out to be much more complicated than I had imagined. I settled on using awk for the job and, with the help of insanely smart people on Stack Exchange, found a one-liner that works. In the process I ran into all sorts of issues. The main one was that log entries may span multiple lines (for instance, when a stack trace is included). I also found several examples of malformed log entries. The script deals with most of those issues, but it's still a work in progress; I'm pretty sure I'll run into various edge cases.
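The following is a sketch of the general technique, not the Stack Exchange one-liner errorbot uses. It assumes PHP-style entries such as [20-Jun-2020 12:00:00 UTC] with UTC timestamps. It converts dates using a standard days-from-civil calculation in plain awk, so it doesn't depend on gawk's mktime(). Lines that don't start with a timestamp, such as stack traces, inherit the keep/drop decision of the entry above them. The function name recent_entries is hypothetical.

```shell
# Sketch: print entries at or after a cutoff (Unix time) from a PHP-style log.
# Assumes "[20-Jun-2020 12:00:00 UTC]" prefixes and UTC timestamps; other
# timezones are NOT converted. Continuation lines (stack traces) follow the
# entry they belong to; lines before the first timestamp are dropped.
recent_entries() {
    # $1: cutoff as Unix time, $2: log file
    LC_ALL=C awk -v cutoff="$1" '
    # Days since 1970-01-01 for a Gregorian date (years 1 and later)
    function days_from_civil(y, m, d,    era, yoe, doy, doe) {
        if (m <= 2) y--
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    /^\[[0-9][0-9]-[A-Z][a-z][a-z]-[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
        mi = index("JanFebMarAprMayJunJulAugSepOctNovDec", substr($0, 5, 3))
        if (mi > 0) {   # an unknown month name is treated like a continuation line
            day = days_from_civil(substr($0, 9, 4) + 0, (mi + 2) / 3, substr($0, 2, 2) + 0)
            secs = substr($0, 14, 2) * 3600 + substr($0, 17, 2) * 60 + substr($0, 20, 2)
            keep = (day * 86400 + secs >= cutoff)
        }
    }
    keep   # print the line while the current entry is recent enough
    ' "$2"
}
```

For example, recent_entries "$(( $(date +%s) - 60 ))" error_log prints the last minute of a file called error_log. Because keep only changes on timestamped lines, a multi-line stack trace is printed or skipped as a whole.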

Out with the global error log as well (for now)

The same logic didn't work for the global Apache error log. Previous versions of the script also grepped the global error log for entries related to the user. The problem I encountered here is that the timestamps are formatted differently on different servers. On most (but not all) LiteSpeed servers the format is sensible and easy to convert: 2020-06-20 12:00:00.000000. However, the default Apache timestamp is completely weird: Sat Jun 20 12:00:00 2020.

The solution is to check which format is used before doing the conversion. I haven't got round to that yet. At the same time, I'm hoping to solve another issue with the global error log: grepping it for a username can return junk data. Just imagine grepping for the username 2020 or error.
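Format detection could look roughly like the sketch below. It is a sketch only: the function name to_epoch is hypothetical, and so are its assumptions. It assumes timestamps are in UTC, that an optional leading [ may precede them, and that fractional seconds can be ignored. The sketch tries the LiteSpeed layout first, then the Apache layout, and prints "unknown" for anything else.

```shell
# Sketch: detect which of two timestamp layouts a line starts with and
# print it as Unix time (or "unknown"). Assumptions, not from errorbot:
# UTC timestamps, optional leading "[", fractional seconds ignored.
#   LiteSpeed style: 2020-06-20 12:00:00.000000
#   Apache style:    Sat Jun 20 12:00:00 2020
to_epoch() {
    LC_ALL=C awk '
    # Days since 1970-01-01 for a Gregorian date (years 1 and later)
    function days_from_civil(y, m, d,    era, yoe, doy, doe) {
        if (m <= 2) y--
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    BEGIN { MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec" }
    {
        line = $0
        sub(/^\[/, "", line)                 # tolerate a leading bracket
        n = split(line, f, " ")
        if (f[1] ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ &&
            f[2] ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            split(f[1], dt, "-")             # LiteSpeed style
            y = dt[1]; m = dt[2]; d = dt[3]; clock = f[2]
        } else if (n >= 5 && f[1] ~ /^[A-Z][a-z][a-z]$/ &&
                   f[2] ~ /^[A-Z][a-z][a-z]$/ && index(MONTHS, f[2]) > 0 &&
                   f[4] ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            m = (index(MONTHS, f[2]) + 2) / 3    # Apache style
            y = f[5]; d = f[3]; clock = f[4]
        } else {
            print "unknown"
            next
        }
        split(clock, t, ":")
        # %.0f avoids exponent notation for large values in some awks
        printf "%.0f\n", days_from_civil(y + 0, m + 0, d + 0) * 86400 + t[1] * 3600 + t[2] * 60 + int(t[3])
    }'
}
```

Both example timestamps from this post map to the same Unix time. A line that matches neither layout comes out as "unknown" instead of being silently misread.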

For now, I've stripped the option that greps the global error log. I need to properly learn awk and am working my way through the GNU Awk User's Guide. The next errorbot branch is likely to be as experimental as the last one…