HOW TO ROTATE MY LOGS WITHOUT LOSING DATA

FAQ-COM120 : HOW TO ROTATE MY LOGS WITHOUT LOSING DATA
PROBLEM:
I want to archive or rotate my logs using my system's own tools (for example logrotate) or third-party software (rotatelogs, cronolog), but I don't want to lose any visit information during the rotate process.
SOLUTION:
If your config file is set up with a LogFile parameter that points to your currently running log file (required if you want to use the AllowToUpdateStatsFromBrowser option to get "real-time" statistics), you must run the AWStats update JUST BEFORE the rotation is done, to avoid losing too many records during the rotate process.
The best way to do that on a Linux-like OS is to use the built-in logrotate feature. Edit the logrotate config file used for your web server log files (usually stored in the /etc/logrotate.d directory) and add the AWStats update process as a prerotate command, as in this example (the section from prerotate to the first endscript is what you add):
/usr/local/apache/logs/*log
{
    notifempty
    daily
    rotate 7
    compress
    sharedscripts
    prerotate
        # Update AWStats from the live log just before it is rotated
        /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -update -config=mydomainconfig
    endscript
    postrotate
        # Make Apache reopen its log files
        /usr/bin/killall -HUP httpd
    endscript
}

If you use such a solution, the following steps happen in sequence. Each step is listed as: step letter, example date/time and step name (where given), then description.
A  04:02:00  Start of logrotate: logrotate is started (by cron).
B  04:02:01  Start of awstats: awstats -update is launched by logrotate.
C  04:02:02  awstats starts to read the log file file.log.
D  04:05:00  awstats has reached the end of the log file, so it now starts to save its database to disk.
E  04:06:00  End of awstats: awstats has finished saving its new database, so it stops.
F  04:06:01  Log move: logrotate moves the old log file file.log to a new name, file.log.sav. Apache now logs to file.log.sav, since the log file handle has not changed (only the file name has been changed).
G  04:06:02  Apache restart: logrotate sends the -HUP or -USR1 signal to Apache.
   With -HUP, Apache immediately kills all its child processes/threads, closes log file file.log.sav, and reopens file file.log. So from now on, ALL hits are written to the new file.
   With -USR1, Apache only asks its child processes/threads to stop once their current HTTP request has been completely served. However, it immediately closes log file file.log.sav and reopens file file.log. So only NEW hits are written to the new log file. HTTP requests that are still running will write to the old one.
H  04:06:03  Start compress: logrotate starts compressing the old log file file.log.sav into file.log.gz.
I  If some Apache threads/processes are still running (because the signal sent was -USR1, so child processes wait for the end of their request before stopping), those threads/processes are still writing to file.log.sav. If kill -HUP was used, all processes have already been restarted, so all writes go to the new file.log.
J  04:07:03  End of compress, End of logrotate: logrotate has finished compressing the log file into file.log.gz. File file.log.sav is deleted.
K  If the signal was -USR1, some old child processes can still be running (when serving a very long request, for example). Their log writes, still done through the same file handle, go to a file that has been removed, so those writes are lost (this happens only if -USR1 was used and the request was very long).
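Steps F and K both follow from the fact that a process writes through an open file handle, not through a file name. The sketch below is only an illustration of that mechanism: file descriptor 7 stands in for Apache's log handle, and the temporary directory and the "hit" lines are made up for the demo.

```shell
#!/bin/sh
# Stand-in for Apache: file descriptor 7 plays the role of the open log handle.
demo_dir=$(mktemp -d)
log="$demo_dir/file.log"

exec 7>>"$log"                 # the "server" opens file.log for appending
echo "hit 1" >&7

mv "$log" "$log.sav"           # step F: the file is renamed, the handle is not
echo "hit 2" >&7               # still written to file.log.sav
content_after_rename=$(cat "$log.sav")

rm "$log.sav"                  # step J: the old file is deleted after compression
echo "hit 3" >&7               # step K: the write succeeds, but the file has no name
exec 7>&-                      # closing the handle discards "hit 3" for good
files_left=$(ls -A "$demo_dir")

rm -rf "$demo_dir"
printf 'after rename:\n%s\nfiles left: [%s]\n' "$content_after_rename" "$files_left"
```

Both "hit 1" and "hit 2" end up in file.log.sav, while "hit 3" is written without error and then disappears, which is the loss described in step K.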
 

The advantage of this solution is that it is a very common way of working, used by many products, and easy to set up. Note, however, that you can "lose" some hits:
If you use the -HUP signal, you only lose the hits that were written during D and E. Note that you also break all requests still running at G. In the example, that is 1 minute lost per daily rotation (for small or medium web sites it will be less than a few seconds), which gives an error below 0.07% (less for small web sites). This is not significant, especially for a "statistics" program.
If you use the -USR1 signal, you do not kill any request. But you lose all hits that were written during D and E (as with -HUP), plus all hits still running after H (very long requests that take several minutes to be served). If a hit ends during I, it is written to a log file that has already been analyzed; if a hit ends at K, it is written nowhere. In the example, that is also a 0.07% error, plus the error for the hits that finished during I or K. The number of such hits should be very low, since only hits that started before G and were still running after H are concerned. In most cases a hit needs only a few milliseconds to be served, so these lost hits can be ignored.
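The 0.07% figure is simply the lost window divided by the rotation period. Here is a minimal sketch of that arithmetic, assuming hits are spread evenly over the day (real traffic rarely is). The function name lost_percent is hypothetical.

```shell
#!/bin/sh
# lost_percent SECONDS_LOST [ROTATION_PERIOD_SECONDS]
# Prints the share of the rotation period that falls in the lost window,
# as a percentage. The period defaults to one day (86400 seconds).
lost_percent() {
    awk -v lost="$1" -v period="${2:-86400}" \
        'BEGIN { printf "%.2f\n", lost / period * 100 }'
}
```

For the example timeline, `lost_percent 60` prints 0.07: one minute out of the 1440 minutes in a day.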

Note also that if you have x logrotate config files, each with a postrotate that does a kill -HUP, you send the signal x times to your server process. So try to include several log files in the same logrotate config file. You can put several awstats update commands in the same prerotate section, and the -HUP is then sent only once, after all updates have finished. However, doing this makes the interval between D and F (during which some hits are lost) longer.
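One way to keep a shared prerotate section short is to call a small wrapper that runs the updates one after another. This is a hedged sketch, not part of AWStats: the function name update_all and the AWSTATS variable are assumptions, and the default path is the one used in the example above.

```shell
#!/bin/sh
# Path to awstats.pl; override AWSTATS if yours is installed elsewhere.
AWSTATS="${AWSTATS:-/usr/local/awstats/wwwroot/cgi-bin/awstats.pl}"

# update_all CONFIG...
# Runs one AWStats update per config name, sequentially.
# Returns non-zero if any update failed, but still tries the others.
update_all() {
    rc=0
    for config in "$@"; do
        "$AWSTATS" -update -config="$config" || rc=1
    done
    return "$rc"
}
```

The prerotate section would then call something like `update_all site1 site2` (hypothetical config names), and the single postrotate sends the signal once, after every update has run.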

Another common way of working is to run the AWStats update process only once the log file has been archived.
This is required, for example, if you use the cronolog or rotatelogs tools to rotate your log files. For example, Apache users can set up their Apache httpd config file to write the log through a pipe to cronolog or rotatelogs using the Apache CustomLog directive:
CustomLog "|/usr/sbin/cronolog [cronolog_options] /var/logs/access.%Y%m%d.log" combined
If you use such a feature, you cannot trigger the AWStats update process to run just BEFORE the rotation, so you must run it AFTER the rotate process, on the archived log file.
To set up awstats to always point to the last archived log file, you can use the 'tags' available for LogFile.
The drawback is that your data are refreshed only after a rotation has been done. However, you miss absolutely nothing (no hits) and your server processes are never killed.
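As an illustration of the LogFile tags mentioned above, the fragment below points AWStats at the previous day's file produced by the cronolog template shown earlier. It assumes a daily update that runs shortly after midnight, so that "24 hours ago" falls on the day whose file has just been closed. The config file name in the comment is also an assumption.

```
# In awstats.mydomainconfig.conf
# %YYYY-24, %MM-24 and %DD-24 expand to the year, month and day as they were 24 hours ago.
LogFile="/var/logs/access.%YYYY-24%MM-24%DD-24.log"
```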

So, if you really want to lose absolutely no hits and also want updates more frequent than the rotate frequency, the best way is a hybrid solution (I am not sure it is worth the pain; remember that statistics are only statistics):
You run the awstats update process from your crontab frequently, every hour for example, timed so that one run happens half an hour before the rotation is done. See the next FAQ to learn how to set up a scheduled job.
Then, once the rotation has been done (by logrotate or by a piped cronolog log file), and before the next scheduled awstats update process starts, you run another update process on the archived log file, using the -logfile option to force the update on the archived log file rather than on the current log file defined in the awstats config file. This picks up the missing half hour up to the log rotation (AWStats will find the new lines). However, don't forget that this particular update MUST be finished before the next scheduled update starts.
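The catch-up update on the archived file could be sketched as follows. This is a minimal example under stated assumptions: the function name catch_up and the AWSTATS variable are hypothetical, the archived log is assumed to be an uncompressed file (as cronolog produces), and the option is spelled -LogFile= as in the AWStats command-line help.

```shell
#!/bin/sh
# Path to awstats.pl; override AWSTATS if yours is installed elsewhere.
AWSTATS="${AWSTATS:-/usr/local/awstats/wwwroot/cgi-bin/awstats.pl}"

# catch_up CONFIG ARCHIVED_LOG
# Runs one update against the archived log instead of the LogFile
# defined in the AWStats config file.
catch_up() {
    config=$1
    archived_log=$2
    if [ ! -r "$archived_log" ]; then
        echo "archived log not readable: $archived_log" >&2
        return 1
    fi
    "$AWSTATS" -update -config="$config" -LogFile="$archived_log"
}
```

With the timeline of the logrotate example, the hourly update could be scheduled at minute 32 (so 03:32 is half an hour before the 04:02 rotation) and the catch-up at about 04:15, which leaves it time to finish before the 04:32 run.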
