Webalizer Site Analysis and Log Tool
The Webalizer is a fast, free web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for viewing with a standard web browser. We install it on every web hosting and shared server that we lease, and we have found that most users consider it more than sufficient for their web site analysis needs. A link to CybrHost's site statistics page can be found at the bottom of this page. Examples, graphs, tables, definitions and detailed usage information are also available at the bottom of this page.
Why the Webalizer?
The Webalizer was written to solve several problems with currently available log analysis programs. There are several very well written analysis programs out there that do very good jobs of producing usage statistics. Many are not free. Many could not deal with the Combined Log Format logs that Apache web servers produced. Some just plain produced wrong results or did not produce the statistics in a useful format.
Types of Reports
The Webalizer produces yearly, monthly, daily and hourly statistics. In the monthly reports, various statistics may be produced to show overall usage, usage by day and hour, usage by visiting sites, URLs, user agents (browsers), referrers and country.
Speed of the Webalizer
The Webalizer was written to be as fast as possible while still providing proper error checking and results integrity.
On a 400MHz PPro machine running Linux 2.0.34, The Webalizer can process approximately 15,000 ECLF log records a second. A site receiving 150,000 hits a month can be processed in around 5 seconds. Of course, these figures are completely unscientific, and the performance of the program depends on several factors, such as available processor time, the number of concurrent processes being run, configuration options, etc. Your mileage may vary!
Features
- Is written in C to be extremely fast and highly portable. On a 400MHz Pentium machine, over 10,000 records can be processed in one second, with a 40-megabyte file (over 150,000 records) taking roughly 7 seconds.
- Supports standard Common Logfile Format server logs. In addition, several variations of the Combined Logfile Format are supported, allowing statistics to be generated for referring sites and browser types as well.
- Generated reports can be configured from the command line, or by use of one or more configuration files. Detailed information on configuration options can be found in the README file, supplied with all distributions.
- Supports multiple languages. Currently, English, Spanish, French, German, Italian, Dutch, Russian, Polish, Slovak, Swedish, Catalan, Czech, Korean, Chinese, Portuguese (including Brazilian dialect), Danish, Hungarian, Estonian, Greek and Romanian language files are available.
- Unlimited log file sizes and partial logs are supported, allowing logs to be rotated as often as needed, and eliminating the need to keep huge monthly files on the system.
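To make the log formats mentioned in the features list more concrete, here is a minimal Python sketch of reading one Combined Log Format line. It is only an illustration, not The Webalizer's own parser (which is written in C). The names LOG_PATTERN and parse_line, the sample line and its addresses are all made up for this example, and the pattern assumes that quoted fields do not themselves contain quote characters.

```python
import re
from datetime import datetime

# Combined Log Format:
#   host ident user [time] "request" status bytes "referrer" "user-agent"
# The last two quoted fields are optional here, so plain Common Log
# Format lines (which stop after the byte count) are accepted as well.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?\s*$'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Servers log "-" when no bytes were sent; treat that as zero.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # The request field looks like 'GET /index.html HTTP/1.0'; keep the URL.
    parts = record["request"].split()
    record["url"] = parts[1] if len(parts) >= 2 else ""
    return record

# Hypothetical sample line, using documentation-only addresses.
sample = ('192.0.2.7 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
          '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98)"')
record = parse_line(sample)
print(record["host"], record["url"], record["status"], record["size"])
```

The referrer and user agent fields are what make the "referring sites" and "browser types" reports possible; for a Common Log Format line they simply come back as None in this sketch.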
How to Use The Webalizer
For example graphs and tables, definitions and usage, see Webalizer Definitions and Usage.
The Webalizer's Definitions:
- Hits
Any request made to the server that is logged is considered a 'hit'. The requests can be for anything: HTML pages, graphic images, audio files, CGI scripts, etc. Each valid line in the server log is counted as a hit. This number represents the total number of requests that were made to the server during the specified report period.
- Files
Some requests made to the server require that the server then send something back to the requesting client, such as an HTML page or graphic image. When this happens, it is considered a 'file' and the files total is incremented. The relationship between 'hits' and 'files' can be thought of as 'incoming requests' and 'outgoing responses'.
- Pages
Pages are, well, pages! Generally, any HTML document, or anything that generates an HTML document, would be considered a page. This does not include the other stuff that goes into a document, such as graphic images, audio clips, etc. This number represents the number of 'pages' requested only, and does not include the other 'stuff' that is in the page. What actually constitutes a 'page' can vary from server to server. The default action is to treat anything with the extension '.htm', '.html' or '.cgi' as a page. A lot of sites will probably define other extensions, such as '.phtml', '.php3' and '.pl', as pages as well. Some people consider this number as the number of 'pure' hits... I'm not sure if I totally agree with that viewpoint. Some other programs (and people :) refer to this as 'Pageviews'.
- Sites
Each request made to the server comes from a unique 'site', which can be referenced by a name or, ultimately, an IP address. The 'sites' number shows how many unique IP addresses made requests to the server during the reporting time period. This DOES NOT mean the number of unique individual users (real people) that visited, which is impossible to determine using just logs and the HTTP protocol (however, this number might be about as close as you will get).
- Visits
Whenever a request is made to the server from a given IP address (site), the amount of time since a previous request by the address is calculated (if any). If the time difference is greater than a pre-configured 'visit timeout' value (or the address has never made a request before), it is considered a 'new visit', and this total is incremented (both for the site, and the IP address). The default timeout value is 30 minutes (can be changed), so if a user visits your site at 1:00 in the afternoon, and then returns at 3:00, two visits would be registered.
Note: in the 'Top Sites' table, the visits total should be discounted on 'Grouped' records, and thought of as the "Minimum number of visits" that came from that grouping instead.
Note: Visits only occur on PageType requests, that is, for any request whose URL is one of the 'page' types defined with the PageType option. Due to the limitations of the HTTP protocol, log rotations and other factors, this number should not be taken as absolutely accurate; rather, it should be considered a pretty close "guess".
- KBytes
The KBytes (kilobytes) value shows the amount of data, in KB, that was sent out by the server during the specified reporting period. This value is generated directly from the log file, so it is up to the web server to produce accurate numbers in the logs (some web servers do stupid things when it comes to reporting the number of bytes). In general, this should be a fairly accurate representation of the amount of outgoing traffic the server had, regardless of the web server's reporting quirks.
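The definitions above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not The Webalizer's actual code: the function names, the record layout and the sample data are invented for the example; a 'file' is assumed to mean a request answered with status 200; and the visit timer is assumed to be compared against the address's previous page request. The page extensions and the 30-minute timeout are the defaults described above.

```python
from datetime import datetime, timedelta

PAGE_EXTENSIONS = (".htm", ".html", ".cgi")  # default page types named above
VISIT_TIMEOUT = timedelta(minutes=30)        # default visit timeout named above

def is_page(url):
    """True if the URL path (query string removed) ends in a page extension."""
    path = url.split("?", 1)[0].lower()
    return path.endswith(PAGE_EXTENSIONS)

def summarize(records):
    """Compute totals from (host, time, url, status, size) tuples sorted by time."""
    totals = {"hits": 0, "files": 0, "pages": 0, "visits": 0, "bytes": 0}
    last_page_time = {}  # host -> time of that host's most recent page request
    hosts = set()
    for host, time, url, status, size in records:
        totals["hits"] += 1          # every valid log line is a hit
        totals["bytes"] += size
        hosts.add(host)
        if status == 200:            # assumption: 'file' means a 200 OK response
            totals["files"] += 1
        if is_page(url):
            totals["pages"] += 1
            previous = last_page_time.get(host)
            # New visit: first page request from this host, or timeout exceeded.
            if previous is None or time - previous > VISIT_TIMEOUT:
                totals["visits"] += 1
            last_page_time[host] = time
    totals["sites"] = len(hosts)     # unique addresses, not unique people
    totals["kbytes"] = totals["bytes"] / 1024
    return totals

# Hypothetical records: one host at 1:00 PM and again at 3:00 PM, plus a second host.
records = [
    ("192.0.2.7", datetime(2000, 10, 10, 13, 0, 0), "/index.html", 200, 2048),
    ("192.0.2.7", datetime(2000, 10, 10, 13, 0, 1), "/logo.gif", 200, 1024),
    ("192.0.2.7", datetime(2000, 10, 10, 15, 0, 0), "/index.html", 304, 0),
    ("198.51.100.4", datetime(2000, 10, 10, 15, 5, 0), "/cgi-bin/search.cgi?q=x", 200, 1024),
]
totals = summarize(records)
print(totals)
```

In this made-up data the first host returns two hours after its first page request, so it registers two visits, matching the 1:00 and 3:00 example in the Visits definition. The image request counts as a hit and a file, but not as a page or a visit.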
The Webalizer's Usage:
The Webalizer runs as a system process every night at around midnight Pacific time.
To view the results, simply go to http://your_url/logs/index.html, replacing your_url with your site's address.