How do Squid servers affect Urchin?
Squid proxy servers are usually used as reverse proxies in a website environment, where one Squid server pulls data from multiple web servers to load balance the site across them. This usually means that the Squid server has its own set of logs, and each web server has its own set of logs as well.
Basic environment with a Squid server:
Browser --- Internet --- Squid ---+--- Web server 1
                                  +--- Web server 2
                                  +--- Web server 3
                                  +--- Web server 4
The Squid server logs the visitor's IP address (since this is the machine that the visitor is accessing), and the web servers log the Squid server's IP address (since the Squid server is the machine actually fetching the content from the web servers).
Squid Log File Format:
The native access.log has ten (10) fields. There is one entry here for each HTTP (client) request and each ICP Query. HTTP requests are logged when the client socket is closed. A single dash (-) indicates unavailable data.
- Timestamp : The time when the client socket is closed. The format is 'Unix time' (seconds since Jan 1, 1970) with millisecond resolution. This can be converted to a readable format with: cat access.log | perl -nwe 's/^(\d+)/localtime($1)/e; print'
- Elapsed Time : The elapsed time of the request, in milliseconds. This is time between the accept() and close() of the client socket.
- Client Address : The IP address of the connecting client, or the FQDN if the 'log_fqdn' option is enabled in the config file.
- Log Tag / HTTP Code : The Log Tag describes how the request was treated locally (hit, miss, etc). All the tags are described below. The HTTP code is the reply code taken from the first line of the HTTP reply header. Non-HTTP requests may have zero reply codes.
- Size : The number of bytes written to the client.
- Request Method : The HTTP request method, or ICP_QUERY for ICP requests.
- URL : The requested URL.
- Ident : If ident_lookup is on, this field may contain the username associated with the client connection as derived from the ident service.
- Hierarchy Data / Hostname : A description of how and where the requested object was fetched.
- Content Type : The Content-type field from the HTTP reply.
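The ten fields above can be split apart with a few lines of Python. This is only a minimal sketch: it assumes the native format with whitespace-separated fields, and the sample line, the SquidEntry type and the parse_squid_line function are made up for illustration rather than taken from Squid or Urchin.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class SquidEntry(NamedTuple):
    """One line of a native Squid access.log (hypothetical helper type)."""
    timestamp: datetime
    elapsed_ms: int
    client: str
    log_tag: str
    http_code: str
    size: int
    method: str
    url: str
    ident: str
    hierarchy: str
    content_type: str


def parse_squid_line(line: str) -> SquidEntry:
    # Assumption: exactly ten whitespace-separated fields, as listed above.
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    ts, elapsed, client, tag_code, size, method, url, ident, hier, ctype = fields
    # The fourth field combines the log tag and the HTTP code, e.g. TCP_MISS/200.
    log_tag, _, http_code = tag_code.partition("/")
    return SquidEntry(
        # Unix time with millisecond resolution, converted to UTC.
        timestamp=datetime.fromtimestamp(float(ts), tz=timezone.utc),
        elapsed_ms=int(elapsed),
        client=client,
        log_tag=log_tag,
        http_code=http_code,
        size=int(size),
        method=method,
        url=url,
        ident=ident,          # "-" means the data was unavailable
        hierarchy=hier,
        content_type=ctype,
    )


# Invented sample line, for illustration only.
sample = ("1066036250.603 180 192.168.0.1 TCP_MISS/200 1024 GET "
          "http://www.example.com/ - DIRECT/203.0.113.5 text/html")
entry = parse_squid_line(sample)
print(entry.client, entry.log_tag, entry.http_code, entry.timestamp.isoformat())
```

Lines that do not have exactly ten fields raise an error instead of being guessed at, so malformed or customized log formats are noticed early.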
More information can be found at http://www.squid-cache.org/Doc/FAQ/FAQ-6.html#ss6.6. General information about the Squid Cache can be found at http://squid-cache.org.
So how does the Squid server cause problems for Urchin? If all your traffic is flowing through the Squid server, then you might want to have Urchin process the Squid log rather than the actual web server logs (the web server logs will probably only list the IP address of the Squid server). Unfortunately, the native Squid access log only has 10 pieces of information in it. It has no referrer or user agent field, for example, so it carries less detail than a typical web server log.
We have heard of a client who configured the Squid server to write data directly to the access logs of the web server. If the Squid server is logging data to the web server logs while the web servers are also logging the requests from the caching server, each hit may be logged twice, which would double the results. You may want to review the logs from the web server and the Squid server to make sure this is not the case, and if it is, create filters for the content accordingly.
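One rough way to start that review is to count how many web server log entries come from the Squid server's address and how many come from other addresses. The sketch below assumes the web server log puts the client address first on each line (as common log format does); SQUID_IP, the sample lines and the count_by_source function are hypothetical. Seeing both kinds of entries in the same log is only a hint that hits may be recorded twice, not proof.

```python
from collections import Counter

SQUID_IP = "10.0.0.5"  # hypothetical address of the Squid server


def count_by_source(lines, squid_ip):
    """Count log entries made by the Squid server versus any other client."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        # Assumption: the first field is the client address.
        source = "squid" if parts[0] == squid_ip else "other"
        counts[source] += 1
    return counts


# Invented web server log lines, for illustration only.
sample_lines = [
    '10.0.0.5 - - [13/Oct/2003:09:10:50 +0000] "GET /index.html HTTP/1.0" 200 1024',
    '192.168.0.1 - - [13/Oct/2003:09:10:50 +0000] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.5 - - [13/Oct/2003:09:11:02 +0000] "GET /about.html HTTP/1.0" 200 2048',
]
counts = count_by_source(sample_lines, SQUID_IP)
print(counts["squid"], counts["other"])
```

If entries from other addresses appear in a log that should only ever see the Squid server, compare their URLs and timestamps against the Squid-sourced entries before deciding which set to filter out.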