  1. Is it possible to use both Urchin Software and Google Analytics simultaneously? Why would you want to use both products at the same time?
  2. Can I track banner ad clicks on my website?
  3. Where do you download Urchin software?
  4. Where can I find the default log file formats for Urchin?
  5. Will Urchin run on a 64-bit OS?
  6. What are the recommended hardware specs for Urchin 5?
  7. How can I move Urchin and all of the report data to another server?
  8. How do I delete and re-process data in Urchin?
  9. I see hits in my Urchin reports, but no sessions or visits?
  10. How do I back up my Urchin configuration on a nightly basis?
  11. It appears that my profile is stuck with a status of 'pending'. How do I fix this?
  12. What are all the various types of user-agents?
  13. Where can I find a list of mobile user-agents?
  14. What is a common Apache log format that will work with the utm tracking method?
  15. How does Urchin handle the UTM domain name?
  16. What is the log file format for a Helix (Media Server [Real Media])?
  17. Can I change the value of the DD and %d wildcards?
  18. Are there any issues accessing a log file on a UNC Connection (mapped network drive)?
  19. Why do I see a '-' in some of my reports and how can I remove it?
  20. I just installed the UTM tracking code, why does Urchin report previous visitors?
  21. How do I fix the 'Profile has been locked by another process. Exiting.' error?
  22. Are page hits with a 304 "not modified" header counted as hits?
  23. I'm getting the error 'Warning, Task Scheduler Disabled' when trying to process my log files.
  24. How does Urchin use the time zone offset found in Apache logs?
  25. Can I change the language for the admin section of Urchin?
  26. How do I rename an existing profile in Urchin software?
  27. Can I schedule Urchin to process logs without using the Urchin Scheduler?
  28. What format does Urchin expect Google's AdWords PPC data to be in?
  29. Can I share log sources between different affiliations?
  30. What are the minimum fields required to process an IIS log file?
  31. I'm having trouble running the geo-update command in Urchin 5?
  32. Can Urchin process a log file with two different line formats?
  33. What are some of the restrictions on the Urchin demo?
  34. Why are certain files missing from the Urchin reports?
  35. How many log sources come with each Urchin module?
  36. How do I register Urchin if the machine cannot reach the internet?
  37. Is there a way to export more data than what appears on the screen?
  38. Information about the Urchin DNS database update.
  39. What is the maximum amount of memory that Urchin can use?
  40. How do I enable logging for Urchin's internal web server?
  41. How do I turn data center mode on and off?
  42. What does "ERROR: Received interrupt signal(2)" mean?
  43. How does ct_runstatus affect the Urchin scheduler?
  44. How does Urchin calculate average session length?
  45. What type of database does Urchin use?
  46. Will the 500,000 limit on database table rows limit the amount of data I can store in Urchin?
  47. What does the following error mean: ERROR: (7008-54-441) DB file is the wrong size - run sanitizer.
  48. Is Urchin software multi-threading?
  49. How do squid servers affect Urchin?
  50. Can Urchin 5.X process binary log files?
  51. Do I need a load balancing module?
  52. Does Urchin identify user sessions before or after applying filters?
  53. What format does Urchin expect Overture's PPC data to be in?
  54. What are the data fields used in Urchin 5?
  55. What does the error "WARNING: (7026-76-83) Could not open log file - check permissions" mean?
  56. Can I globally change the case setting of the URI stem in my log file?
  57. What information does the 'machine' of a configuration file hold?
  58. Where is the Urchin configuration database located?
  59. Urchin processes my log file as one long line. Why?
  60. How does the UTM Tracking Method for Urchin 5 work?
  61. How can I change the textual content of Urchin?
  62. Is there a version of Urchin 5 for Solaris 10?
  63. Why do I see a drop in my visitors when changing Visitor Tracking Method from 'IP + UserAgent' to 'UTM'?
  64. My web application sets the session ID in either a cookie or a query string variable. How do I configure Urchin?
  65. What are the data fields used in Urchin 5?
  66. How do I set up Urchin to work with Plesk?
  67. Are there any helper scripts for Urchin 5?
  68. How does Urchin 5 licensing work?



Question: Is it possible to use both Urchin Software and Google Analytics simultaneously? Why would you want to use both products at the same time?

Answer: Yes, and there are two reasons to do so. You can use Urchin Software to validate your Google Analytics results, and you can use functionality in Urchin Software that is not available in Google Analytics (such as custom reports).

The one caveat to using both the software and service at the same time is that you must use the UTM tracking method.

The modification necessary to track using both Urchin Software and Google Analytics is trivial. Modify the UTM as follows:

Find the line var _userv=1

Change the above line to read var _userv=2

Now the UTM will log hits to your local server and the remote server on Google’s network.

Question: Can I track banner ad clicks on my website?

Answer: Both Google Analytics and Urchin 5 software provide a method for you to track banner ad exits on your website. This information is valuable because it gives you some idea of where people are going after they visit your site. It also gives you accurate data for the value of your banner ad space on your website.

Google logs clicks on the banner ads as though the user clicked on a page in your website. So when you view reports in Google Analytics you will see the banner ad click as a click to a special page. The name of that page is created by you.

All the setup for this tracking method is done in the HTML of your website. When you create the link for the banner ad use the following syntax:

<a href="http://www.banner-ad-target.com" onClick="javascript:urchinTracker('/bannerads/advertiser-name/banner-ad-name')"> banner-ad</a>

When the user clicks on the banner ad, the JavaScript function urchinTracker() will log a pageview for the page named "/bannerads/advertiser-name/banner-ad-name". This isn't a real page, but something that Urchin can display in its reports. Notice the logical structure of the dummy page name: a structure like the one above allows you to easily identify the referrals to each advertiser on your site.

Urchin Knowledge Base Q and A: Part 1

Question: Where do you download Urchin software?

Answer: All builds of Urchin software can be found here: http://www.google.com/analytics/urchin_downloads.html

Question: Where can I find the default log file formats for Urchin?

Answer: Urchin has a collection of log file templates that it uses to parse your log files. All the log file templates are stored in the /path-to-urchin/lib/reporting/logformats directory.

Question: Will Urchin run on a 64-bit OS?

Answer: According to Urchin support, no version of Urchin runs on a 64-bit OS. Internally, Urchin engineering has tested Urchin on a 64-bit architecture with mixed results. Although adding the 32-bit libraries should, in theory, work, there is no assurance that Urchin can continue to process profiles under working conditions.

Question: What are the recommended hardware specs for Urchin 5?

Answer: Urchin's superior performance allows you to get more from less hardware investment. For instance, an older Pentium II might be too slow for desktop use, but will make a fine Urchin server. And Urchin's unmatched portability means you can use whichever operating system you like. Below, we provide a recommended level of hardware for high performance.

Single Small to Medium Website Analysis

  • 500 MHz or better processor
  • 128 MB RAM
  • 10 GB+ IDE hard disk
  • Ethernet interface

Service Provider / Enterprise Installations

  • 1 GHz Pentium IV / 500 MHz UltraSPARC / similar MHz range PPC/MIPS/etc.
  • 256 MB RAM
  • Ultra2/Wide SCSI hard disk (such as a Seagate Cheetah)
  • 100base-T Ethernet
  • Backup system

Memory/System/Disk Usage

  • Urchin memory (RAM) usage can be configured to use between 20 and 500 MB
  • Urchin can be configured to run at low, normal or high priority
  • Urchin's data storage will use approximately 10% of the size of the raw logs

Question: How can I move Urchin and all of the report data to another server?

Answer: Moving Urchin 5.x from one server to another requires the following basic steps:

  1. Upgrading to the latest version
  2. Moving the Urchin reporting databases for the profiles
  3. Moving the Urchin configuration data
  4. Moving any custom files and templates
  5. Checking new file permissions
  6. Re-activation of the Urchin license

NOTE The following detailed instructions can be applied to cross-platform moves as well. The Urchin reporting database files can be moved to any operating system supported by Urchin. Considerations when moving data and files from one platform type to another include command line syntax (such as adding './' or 'sudo ./' for Unix and Mac OSX systems respectively). Other considerations include directory syntax changes such as 'Standard\Windows\folders\' vs. '/standard/unix/directories/.'

Preliminary setup

If you are upgrading to a new major version of Urchin (for example, 4.x to 5.x), or if you are not currently using the most up-to-date release of the product, it is strongly recommended that you upgrade Urchin on the old server before moving any data or configuration files.

Moving the Reporting Databases

By default, all Urchin reporting data is stored in a child directory of the main Urchin directory: ~/urchin/data. This directory should be archived so that the subdirectory structure is maintained, then moved to the new server after Urchin is installed there. When possible, restore it to the same directory location on the new server (~/urchin/data).

Moving the Configuration

The configuration also must be moved. However, if the new directory structure is different from that of the older server, Urchin will not be able to process logs and write data correctly. For instance, if Profile-A on the old server has the following parameters:

Log Source: /home/www/site1/logs/access_log
Data Directory: /usr/home/urchin/data

and the new server has a slightly different directory structure:

Log Source: /usr/www/site#/logs/access_log
Data Directory: /usr/local/urchin/data

then, the old Urchin configuration cannot be imported "as is" and must be either recreated or edited before importing.

Exporting and Importing the Configuration

The Urchin 5 configuration can be edited as a text file. To do this, you must first export it from the ~/urchin/util directory by running this command:

uconf-export -f config.txt

This will create a file called config.txt that can be edited and imported later.
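When the directory layout differs between the two servers, the path edits can be scripted instead of done by hand. The following is a minimal sketch, assuming the exported file stores settings as key=value lines (as in the sample records shown in this answer) and that paths appear literally as values. The key names example_logpath and example_datadir are hypothetical placeholders, not real Urchin field names; check your own config.txt for the actual keys.

```python
def rewrite_paths(config_text, prefix_map):
    """Rewrite values that start with an old directory prefix.

    Assumes key=value lines; lines without '=' are passed through unchanged.
    """
    out = []
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            for old, new in prefix_map.items():
                if value.startswith(old):
                    # Swap only the leading prefix, keep the rest of the path
                    line = key + sep + new + value[len(old):]
                    break
        out.append(line)
    return "\n".join(out)


# Sample text with hypothetical key names, using the paths from the example above
sample = "\n".join([
    "ct_name=Profile-A",
    "example_logpath=/home/www/site1/logs/access_log",
    "example_datadir=/usr/home/urchin/data",
])
prefix_map = {
    "/home/www/": "/usr/www/",
    "/usr/home/urchin/": "/usr/local/urchin/",
}
migrated = rewrite_paths(sample, prefix_map)
print(migrated)
```

Matching on the start of the value, rather than replacing the prefix anywhere in the line, avoids accidentally rewriting a path that merely contains the old prefix somewhere in the middle.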

In most cases, you should install and complete the basic setup of Urchin 5 on the new server first. Do not create any profiles or log sources though. Only complete the basic Administration and system setup. When that is finished and you are prompted to add new profiles, "stop." This is when you will want to import your configuration file from the old server.

Because you completed the basic setup, the new Urchin configuration will already contain "Global Settings" and "Process Settings." In most cases, these should not be changed. So, before you import your old configuration file, open it for editing. Remove the "Global Settings" and "Process Settings" configuration records from the old configuration file.

Those records will have more data, but look something like this:

#---------------------
# Global Settings
#---------------------
ct_name=Access Settings
cr_remoteadmin=on
cr_remoteaccess=on
ct_port=9999
#---------------------
# Process Settings
#---------------------
ct_name=Process Settings
ct_priority=normal
cs_limitdbtable=10000

NOTE If you have added any special global changes to the old configuration that could only be done through text based configuration editing, make the necessary changes to the new configuration at this time.

Now you are ready to import the remaining configuration settings from the old server. Move the old configuration file into the new ~/urchin/util directory and run the uconf-import tool to restore the old server's Urchin configuration:

uconf-import -o -f config.txt

Copying Site-specific Customizations

Customizations made to Urchin, including custom log formats and report sets, are not kept in Urchin's "data" directory. Rather, these customizations are kept in a separate location under the "lib" directory of the Urchin distribution. When Urchin is installed on your new server, any customizations that have been made to the old Urchin system will need to be copied to your new Urchin installation. These files are usually kept in ~/urchin/lib/custom/.

To do this, go to the old Urchin server and navigate to the "lib" directory of the Urchin distribution. Next, create an archive of the entire "custom" directory (and all subdirectories) using an archive utility like tar or ZIP. Now copy this archive over to your new Urchin server and unpack it in the "lib" directory of your Urchin distribution.

Checking file permissions

After all files have been moved to the new server, you should perform a thorough check to ensure that all files have the required permissions and that all files are owned by the same user (UNIX systems only). To check for permissions, run the following command from the ~/urchin/util/ directory:

inspector -r

To set file ownership (UNIX only), run the following command from the ~/urchin/ directory, substituting the user and group that own your Urchin installation:

chown -R urchin_user:urchin_group ./

Reactivate the Urchin License

After you have completed the configuration import, you'll need to re-license Urchin on the new server. If you are upgrading from one major version of Urchin to another, contact your Urchin Account Manager to obtain a new license. If you are simply moving an existing license from the old server to the new server, contact Urchin's Technical Support department by submitting a trouble ticket at http://www.urchin.com/support/hosted_urchin.html

If you purchased the Urchin Base License from EpikOne, you do not need to contact Urchin Technical Support. For reactivation, contact EpikOne directly through info@epikone.com or 877.273.9921.

After Support resets the license, you can log in and select "Activate Pre-purchased License." This will complete the migration and licensing process.

Question: How do I delete and re-process data in Urchin?

Answer: Reprocessing a Single Day:

In the Urchin admin GUI, edit the Profile and turn off Log Tracking under the Storage/DB tab. Be sure to click Update to save your change.

Under the Log Sources tab, ensure that the proper log file(s) to be re-processed are specified. The log data should only contain hits for the date(s) for which you are zeroing out the statistics.

Invoke a command shell on the Urchin system.

Run the udb-sanitizer utility in the 'util' directory/folder of the Urchin distribution with the command

udb-sanitizer -p profile-name -d YYYYMM

Where YYYYMM is the year and month containing the day you wish to reprocess.

Select option 5, Zero out one or more days. The utility will prompt you for the day and will zero out the statistics for that particular day. If you have a range of contiguous days you'd like to zero, you can specify that range by using the numbers of the start and end days separated by a hyphen (e.g. 5-10 to zero out days 5 through 10 of the month). If the days cannot be covered by a single range, re-invoke the utility to zero out statistics for the additional days in that month.

Click the Run Now button under the Run/Schedule tab for the Profile to reprocess the log data

Reset the Log Source by changing the Log File Path back to its original setting

Under the Storage/DB tab in the profile edit area, turn Log Tracking back on
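When the days to be zeroed out are scattered across the month, it helps to work out the smallest set of ranges before starting udb-sanitizer, since each non-contiguous group needs its own invocation. The sketch below only computes the "start-end" strings to type at the prompt; it does not call the utility, and the function name is purely illustrative.

```python
def group_days(days):
    """Collapse day-of-month numbers into contiguous 'start-end' ranges."""
    ranges = []
    for day in sorted(set(days)):
        if ranges and day == ranges[-1][1] + 1:
            # Day directly follows the current range, so extend it
            ranges[-1][1] = day
        else:
            # Gap found, start a new range
            ranges.append([day, day])
    return [str(a) if a == b else f"{a}-{b}" for a, b in ranges]


print(group_days([5, 6, 7, 8, 9, 10, 14, 20, 21]))
```

Each string in the result corresponds to one pass through option 5.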

Reprocessing an Entire Month:

The procedure for reprocessing an entire month's worth of data is identical to the single-day procedure above, except that when invoking the udb-sanitizer utility you select Option 2, which deletes the month entirely, instead of Option 5.

Additional information:

The udb-sanitizer utility provides additional functionality for managing Urchin databases.

Question: I see hits in my Urchin reports, but no sessions or visits?

Answer: When there is trouble with the data in Urchin’s reports 9 times out of 10 it has to do with an incorrect log file format. This includes visitation data and e-commerce data. The best course of action is to manually examine the log format and make sure it matches your settings in Urchin. You can also run Urchin in debug mode to see how Urchin is processing each log hit to ensure that the data is parsed into the correct Urchin fields.

Another thing to check is the UTM Domain name. If the client is using the UTM tracking method then the UTM Domain setting in the Reporting tab of a profile needs to match the domain in the actual UTM file. Make sure that the case of the domain name in the UTM file matches the domain name in Urchin. A hash is created of the domain name and a difference in case will cause the hash to be different.

We ran into an interesting problem with a client running an IIS web server. When Urchin processed the log file page hit data appeared in the reports but no visits or sessions. The file looked to be a standard W3C log file until we dissected a hit in the log file. We discovered that IIS was not logging the uri-query properly. The uri-query was connected to the uri-stem so Urchin could not process the data into request_stem and request_uri.

This situation leaves the client with only one option: correct the log file, either manually or with a script, by separating the uri-query from the uri-stem. IIS should be doing this by default, though.

Urchin support worked with a client who was having a similar problem. They were able to resolve the issue by changing the configuration of the client's CMS software. Below is the client's description of the fix: We fixed the logging problem at our end by adding an exclusion to our CMS (Site Executive) so that it did not parse any files in a certain folder (into which we placed the UTM files). This stopped the incorrect query stem/string being appended to the utm.gif file within IIS logs. (Urchin v5.7.03 Windows)
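A correction script for this kind of log might look like the sketch below. It rests on several assumptions that were not part of the original case: that the log is space-delimited W3C format, that the field order is known from the #Fields directive, that the query was fused onto the stem with a "?" separator, and that the cs-uri-query column holds "-" on the affected lines. The sample line is made up for illustration.

```python
def fix_line(line, fields):
    """Move a query string fused onto cs-uri-stem into cs-uri-query.

    fields is the list of names from the log's #Fields directive.
    Lines that do not match the assumed damage pattern are returned unchanged.
    """
    parts = line.split(" ")
    stem_i = fields.index("cs-uri-stem")
    query_i = fields.index("cs-uri-query")
    stem, sep, query = parts[stem_i].partition("?")
    # Only repair when the stem carries a query and the query column is empty
    if sep and parts[query_i] == "-":
        parts[stem_i] = stem
        parts[query_i] = query
    return " ".join(parts)


fields = "date time cs-method cs-uri-stem cs-uri-query sc-status".split()
damaged = "2007-03-20 00:00:01 GET /__utm.gif?a=1&b=2 - 200"
print(fix_line(damaged, fields))
```

Run the script against a copy of the log and compare a few lines by hand before pointing Urchin at the corrected file.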

Question: How do I back up my Urchin configuration on a nightly basis?

Answer: It is more than likely that you are backing up your website to guard against a hard drive failure, a hacker, or some other unforeseen disaster. If you are not backing up your data, stop reading and go set up a basic backup system now.

Most system backups copy vital files to a remote device that can be used to restore missing or damaged files in an emergency. But what about application settings, are those backed up? If your server goes down, could you re-build your Urchin installation?

Rebuilding an Urchin installation can be a fairly simple task if you have a few profiles and filters. You can log in and re-create the profiles and filters manually without much trouble. But what if you have 50 profiles with numerous filters? Could you remember all the settings to rebuild Urchin and get your reports back? This is where a backup of your Urchin configuration becomes useful.

Urchin provides a number of command line tools that can be used to back up and restore an Urchin installation. Using these tools, you can create a simple script that dumps your Urchin configuration into a text file. This text file can then be imported back into Urchin to rebuild settings, profiles and filters.

To completely back up an Urchin installation you will need an Urchin configuration backup file and the log files that are processed by Urchin. Backing up your log files is not covered in this article but is just as important as backing up your Urchin configuration. I cannot stress enough how important it is to back up your log files. Without them, there is no way to rebuild your Urchin reports.

Please note that this article assumes you have some knowledge of the *NIX command line interface.

Creating the Urchin configuration backup file

To create a backup of the Urchin configuration, we use the uconf-export command line utility. Urchin command utilities are found in the util directory of your Urchin installation (usually /usr/local/urchin). The uconf-export utility provides a command line interface for reading the Urchin configuration database and exporting it in a readable text format. The exported data is an XML-type record format which is directly compatible with the uconf-import utility. Each record in the exported data corresponds to a configuration record in the Urchin configuration database.

The uconf-export utility has the following options:

uconf-export [-h] (prints usage message and exits)
uconf-export [-v] (prints version and exits)
uconf-export [-f file] (omitting -f will write the output to standard out)

A typical export command would look like this:

/usr/local/urchin/util/uconf-export -f urchin-config-backup.txt

Running this command will export your current Urchin configuration to the file urchin-config-backup.txt. All profiles, filters and settings will be in the new text file. Please note that your server configuration may not allow you to run the uconf-export command. If you are having trouble running the command, please contact your system administrator.

Automating the backup process

Backup processes run the best when automated. Why? Because then you don’t need to think about them. On *NIX machines you can create a small bash script that calls the uconf-export command. Then automate the execution of that script by placing it in the crontab. The first step in creating the script is to actually call the uconf-export command:

#!/usr/local/bin/bash
/usr/local/urchin/util/uconf-export -f urchin-config

The above command will work just fine. It will output the configuration to the file urchin-config. But what if you want to restore the configuration from two days ago? That data is not available. By tweaking the name of the file you export the configuration to, you can save a week's worth of configuration files.

#!/usr/local/bin/bash
/usr/local/urchin/util/uconf-export -f urchin-config-`date +%A`

This new command will write the configuration to a file named after the day of the week on which the script was run. If the script was run on a Monday, you would have a file named urchin-config-Monday.

That’s all there really is to the backup script. One thing to note is that you want to move the backup script to a secure location that stores all of your backups.

Once the script is complete you need to add it to your crontab to automate the execution. How often you back up your configuration file depends on how often your Urchin configuration changes. In general, having a week's worth of Urchin configuration backups is sufficient.
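The weekday-based naming scheme can also be expressed in Python, which may be convenient if the backup is driven from a larger maintenance script. This is a sketch only: it builds the uconf-export command line described above but does not execute it, and the default path assumes the usual /usr/local/urchin install location.

```python
from datetime import date


def backup_command(day,
                   exporter="/usr/local/urchin/util/uconf-export",
                   prefix="urchin-config"):
    """Build the export command with a file name keyed to the weekday.

    Because the name repeats every seven days, each run overwrites the
    file from the same weekday of the previous week.
    """
    filename = f"{prefix}-{day.strftime('%A')}"
    return [exporter, "-f", filename]


# 1 January 2024 fell on a Monday
print(backup_command(date(2024, 1, 1)))
```

The returned list can be handed to subprocess.run() when the script is run on the Urchin server. Note that %A follows the process locale, so the day names will differ if the script changes the locale.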

Rebuilding your Urchin Installation

If you ever need to rebuild your Urchin installation you can simply use uconf-import to import the configuration file that is created using the export utility. After importing the Urchin configuration you will need to reprocess your log files to rebuild the reports.

References

To read more about the uconf-export utility visit the Urchin support center: http://help.urchin.com/index.cgi?&id=1488

Question: It appears that my profile is stuck with a status of 'pending'. How do I fix this?

Answer: Under certain rare circumstances, such as a system crash, abnormal termination of the Urchin Scheduler, or killing a running task, it is possible that an Urchin task will be left with a status of Pending or Running when it is not actually in either of those states. This prevents the associated Profile from being run either with the Run Now button or by the Urchin Scheduler.

Procedure

In this situation, the uconf-driver utility can be used to clear the status of the Task and allow it to be run again. This utility can only be run from a command line shell so you will need to bring up a terminal window on a UNIX-type system or invoke a DOS shell on a Windows system. Once you have done so, change directory to the util directory in the Urchin distribution and invoke the following commands, replacing MyProfile with the actual Profile name. Note that on most UNIX-type systems you will need to put a leading "./" before the uconf-driver command and for Mac OSX, you may need to type "sudo ./" before each command.

1. In a command shell, stop the Urchin Scheduler first by typing:

[UNIX-type systems]
cd /path/to/urchin/bin
./urchinctl -s stop
cd ../util

[Windows]
cd C:\Program Files\Urchin\bin
urchinctl -s stop
cd ..\util

2. Now type the following commands in order:

uconf-driver action=set_parameter table=task name="MyProfile" cr_runnow=0
uconf-driver action=set_parameter table=task name="MyProfile" ct_runstatus=2
uconf-driver action=set_parameter table=task name="MyProfile" ct_completed=0
uconf-driver action=set_parameter table=task name="MyProfile" ct_status=1
uconf-driver action=set_parameter table=task name="MyProfile" ct_lockid=0

3. Finally, restart the Urchin Scheduler:

[UNIX-type systems]
cd ../bin
./urchinctl -s start

[Windows]
cd ..\bin
urchinctl -s start

Considerations

Be certain that the Profile you are resetting is indeed not running. Clearing the status on a Profile which is still running will confuse the Urchin Scheduler and potentially cause processing of the Profile to be re-invoked while it is still being run. This could cause corruption of the Urchin databases for that profile.

You should also check the output from uconf-export. If any profile has a ct_runstatus that is not equal to 2 then NONE of the profiles will run via the Urchin scheduler. They may run from the command line, but they will not run from the automated scheduler.
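Scanning a large export by eye for a stray ct_runstatus value is tedious, so a small script can do the check. The sketch below assumes the export stores each record as key=value lines and that a record's ct_name line appears before its ct_runstatus line; verify both against your own uconf-export output before relying on it. The sample text is invented.

```python
def stuck_tasks(export_text):
    """Return (name, status) pairs for records whose ct_runstatus is not 2."""
    stuck = []
    current_name = None
    for raw in export_text.splitlines():
        line = raw.strip()
        if line.startswith("ct_name="):
            # Remember which record the following settings belong to
            current_name = line.split("=", 1)[1]
        elif line.startswith("ct_runstatus="):
            status = line.split("=", 1)[1]
            if status != "2":
                stuck.append((current_name, status))
    return stuck


sample_export = "\n".join([
    "ct_name=SiteA",
    "ct_runstatus=2",
    "ct_name=SiteB",
    "ct_runstatus=1",
])
print(stuck_tasks(sample_export))
```

Any profile the script reports is a candidate for the uconf-driver reset procedure above, once you have confirmed it is not actually running.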

Question: What are all the various types of user-agents?

Answer: An extensive list of user-agents can be found on the following site: http://www.psychedelix.com/agents/index.shtml.

Question: Where can I find a list of mobile user-agents?

Answer: A list of mobile user-agents can be found on the following page: http://www.zytrax.com/tech/web/mobile_ids.html.

Question: What is a common Apache log format that will work with the utm tracking method?

Answer: When using the UTM tracking method you must use a web server log format that contains cookies. A common format is as follows:

"%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\""
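To confirm that a log written in this ten-field layout really carries the cookie, you can parse a line yourself. The sketch below is a simplified regular expression, not Urchin's own parser: it assumes no quoted field contains an embedded double quote (Apache escapes those as \"), and the sample line and cookie values are invented.

```python
import re

# host, virtual host, user, [time], "request", status, bytes,
# "referer", "user-agent", "cookie"
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<vhost>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


sample_line = (
    '203.0.113.7 www.example.com - [20/Mar/2007:10:15:32 -0700] '
    '"GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0" '
    '"__utma=1.2.3.4.5.6; __utmz=x"'
)
parsed = parse_line(sample_line)
print(parsed["vhost"], parsed["status"], parsed["cookie"])
```

If the cookie field comes back as "-" for every hit, the web server is not logging cookies and the UTM tracking method will have nothing to work with.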

Question: How does Urchin handle the UTM domain name?

Answer: Urchin always hashes the domain name when using the UTM tracking method. This hash value then gets stored in a cookie that Urchin uses to track user actions. When it comes time to process the log data Urchin hashes the value of the "UTM Domain" entry [found in the "Reporting" tab] using the same algorithm that the UTM uses.

If the domain is set to "auto" in the UTM, Urchin will set the domain hash based on the domain the visitor is accessing. For example, if the visitor is accessing re.boston.com, the domain hash is based on re.boston.com. If the user hits www.boston.com, the domain hash is based on www.boston.com. If the UTM domain is empty, Urchin will grab the first utm hit in the log file and set the UTM domain to the domain hash of that hit.

If the value of the domain hash calculated by the UTM and stored in the log hits does not match the domain hash from Urchin, then Urchin will not properly process the data in the log file. Urchin will only process data that has the same domain hash in the _utm cookies as the "UTM domain" setting. More than likely you will see data in the "Hits Graph" report but not in any of the other Urchin reports.

Question: What is the log file format for a Helix (Media Server [Real Media])?

Answer: Info on the log file format can be found here: http://service.real.com/help/library/guides/helixuniversalproxy/htmfiles/tracking.htm#44459.

Question: Can I change the value of the DD and %d wildcards?

Answer: Recently I spoke to a client who was using a separate machine to process his log files. At various times during the day Urchin would FTP over the log file and process the data. The client wanted to use wildcards in the log source so Urchin would always pick up the current day's file. The client set up the following log file path:

path-to-log-file/exYYMMDD.log

When the client configured the scheduler to run, he noticed that Urchin would never get the current day's log file. Instead it would always pull the previous day's log file.

Urchin did not retrieve the current file because of the way it interprets the DD wildcard and the typical way that log files are rotated. Urchin, by default, converts the DD wildcard to the previous day. So if today is the 20th, Urchin would interpret DD as the 19th. Usually a web server set up for daily log rotation creates a new log file right around midnight, and Urchin is usually configured to process the log file sometime after midnight. In this situation, if Urchin did not turn back the clock 24 hours it would pull the current day's log file, which has only just been started. By automatically shifting the current time back 24 hours, Urchin ensures that the previous day's complete log file is processed.

So, how did we get around this issue? Urchin provides a mechanism to change the date offset. When looking at the data for a log source, click on the Advanced Settings tab. At the bottom of the page there is a section named "Date/Time Wildcard Substitution in Log Path Name". Using the Hours edit box you can specify a plus or minus offset in hours. Because Urchin defaults to -24 hours, we entered +24 to cause the DD wildcard to resolve to the current day.
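The effect of the offset is easiest to see with concrete dates. The sketch below models the behavior as described in this answer, on the assumption that the value entered in the Hours box is added to the built-in shift of -24 hours; it is an illustration of that reading, not Urchin's actual substitution code, and it only handles the YY, MM and DD wildcards.

```python
from datetime import datetime, timedelta

DEFAULT_SHIFT_HOURS = -24  # the default shift described above


def resolve_log_path(template, now, offset_hours=0):
    """Substitute YY, MM and DD using the shifted clock.

    Naive string replacement: it assumes the wildcard letters do not
    appear elsewhere in the path.
    """
    effective = now + timedelta(hours=DEFAULT_SHIFT_HOURS + offset_hours)
    return (template.replace("YY", effective.strftime("%y"))
                    .replace("MM", effective.strftime("%m"))
                    .replace("DD", effective.strftime("%d")))


run_time = datetime(2007, 3, 20, 1, 0)  # scheduler fires at 1 a.m. on the 20th
print(resolve_log_path("path-to-log-file/exYYMMDD.log", run_time))
print(resolve_log_path("path-to-log-file/exYYMMDD.log", run_time, offset_hours=24))
```

With no offset the path resolves to the file for the 19th; with +24 it resolves to the file for the 20th, which is what the client needed.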

Question: Are there any issues accessing a log file on a UNC Connection (mapped network drive)?

Answer: Urchin services run as LocalSystem by default, which is an account that has no credentials. In order for UNC connections to work, those credentials need to be inherited from a user on the local system that Urchin runs on. I'm guessing that user doesn't exist on your Urchin system.

Try this: add a user named 'urchin' to the Urchin system, with the same password that you're using on the remote system. Log in as this 'urchin' user once on the Urchin system. Now try running Urchin for the profile with the failed UNC connection and see what happens.

Question: Why do I see a '-' in some of my reports and how can I remove it?

Answer: We've only seen this issue come up when working with custom log formats. Urchin is displaying the literal "-" that is being extracted from the log files and stored in a custom field created to work with the custom log format.

A "-" is not considered an empty field. Some calculated fields interpret "-" as (none) for IP address or as (no referral); these labels are hard coded into the Urchin processing engine. Custom fields, however, cannot be configured to use the (none) label.

We checked with Google to see if there was a way to perform lower level validation on these values but there is not. The best way to remove the dash from Urchin reports is with a simple exclude filter.

Question: I just installed the UTM tracking code, why does Urchin report previous visitors?

Answer: Urchin tracks user sessions using a counter in the __utma tracking cookie. The last number in the string is the session counter for the visitor.

When this problem occurs we usually discover that the tracking code was added to the website prior to the first log processing. When this happens the UTM starts to track unique visitors and increment the session counters before Urchin has processed any data.

The best way to troubleshoot this issue is to examine the __utma cookies in the log file you are processing. If the session counters are greater than 1, pull some older logs and identify the log file that has the first recordings of the __utma cookie. The session counters in this log should all start at 1.
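Checking the counters by hand gets slow on a busy log. The sketch below pulls the session counter out of a cookie string, relying only on the fact stated above that the last number in the __utma value is the counter. The cookie values in the example are invented.

```python
import re

UTMA_RE = re.compile(r"__utma=([0-9.]+)")


def session_counter(cookie_header):
    """Return the last dot-separated number of the __utma cookie, or None."""
    match = UTMA_RE.search(cookie_header)
    if not match:
        return None
    last = match.group(1).rstrip(".").split(".")[-1]
    return int(last) if last.isdigit() else None


sample_cookie = (
    "__utma=12345.67890.1170000000.1170000000.1170000000.3; __utmb=12345"
)
print(session_counter(sample_cookie))
```

Applied to the cookie field of every hit in a log file, a minimum counter above 1 suggests that the tracking code was live before that file began.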

How do I fix the 'Profile has been locked by another process. Exiting.' error?

On rare occasions something can interfere with the Urchin processing. When this happens Urchin can lock the database for a profile. On Windows you may see an error message like this:

ERROR: (7047-323-308) Profile has been locked by another process. Exiting.

To clear the lock on Windows:

  • Stop all Urchin processes using the Service Manager. There should be two processes, the Urchin Scheduler and the Urchin Webserver.
  • Open Task Manager.
  • Kill all Urchin processes listed in Task Manager.
  • Restart the Urchin services.

Are page hits with a 304 "not modified" header counted as hits?

All log file hits with 2XX, 302, and 304 codes are counted as pageviews.
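The rule can be written as a small predicate. This is only a sketch of the rule as stated in this answer, not Urchin's actual implementation, and the function name is hypothetical.

```python
def counts_as_pageview(status):
    """True for 2XX, 302 and 304 status codes, per the rule stated above."""
    return 200 <= status <= 299 or status in (302, 304)

for code in (200, 206, 301, 302, 304, 404):
    print(code, counts_as_pageview(code))
```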

I'm getting the error 'Warning, Task Scheduler Disabled' when trying to process my log files.

There are 2 daemons, “urchind” and “urchinwebd”, that need to be running in order for log processing, reporting, and configuration administration to occur. The Urchin scheduler (urchind) runs the Urchin processing executable which processes the log data and inserts it into the Urchin database. Urchin cannot process log data if the Urchin scheduler is not running. If Urchin can not process the logs it displays the “Warning, Task Scheduler Disabled” error message. Some of the most common reasons for this error message are:

  • The Urchin scheduler service is not running. This usually happens when the server is rebooted and Urchin is not included in the system’s start up scripts.
  • On Unix systems, the urchind service is running but it is owned by a different user than that which owns the Urchin distribution.

The first step in troubleshooting this problem is to determine if the urchind service is up and running on your server. Urchin offers an Apache-like control command that can display the status of Urchin. Navigate to the bin directory within your Urchin installation (usually /usr/local/urchin) and type:

urchinctl status

You should see a two line response:

Urchin webserver is running

Urchin scheduler is running

If you do not see either line then Urchin is not running. To start Urchin type:

urchinctl start

If you only see the first line, then the Urchin scheduler is not running. To only start the Urchin scheduler type:

urchinctl -s start

The -s flag will only start the scheduler process.

Usually restarting either Urchin or the Urchin scheduler process fixes the problem. However, it is possible that the Urchin service is running, but it is owned by a different user than the one that owns the Urchin distribution. To verify which user urchind is running as, you can grep the output of the process status list:

ps -auxwww | grep urchind

You should see output similar to this:

USER-NAME 91577 0.0 0.1 376 176 ?? Ss 14Jun05 11:40.44 /usr/local/urchin/bin/urchind

If the USER-NAME is different from the user that owns the Urchin distribution, you have to restart the service using the correct username. First, switch to the user that currently owns the urchind process and stop the process. Then switch to the correct Urchin user and start the process.

How does Urchin use the time zone offset found in Apache logs?

Urchin uses the time zone offset to adjust the date on which the hit occurred. For example, if a log file hit occurs at 3 AM on January 1, and the Time Zone Offset is 4 hours, then Urchin will update the internal database for December 31 and not January 1.

Here's an actual log file hit that I processed to illustrate the behavior. Note the date and time of the hit and the 'Day' that Urchin updates in the internal database.

Hit: 206.188.4.165 - - [16/Jul/2006:03:11:03 -0400] "HEAD /healthcheck.htm HTTP/1.1" 200 0 "-" "-" "-"
apache_time [16/Jul/2006:03:11:03 -0400]
c_ip 206.188.4.165
cs_request HEAD /healthcheck.htm HTTP/1.1
sc_status 200
sc_bytes 0
cs_useragent -
cs_cookie -
cs_referer -
cs_host -
request_method HEAD
request_url /healthcheck.htm
request_version HTTP/1.1
request_uri /healthcheck.htm
request_stem /healthcheck.htm
request_directory /
request_filename healthcheck.htm
request_mime htm
request_origfilepath /healthcheck.htm
request_origmime htm
useragent_complete (unknown) - (unknown)
browser_base (unknown)
platform_base (unknown)
log_source_name 1-gold-web
nonpages 1
hits 1
validhits 1
nonutmhits 1
nonrobothits 1
HDB Update(Table 9, Day 15): (unknown) - (unknown) 1
HDB Update(Table 12, Day 15): /healthcheck.htm 1
HDB Update(Table 13, Day 15): htm 1
HDB Update(Table 14, Day 15): 200 1
HDB Update(Table 26, Day 15): 1-gold-web 1
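The behavior described in this answer can be modelled in a few lines. The sketch below is an assumption about the arithmetic only (the logged offset is applied to the logged time); it is not Urchin's code, and the function name is hypothetical.

```python
from datetime import datetime

def urchin_report_date(apache_timestamp):
    """Apply the logged offset to the logged time, as described above."""
    stamp = datetime.strptime(apache_timestamp, "%d/%b/%Y:%H:%M:%S %z")
    local = stamp.replace(tzinfo=None)    # time exactly as written in the log
    adjusted = local + stamp.utcoffset()  # an offset of -0400 moves it back 4 hours
    return adjusted.date()

print(urchin_report_date("16/Jul/2006:03:11:03 -0400"))
```

For the hit above this gives July 15, which matches the 'Day 15' updates in the debug output.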

Can I change the language for the admin section of Urchin?

The Administration UI language settings cannot be modified unless your Urchin installation is a fully localized build (which requires additional licensing and customized binaries from our engineering team). Currently this is only available for Japanese and Spanish. The Portuguese dictionary can only be modified in the reporting interface.

The demo license will allow you to rewrite the Portuguese language files and successfully use the product, while a fully licensed installation will generate errors. The demo licenses are very liberal and allow virtually any use of the product. That is by design, as a demo should demonstrate the full capabilities of the product. When fully licensed, the system has various access restrictions not present in the demo license, among which are restrictions on localization configuration changes. Localizing the Portuguese administration language settings is not permitted by your license (nor, at this time, by any license), and allowing it would require a custom build of the Urchin binaries.

How do I rename an existing profile in Urchin software?

Existing profiles cannot be renamed. However, you can create a duplicate profile with all the same data and settings under a new name, which gives the same result as renaming the existing profile. To do so, you will need to copy the original profile, edit the domain settings and move the report data.

Copying the Profile:

  • From the Urchin control panel select the Configuration menu.
  • Use the 'copy' button to copy the profile you wish to rename.
  • Enter a Website URL and any associated domains for the new profile.

Moving the data:

  • Open a command shell and stop the Urchin services: ~urchin/bin/urchinctl stop
  • Create a new data directory for the new profile
  • This directory must be named exactly as the profile is named. Use %20 for any spaces in the profile name. For example, 'new profile name' should be written as 'new%20profile%20name'.
  • Ensure the new directory has the same ownership and permissions as the original profile directory.
  • Move all data files from the original profile data directory to the new directory.
  • Start the Urchin services again: ~urchin/bin/urchinctl start

NOTE: the default location of the Urchin report data is ~urchin/data/reports/[profile-name]/
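The directory naming rule is easy to get wrong by hand, so here is a minimal sketch of it. Only the space-to-%20 substitution comes from the steps above; how Urchin treats other special characters is not covered here, and the function name is hypothetical.

```python
def profile_dir_name(profile_name):
    """Return the data directory name for a profile, with spaces written as %20."""
    # Only spaces are replaced; other characters are deliberately left untouched.
    return profile_name.replace(" ", "%20")

print(profile_dir_name("new profile name"))
```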

Can I schedule Urchin to process logs without using the Urchin Scheduler?

You do not need to use the Urchin Scheduler to schedule Urchin's processing. In fact, if your installation is large (e.g. more than 1000 profiles) you are discouraged from using the Urchin Scheduler.

To schedule Urchin to run without using the scheduler you can create a cron job that invokes urchin from the command line.

/path/to/urchin/bin/urchin -p"profile name"
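If you generate crontab entries for many profiles, a small helper keeps the quoting consistent. This is a sketch only: the schedule string, the helper name and the idea of generating entries from a script are assumptions, while the command itself is the one shown above.

```python
import shlex

def crontab_line(schedule, urchin_bin, profile):
    """Return one crontab entry that processes a single profile."""
    # In a shell, -p"profile name" is the single argument "-pprofile name".
    command = shlex.join([urchin_bin, "-p" + profile])
    return f"{schedule} {command}"

# Hypothetical schedule: every day at 02:00.
print(crontab_line("0 2 * * *", "/path/to/urchin/bin/urchin", "profile name"))
```

Staggering the schedule per profile avoids starting every profile at the same minute.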

What format does Urchin expect Google's AdWords PPC data to be in?

Urchin's default AdWords data file format is as follows:

date
skipped field
skipped field
search term
uri
impressions
skipped field
skipped field
skipped field
cost
skipped field
skipped field
skipped field
skipped field

This format can be changed to suit the client's needs.
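To check a data file against this layout before handing it to Urchin, something like the following sketch can help. The tab delimiter is an assumption (the delimiter is not stated above), and the sample row values are invented.

```python
# Position of each named field in the 14-field default layout listed above.
ADWORDS_FIELDS = {0: "date", 3: "search_term", 4: "uri", 5: "impressions", 9: "cost"}
ADWORDS_FIELD_COUNT = 14

def parse_adwords_row(line, delimiter="\t"):
    """Split one row and keep only the fields Urchin does not skip."""
    values = line.rstrip("\r\n").split(delimiter)
    if len(values) != ADWORDS_FIELD_COUNT:
        raise ValueError(f"expected {ADWORDS_FIELD_COUNT} fields, got {len(values)}")
    return {name: values[index] for index, name in ADWORDS_FIELDS.items()}

# Hypothetical row; "x" marks the skipped fields.
row = "\t".join(["2006-07-16", "x", "x", "blue widgets", "/landing.htm",
                 "120", "x", "x", "x", "4.50", "x", "x", "x", "x"])
print(parse_adwords_row(row))
```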

Can I share log sources between different affiliations?

We recently had a client who had trouble adding a log source to a profile. The log source was imported from a rather large configuration file. The file imported without any errors and all of the log sources were visible in the log manager. We discovered that the problem was the imported log source belonged to an affiliation and the client was trying to add the log source to a new profile that was in a different affiliation.

You can not share log sources between affiliations.

What are the minimum fields required to process an IIS log file?

Urchin can provide very basic reporting if your IIS log files have, at the very least, the following fields:

  • Date
  • Time
  • C-IP
  • CS-URI-Stem
  • SC-Status
  • SC-Bytes

These are required fields. Without them you will not get meaningful reporting. However, this minimal logging does not provide enough information for Referral and Browser reporting. Therefore it is advisable to set more detailed logging properties for your IIS server.
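IIS logs in W3C extended format list the logged fields in a `#Fields:` header line, so the check can be automated. A minimal sketch, assuming that header is present in the log; the function name is hypothetical.

```python
# Minimum fields from the list above, in W3C extended log field notation.
REQUIRED_FIELDS = {"date", "time", "c-ip", "cs-uri-stem", "sc-status", "sc-bytes"}

def missing_iis_fields(log_lines):
    """Return the required fields absent from the first #Fields: header."""
    for line in log_lines:
        if line.startswith("#Fields:"):
            present = {name.lower() for name in line[len("#Fields:"):].split()}
            return sorted(REQUIRED_FIELDS - present)
    return sorted(REQUIRED_FIELDS)  # no header found, so nothing is confirmed

header = ["#Software: Microsoft Internet Information Services 6.0",
          "#Fields: date time c-ip cs-uri-stem sc-status"]
print(missing_iis_fields(header))
```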

I'm having trouble running the geo-update command in Urchin 5?

Problems with the Urchin 5 geo-update command are usually caused by one of two things:

  1. Urchin is running on a network that requires a proxy to be used.
  2. The user that is running the command does not have the correct permissions to modify documents in the /path/to/urchin/data/geodata/* directory.

Before sending the client any information, double check that the geo-update command is running on the EpikOne Urchin server. It may be that there is a problem with the Google servers. If the command executes correctly on the EpikOne server, try the troubleshooting techniques below.

If Urchin is running on a network that requires a proxy, make sure the client has configured the Domain DB update settings correctly in Urchin 5. These settings can be found in the Settings -> DNS Database Update screen.

Another important thing to check is the user permissions. If the geo-update command is being run by a user with incorrect permissions then the update process will fail. The user must have the ability to write directories and files to the /path/to/urchin/data/geodata/* directory.

Finally, Google can check the status of the client's geo-data database. You can contact them via the ticketing interface in the VAR area. When contacting them send the contents of the geo-update -V command.

Can Urchin process a log file with two different line formats?

The Primary and Secondary lines in the log format are intended to support ecommerce log files, where the secondary lines are related to the primary lines (e.g. secondary per-item logging associated with the primary transaction record). It's not a mechanism that can be used for reading two different (and independent) types of log lines in the same log. There's no support in Urchin for doing the latter.

What are some of the restrictions on the Urchin demo?

The demo license will allow you to rewrite the Portuguese language files and successfully use the product, while a fully licensed installation will generate errors. The demo licenses are very liberal and allow virtually any use of the product. That is by design, as a demo should demonstrate the full capabilities of the product. When fully licensed, the system has various access restrictions not present in the demo license, among which are restrictions on localization configuration changes. Localizing the Portuguese administration language settings is not permitted by your license (nor, at this time, by any license), and allowing it would require a custom build of the Urchin binaries.

Why are certain files missing from the Urchin reports?

The answer depends on the tracking method that you are using. If you are using the UTM tracking method make sure all of your pages are tagged with the __utm.js tracking code. Pages that are not tagged will not appear in your reports. Also, Urchin only tracks pages that return a 200 or 304 status.

If you are using an IP based tracking method, make sure that you are not hitting Urchin's internal database table limit. By default, tables in the Urchin database have a limit of 60,000 records. If you hit this limit then all subsequent page hits will be bundled in a record named "(other)". If you are hitting the db table limit you can increase the table limit beyond 60,000 records.

How many log sources come with each Urchin module?

Campaign Tracking Module provides you with 2 extra log sources per profile, allowing you to add a total of 3 log sources per profile (1 Base + 2 for the module)

E-Commerce Module also provides you with 2 extra log sources, allowing you to add 3 log sources per profile.

The Urchin Profit Suite (E-commerce + Campaign Tracking) provides you with 4 extra log sources, allowing for 5 log sources per profile (1 Base + 4 for the Suite).

How do I register Urchin if the machine can not reach the internet?

If the Urchin machine cannot connect to the internet then Google must manually create a license key for the Urchin installation. To do this Urchin needs the contact information for the purchaser and the inspector output from Urchin. Without the inspector output the license key cannot be generated.

Is there a way to export more data than what appears on the screen?

The only way to export all of the data for a report, regardless of what's visible on the screen, is to export the data in a tab-separated format. The Word and Excel exports will only export what is visible on the screen.

Information about the Urchin DNS database update.

Urchin gets the domain database from a third-party source and updates it as frequently as possible. The Urchin scheduler will run monthly and update the database at that time.

You can also create a custom mapping of IP addresses within Urchin. If you know how you want to map the unresolved IPs (e.g. you know a certain netblock belongs to a particular domain), you can add that information to the domain.local file and run the geodata utility to import it into the geodata databases.

Additionally, if you would like to provide the IP addresses that are not resolving (preferably with correct geographical information), we can forward that to the third-party source for inclusion in the geo-update data.

What is the maximum amount of memory that Urchin can use?

Urchin Software can use a maximum of 512 MB of memory. This can only be achieved by manually configuring the memory storage.

When setting a custom memory usage there are 5 data buffers that can be configured. Each data buffer corresponds to the data tables that Urchin stores report data to. The buffers are:

  • Visitor Buffer Size
  • Data Buffer Size
  • Session Buffer Size
  • Data String Buffer Size
  • Path Buffer Size

I recommend that you use the same ratio that is available by default in the custom memory storage settings. So if you make the "Visitor Buffer Size" 5x larger, then make all the other memory settings 5x larger as well.
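Scaling every buffer by the same factor is what keeps the ratio intact. In the sketch below the starting sizes are placeholders, not Urchin's real defaults; substitute the values shown in your own custom memory settings.

```python
def scale_buffers(buffers, factor):
    """Multiply every buffer size by the same factor so the ratios are preserved."""
    return {name: size * factor for name, size in buffers.items()}

# Placeholder sizes for illustration only; read the real defaults from Urchin.
current = {
    "Visitor Buffer Size": 100,
    "Data Buffer Size": 200,
    "Session Buffer Size": 50,
    "Data String Buffer Size": 400,
    "Path Buffer Size": 25,
}
print(scale_buffers(current, 5))
```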

How do I enable logging for Urchin's internal web server?

To enable logging of Urchin's internal web server, modify the urchin.conf file in the following directory:

/path/to/urchin/etc/

Uncomment the following line:

uconfLogging: ON

If you would like to modify how the server runs, edit /path/to/urchin/var/urchin.conf.template. The urchin.conf file gets rewritten dynamically with values from the template when the Urchin web server is restarted. Note that Urchin does not like it when you change the log file format.

How do I turn data center mode on and off?

Data Center Mode can be turned on or off using the uconf-driver command. uconf-driver provides a command line interface for administering the Urchin 5 configuration. All functionality present in the Urchin 5 administration interface is available in this utility, so it can completely replace the administration interface for managing any facet of the Urchin 5 configuration. The uconf-driver is intended for situations where managing the Urchin 5 configuration through automated/unattended scripts is desired.

To turn data center mode on:

/path/to/urchin/util/uconf-driver action=set_parameter recnum=1 cr_dcmode=on

To turn data center mode off:

/path/to/urchin/util/uconf-driver action=set_parameter recnum=1 cr_dcmode=off

More information about uconf-driver can be found on Urchin's website here: http://help.urchin.com/index.cgi?id=1051

What does "ERROR: Received interrupt signal(2)" mean?

I've only encountered this error once. A client was using Urchin on Solaris. They were getting the following error:

------------------------------------------------------
Urchin 5.7.03 (solaris9) starting: 20060309 17:08:30
------------------------------------------------------
Processing profile: lineup_letter
[17:08:30] Logfile: /urchin/xmr_logs/xmsrweb004_access.200603080000.gz
data lines: 1802234 (100%)
data hits: 1802071
data proc: 345.69 MB in 00:01:27 (3.973 MB/sec)
data range: 2006-03-06 23:59 (-0500) - 2006-03-08 00:00 (-0500)
[17:09:57] Logfile: /urchin/xmr_logs/xmsrweb006_access.200603080000.gz
data lines: 510000 (20%)^C
ERROR: Received interrupt signal(2)
WARNING: attempting to soft close database... done.
------------------------------------------------------
Urchin 5.7.03 (solaris9) finishing: 20060309 17:11:33
------------------------------------------------------

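To resolve this issue we turned on parallel log tracking and the error went away. Signal 2 is SIGINT, the interrupt signal a process receives when, for example, Ctrl-C is pressed (note the ^C in the output above). We're not 100% sure why the error occurred in this case, but the change worked.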

How does ct_runstatus affect the Urchin scheduler?

Here's specifically how ct_runstatus affects Urchin's processing:

Suppose we have two profiles:

  • "sad.com" with ct_runstatus=3
  • "happy.com" with ct_runstatus=2

These actions will be successful:

  • /path/to/urchin/bin/urchin -p happy.com
  • /path/to/urchin/bin/urchin -p sad.com
  • manually running happy.com through the web admin interface

These actions will be unsuccessful:

  • manually running sad.com through the web admin interface
  • any scheduled profiles

So, to get automated processing working in the long run, I recommend that you ensure that ct_runstatus=3 is not present in any profile. Whether you do this before or after using udb-sanitizer shouldn't matter.

How does Urchin calculate average session length?

The average length of session is calculated by first recording and storing the length of each session as the time difference between the first and last hit of the session. Note that this length does not contain any "virtual" time for viewing the last page.

The calculation has two parts. The base part is:

100% * (total length of sessions / number of sessions).

If there are more pageviews than sessions, we add another amount that estimates the virtual time for the last page. This part is calculated as:

100% * (total length of sessions / (pageviews - sessions)).

Note that this only works well where pageviews is much greater than sessions.
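A worked example makes the two parts easier to follow. This is a sketch of the formula as described above, with made-up traffic numbers; the function name is hypothetical.

```python
def average_session_length(total_session_seconds, sessions, pageviews):
    """Base average plus the estimated viewing time of the last page."""
    if sessions <= 0:
        return 0.0
    average = total_session_seconds / sessions
    if pageviews > sessions:
        # Estimated "virtual" time for the last page of each session.
        average += total_session_seconds / (pageviews - sessions)
    return average

# 100 sessions totalling 6000 seconds, 400 pageviews:
# base = 6000 / 100 = 60, last-page estimate = 6000 / 300 = 20.
print(average_session_length(6000, 100, 400))
```

With pageviews only slightly above sessions the second term becomes very large, which is why the estimate only works well when pageviews is much greater than sessions.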

What type of database does Urchin use?

Urchin uses a proprietary flat file database for report data storage. The high-performance database architecture handles very high traffic sites efficiently. Some of the benefits of the database architecture include:

  • Small database footprint – approximately 5-10% of raw logfile size
  • Small number of database files required per profile (9 per month of historical reporting)
  • Support for parallel processing of load-balanced webserver logs for increased performance
  • Databases are standard files that are easy to back up and restore using native operating system utilities

Because the database files are simple flat files there is complete portability between Windows and Unix platforms. This means that:
  • Reporting engine can read data crunched on any platform
  • No data migration necessary if Urchin platform is changed

Will the 500,000 limit on database table rows limit the amount of data I can store in Urchin?

The short answer is no, you should be able to load lots of historical data into Urchin for analysis. There are some users who process 25 GB of data on a daily basis. The Urchin documentation states that there is a maximum table size of 500,000 records, but this does not mean that you are limited to 500,000 visits. A record in the database table does not equal a singular hit to the website.

By default, the maximum database size is set to 10,000 records. The global limit can be raised to 60,000 by editing the Process Settings screen in the Urchin console. But increasing the limit over 60,000 can only be achieved by using the "uconf-driver" utility.

If you use uconf-driver to set a higher database limit, please note that there is a hard-coded limit of 500,000 records. However, caution must be used when increasing the database size beyond 60,000 records as it may affect disk space, log processing speed, and report delivery performance. It is strongly recommended that you increase the limit by no more than 25,000 records at a time so that you can find an acceptable compromise which gives you the increased database capacity you need, but still maintains an acceptable level of disk space usage and performance.
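The stepwise increase can be planned ahead of time. The sketch below only does the arithmetic from this answer (steps of at most 25,000 records, capped at 500,000); it does not call uconf-driver, and the function name is hypothetical.

```python
HARD_LIMIT = 500_000  # hard-coded maximum mentioned above
MAX_STEP = 25_000     # largest recommended single increase

def limit_steps(current, target):
    """List the intermediate table limits to try, one increase at a time."""
    target = min(target, HARD_LIMIT)
    steps = []
    while current < target:
        current = min(current + MAX_STEP, target)
        steps.append(current)
    return steps

print(limit_steps(60_000, 120_000))
```

After each step, check disk usage, processing speed and report delivery before moving to the next value.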

Technical note: Urchin's databases are based on hash table technology. Performance of these databases is good up to around 60,000 to 80,000 records, but increasing the database size beyond this can result in diminished performance. That is why the maximum database size can only be set to 60,000 records in the web-based admin interface.

This information is from the Urchin help documentation and is available here: http://help.urchin.com/index.cgi?&id=1378

What does the following error mean: ERROR: (7008-54-441) DB file is the wrong size - run sanitizer.

The "DB file is the wrong size" error is typically caused by corruption in the databases from an abruptly halted processing run (e.g. a system shutdown or a killall command). When that happens, the system looks for a latent lock.udb file and, if one is found, rolls back to the last backup (if present). If the backup is gone and your database is corrupt then you should delete the most recent month's data and reprocess. Sanitizing a certain day in the context of a corrupt database will technically only worsen the problem, although it won't make the solution any more drastic: you still roll back or delete the month.

Is Urchin software multi-threading?

Sort of. Urchin is not multi-threaded in the Windows environment. However, on non-Windows boxes you can launch multiple instances of ./urchin to process multiple profiles at once.

How do squid servers affect Urchin?

Squid proxy servers are usually used as reverse proxies in a website environment, where one Squid server pulls data from multiple web servers to load balance the site across the servers. This usually means that the Squid server has its own set of logs, and each web server has its own set of logs.

Basic environment with a squid server:

Browser --|-- Internet --|-- Squid ----|------- Web server 1
                                       |------- Web server 2
                                       |------- Web server 3
                                       |------- Web server 4

The Squid server logs the visitor’s IP address (since this is the machine that the visitor is accessing), and the web servers log the Squid server’s IP address (since the Squid server is the machine actually fetching the content from the web servers).

Squid Log File Format:

The native access.log has ten (10) fields. There is one entry here for each HTTP (client) request and each ICP Query. HTTP requests are logged when the client socket is closed. A single dash (-) indicates unavailable data.

  1. Timestamp : The time when the client socket is closed. The format is ‘Unix time’ (seconds since Jan 1, 1970) with millisecond resolution. This can be converted to a readable format with: cat access.log | perl -nwe 's/^(\d+)/localtime($1)/e; print'
  2. Elapsed Time : The elapsed time of the request, in milliseconds. This is time between the accept() and close() of the client socket.
  3. Client Address : The IP address of the connecting client, or the FQDN if the ‘log_fqdn’ option is enabled in the config file.
  4. Log Tag / HTTP Code : The Log Tag describes how the request was treated locally (hit, miss, etc). The tags are described in the Squid documentation. The HTTP code is the reply code taken from the first line of the HTTP reply header. Non-HTTP requests may have zero reply codes.
  5. Size : The number of bytes written to the client.
  6. Request Method : The HTTP request method, or ICP_QUERY for ICP requests.
  7. URL : The requested URL.
  8. Ident : If ident_lookup is on, this field may contain the username associated with the client connection as derived from the ident service.
  9. Hierarchy Data / Hostname : A description of how and where the requested object was fetched.
  10. Content Type : The Content-type field from the HTTP reply
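To see how the ten fields line up in practice, here is a minimal parsing sketch. The sample line is made up, the dictionary key names are my own labels for the fields listed above, and the code assumes the fields are whitespace-separated with no spaces inside any field.

```python
from datetime import datetime, timezone

SQUID_FIELDS = ("timestamp", "elapsed_ms", "client", "tag_code", "size",
                "method", "url", "ident", "hierarchy", "content_type")

def parse_squid_line(line):
    """Split one native access.log line into its ten named fields."""
    parts = line.split()
    if len(parts) != len(SQUID_FIELDS):
        raise ValueError(f"expected {len(SQUID_FIELDS)} fields, got {len(parts)}")
    entry = dict(zip(SQUID_FIELDS, parts))
    # Unix time with millisecond resolution, converted to a UTC datetime.
    entry["timestamp"] = datetime.fromtimestamp(float(entry["timestamp"]), tz=timezone.utc)
    # Field 4 combines the log tag and the HTTP reply code, e.g. TCP_MISS/200.
    entry["log_tag"], entry["http_code"] = entry.pop("tag_code").split("/", 1)
    return entry

# Hypothetical log line for illustration.
sample_line = ("1152000000.123     45 192.168.0.10 TCP_MISS/200 1024 GET "
               "http://www.example.com/index.htm - DIRECT/10.0.0.5 text/html")
print(parse_squid_line(sample_line))
```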

More information can be found at http://www.squid-cache.org/Doc/FAQ/FAQ-6.html#ss6.6. General information about the Squid Cache can be found at http://squid-cache.org.

So how does the squid server cause problems for Urchin? If all your traffic is flowing through the Squid server, then you might want to have Urchin process the Squid log rather than the actual web server logs (the web server logs will probably only list the IP address of the Squid server). Unfortunately the Squid server access log only has 10 pieces of information in it. It’s not very robust.

We have heard of a client who configured the squid server to write data directly to the access logs of the web server. If the squid server is in fact logging data to the web servers while the servers are also logging the access from the caching server, it’s possible that they might be getting double the results, since each hit is being logged twice. You may want to review the logs from the web server and squid server to make sure this is not the case, and if so, create filters for the content, accordingly.

Can Urchin 5.X process binary log files?

Urchin 5.X can not process binary log files.

Do I need a load balancing module?

We often get questions concerning the Urchin Load Balancing Module. Many customers are unsure if they actually need one. The Load Balancing Module simply provides you with the ability to add another log source to your profiles, rather than being limited to only 1 log source per profile. It also allows Urchin to correlate visitors that may have hit more than 1 server in the server farm during their visit.

For example, if your site is load-balanced over 3 servers, you would need the Urchin 5 base product plus two load balancing modules.

We have heard of people concatenating log files into a single file and then having Urchin process that single file. The results have been mixed. This process violates the Urchin EULA.

Does Urchin identify user sessions before or after applying filters?

Urchin identifies user sessions prior to applying any filters. This is important to note if you are trying to remove a session ID variable from the URI. You can safely create a profile filter to remove the variable and Urchin will still identify the user sessions correctly.

This only applies when using the Session ID tracking method.

What format does Urchin expect Overture's PPC data to be in?

By default, Urchin expects Overture's PPC data to be in the following format:

date
search term
skipped field
skipped field
skipped field
impressions
skipped field
cost
uri
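The field positions above can be captured in a small record type for sanity-checking a data file. As with any sketch here, the tab delimiter is an assumption (it is not stated above), the type and function names are hypothetical, and the sample values are invented.

```python
from collections import namedtuple

OvertureRow = namedtuple("OvertureRow", "date search_term impressions cost uri")

def parse_overture_row(line, delimiter="\t"):
    """Pick the named fields out of the 9-field default layout listed above."""
    values = line.rstrip("\r\n").split(delimiter)
    if len(values) != 9:
        raise ValueError(f"expected 9 fields, got {len(values)}")
    # Positions 2, 3, 4 and 6 are the skipped fields.
    return OvertureRow(values[0], values[1], values[5], values[7], values[8])

# Hypothetical row; "x" marks the skipped fields.
overture_line = "\t".join(["2006-07-16", "blue widgets", "x", "x", "x",
                           "85", "x", "3.10", "/landing.htm"])
print(parse_overture_row(overture_line))
```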

What are the data fields used in Urchin 5?

Default data fields for Urchin 5 can be found in the following file of your Urchin installation: /path-to-urchin/lib/reporting/logformats/fieldlist.txt

What does the error "WARNING: (7026-76-83) Could not open log file - check permissions" mean?

The "could not open log file - check permissions" error is a generic one on a Windows machine. The Urchin code is getting a "cannot open the file" error from the OS, and is displaying this error. Unfortunately, the error codes from a Windows machine aren't very descriptive to begin with, so Urchin just assumes it is a permissions problem.

In other words, for whatever reason, Urchin is unable to open the file and read through its contents, and that is what triggers the error. It could be benign, but there is no way to tell.

On a *nix machine there is a bit more that you can do. You can try to change the owner of the directory using the chown command. Just set the owner to the username that Urchin is running as.

Can I globally change the case setting of the URI stem in my log file?

Log files created on Windows platforms can have the same URLs in both uppercase and lowercase. This is due to the nature of the Windows platform. Urchin Software has a global setting that can force the URL to lowercase, thus consolidating hits to the same page. To set the global case setting, navigate to the log file manager in your Urchin configuration. Choose a log source by clicking on the name. Click the "Advanced Settings" tab. There will be an option for "URI Stem to Lower Case". Click the "Yes" radio button. This will convert the URLs in the log file to lower case.
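The effect of the setting can be illustrated with a simple hit count. This sketch only demonstrates the consolidation idea; it is not how Urchin stores data, and the URLs are examples.

```python
from collections import Counter

def count_hits(uri_stems, to_lower=True):
    """Count hits per URI stem, optionally folding the stem to lower case first."""
    return Counter(stem.lower() if to_lower else stem for stem in uri_stems)

hits = ["/Index.htm", "/index.htm", "/INDEX.HTM", "/about.htm"]
print(count_hits(hits, to_lower=False))  # three separate entries for the same page
print(count_hits(hits, to_lower=True))   # one consolidated entry
```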

What information does the 'machines' section of a configuration file hold?

The 'machines' section contains specific configuration settings for individual machines in an Urchin processing cluster, such as memory allocation.

Where is the Urchin configuration database located?

The Urchin configuration is located here:

/path/to/urchin/data/conf/.

Urchin processes my log file as one long line. Why?

This problem usually arises when the web server and Urchin are running on different operating systems. For example, if your web server is running on Mac OS 9 and Urchin is running on a Windows based OS then Urchin may read the log file as one long line. The problem is that each operating system uses a different character to represent a new line.

The new line character on Unix type systems (including OS X) is a line feed (\n). Windows uses a carriage return and a line feed (\r\n) for a new line. To make things even more interesting Mac OS 9 uses just a carriage return (\r).

To solve this problem you need to re-process your log file with some type of script (Perl, shell, etc.) that replaces the new line character at the end of each log file line with the one used by the operating system that Urchin is running on.
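One way to write such a script is sketched below. It works on bytes so that no line ending is translated behind your back; the function name is hypothetical, and you would read and write the log file in binary mode around it.

```python
def normalize_newlines(data, newline=b"\n"):
    """Rewrite \\r\\n, lone \\r and \\n line endings to a single target ending."""
    # Collapse \r\n first so a Windows ending is not counted as two line breaks.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)

mixed = b"line one\rline two\r\nline three\n"
print(normalize_newlines(mixed))                   # Unix target
print(normalize_newlines(mixed, newline=b"\r\n"))  # Windows target
```

For example, read the file with open(path, "rb"), pass the contents through the function, and write the result to a new file with open(new_path, "wb").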

How does the UTM Tracking Method for Urchin 5 work?

Urchin 5 has five different ways to track visitor data:

  • IP+UserAgent
  • Username
  • Session ID
  • Urchin Traffic Monitor (UTM)
  • IP Only

While all methods will do a good job of correlating hits into user visits, there are a number of visitor loyalty and client reports that are only available when using the UTM system. This system was specifically designed to negate the effects of caching and proxying and allow the server to see every unique click from every visitor without significantly increasing the load on the server.

When a profile is set to run as UTM, the only hits that Urchin reports are those that are logged as a request for the __utm.gif file. Urchin (for the most part) ignores hits that aren't for the __utm.gif. The __utm.gif request includes a long list of parameters, including screen color, page requested, title of the page, etc. These parameters are pulled into Urchin and become the foundation for many reports.

To give you a better understanding of how UTM works, here is an overview of the steps involved:

  • Visitor accesses a UTM enabled site
  • __utm.js gets called, which writes 3 (or up to 5) cookies that contain unique visitor information, domain hashes (numerical representation of the domain), date/time of visit, and referring source (Google PPC, overture PPC, direct, organic search, email campaign, etc)
  • __utm.js also collects information about the client's machine: screen size, colors, Flash version, whether JavaScript is installed, etc.
  • __utm.js calls the __utm.gif hit and appends a query with the list of variables collected: (i.e. __utm.gif?utmn=3250303161&utmp=/StaticPage.php)
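To inspect what a given __utm.gif hit carries, the query string can be split into its parameters. A minimal sketch using the example request above; only utmn and utmp appear in this document, so no other parameter names are assumed, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def utm_params(request_uri):
    """Return the query parameters of a __utm.gif request, or None for other hits."""
    parts = urlsplit(request_uri)
    if parts.path.rsplit("/", 1)[-1] != "__utm.gif":
        return None
    # Keep the first value of each parameter.
    return {key: values[0] for key, values in parse_qs(parts.query).items()}

print(utm_params("/__utm.gif?utmn=3250303161&utmp=/StaticPage.php"))
print(utm_params("/StaticPage.php"))
```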

During processing for a UTM enabled profile,

  • Urchin 'finds' the __utm.gif hits and only counts these as pageviews. Any page that does not include the __utm.js cannot call the __utm.gif and consequently is not counted in pageviews [*NOTE that only browsers can call the __utm.js script, so any bots, spiders, etc. are automatically stripped from the results from the start]
  • Subsequent hits that are not the __utm.js are used by Urchin, together with the cookies, to determine a visitor's session time and visitor returns.

One very important thing to keep in mind: Urchin will only look at the records in the log file that are for the __utm.gif. If these records do not contain certain pieces of information, like the host name, you will not be able to create filters based on the missing information.
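
If you want to see which records in a log a UTM profile would actually consider, a rough Python sketch like the one below can help. It assumes an Apache-style log where the request appears in quotes as "METHOD /path HTTP/x.y"; the sample lines and the function name utm_hits are made up for illustration, and this is not Urchin's own matching logic.

```python
import re

# Matches the quoted request in an Apache-style log line (assumed format).
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')


def utm_hits(lines):
    """Yield only the log lines whose request path is the __utm.gif file."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue
        path = match.group(1).split("?", 1)[0]
        if path.endswith("__utm.gif"):
            yield line


if __name__ == "__main__":
    # Hypothetical sample records.
    sample = [
        '10.0.0.1 - - [10/Oct/2006:13:55:36 -0700] "GET /__utm.gif?utmn=3250303161&utmp=/StaticPage.php HTTP/1.1" 200 35',
        '10.0.0.1 - - [10/Oct/2006:13:55:35 -0700] "GET /StaticPage.php HTTP/1.1" 200 5120',
        '10.0.0.2 - - [10/Oct/2006:13:56:01 -0700] "GET /images/logo.gif HTTP/1.1" 304 -',
    ]
    for hit in utm_hits(sample):
        print(hit)
```

Running a check like this against a real log is a quick way to confirm that the __utm.gif records carry the fields (such as the host name) that your filters depend on.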

How can I change the textual content of Urchin?

The text that appears in the reports and admin section of Urchin is stored in txt files. The txt files for the admin area can be found here:

/path/to/urchin/lib/admin/languages

The txt files for the reporting interface can be found here:

/path/to/urchin/lib/reporting/languages

We have been advised by the Urchin engineering team that modifying these files may violate the Urchin EULA.

Is there a version of Urchin 5 for Solaris 10?

No, there is no version of Urchin 5 for Solaris 10. The rumor is that Urchin 6 will be compatible with Solaris 10.

Why do I see a drop in my visitors when changing Visitor Tracking Method from 'IP + UserAgent' to 'UTM'?

It's typical to see a drop in sessions when changing the way that Urchin tracks your visitor sessions. Although IP + UserAgent is a reliable method for tracking visitors, it still leaves a margin of "visitors" that aren’t really visiting your site. This includes certain bots & spiders that are recorded in your log file.

Also, when using the IP + UserAgent method, all the page hits on your website that are requested are recorded in the log file and added to the Urchin reports. This includes some pages that you would never think to tag with the UTM, like pop-ups. If you tagged every single page in your website with the UTM, then the results for using the IP + UserAgent method should be exactly the same as using the UTM method. However, if you don't tag all of your pages with the UTM then you will start to see a drop off in the number of visits.

Lastly, the timeout value that IP + UserAgent uses may not provide the best metric for measuring the lifetime of a visit to your site. The UTM uses cookies to track the user's every move and only times out after 30 minutes with no activity on your site; it doesn't assume that every visitor will only visit for 30 minutes.
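
The inactivity rule can be sketched in a few lines of Python. This is a simplified illustration of a 30-minute inactivity timeout, not Urchin's actual implementation; the function name count_visits and the sample times are hypothetical, and treating a gap of exactly 30 minutes as the same visit is an assumption.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)


def count_visits(hit_times, timeout=TIMEOUT):
    """Count visits for one visitor.

    A new visit starts whenever the gap since the previous hit is
    longer than the timeout; the total length of a visit is not capped.
    """
    visits = 0
    previous = None
    for current in sorted(hit_times):
        if previous is None or current - previous > timeout:
            visits += 1
        previous = current
    return visits


if __name__ == "__main__":
    day = datetime(2006, 10, 10)
    # Hits at 10:00, 10:20, 10:45 and 11:30 (hypothetical data).
    hits = [
        day.replace(hour=10, minute=0),
        day.replace(hour=10, minute=20),
        day.replace(hour=10, minute=45),
        day.replace(hour=11, minute=30),
    ]
    print(count_visits(hits))
```

In this example the first three hits form a single 45-minute visit because no gap between them exceeds 30 minutes; the 11:30 hit arrives 45 minutes after the previous one and starts a second visit.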

My web application sets the session ID in either a cookie or a query string variable. How do I configure Urchin?

Web applications that use a session identifier need to track the session ID to ensure state during the user’s session. Traditionally this session ID has been stored on the user’s machine in a cookie. Sometimes, when users have cookies turned off, the web application adapts to the situation and appends the session ID to all requests sent to the server. Typically the session ID is attached to the request query. This can cause problems for Urchin, especially if you’re using the Session ID tracking method.

The problem is that Urchin can only look for the application defined session ID in one location. So, if your application is setting the session ID in two locations, based on the user’s browser settings, Urchin will not be able to accurately count the total number of sessions.

The solution for this is to process all of your logs with a script that places the session ID in the URL or in a cookie. Then, when you configure your profiles, you can specify the location of the session ID and be sure it will exist in that location.
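
As a rough idea of what such a pre-processing script might do, the Python sketch below copies the session ID from the cookie into the request query whenever the query does not already carry it. The variable name "sessionid", the function name normalize_request, and the sample values are all hypothetical; a real script would also need to read and rewrite the fields of your specific log format.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit, parse_qs

SESSION_KEY = "sessionid"  # hypothetical name; use your application's own


def normalize_request(uri, cookie_header):
    """Return the request URI with the session ID guaranteed in the query.

    If the query already has the session ID, the URI is returned unchanged.
    Otherwise the value is taken from the cookie header, when present.
    """
    query = urlsplit(uri).query
    if SESSION_KEY in parse_qs(query):
        return uri
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if SESSION_KEY not in cookie:
        return uri  # no session ID available anywhere
    separator = "&" if query else "?"
    return f"{uri}{separator}{SESSION_KEY}={cookie[SESSION_KEY].value}"


if __name__ == "__main__":
    print(normalize_request("/cart.php?item=5", "sessionid=abc123; theme=dark"))
    print(normalize_request("/cart.php?sessionid=xyz789", "theme=dark"))
```

After a pass like this, every record that had a session ID in either location carries it in the query, so the profile can be pointed at that one location.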

What are the data fields used in Urchin 5?

Default data fields for Urchin 5 can be found in the following file of your Urchin installation:

/path-to-urchin/lib/reporting/logformats/fieldlist.txt

How do I set up Urchin to work with Plesk?

The Plesk software package is a web hosting automation solution by SWsoft. It allows a server administrator to set up new websites, email accounts, and DNS entries through a web-based interface. Other services managed by Plesk include MySQL and PostgreSQL databases, Tomcat Java server, ColdFusion server, as well as CounterStrike and Battlefield 1942 game servers. The Plesk package can be used to implement a log rotation process for the web server.

When working with the Unix version of Plesk, you should be aware of a few things:

  • Apache is usually set up to log in CLF (common log format). This needs to change if using the UTM tracking method
  • By default, Plesk is set to rotate the logs based on file size, not date. The default size is 2 GB which should be lowered for performance reasons.
  • The path to the log files on a Plesk server is: /home/httpd/vhosts/domain_name/statistics/logs
  • The current log file that is active is named access_log
  • The log file that was previously processed is access_log.processed
  • Most people usually have Urchin process all log files using logs/*. MAKE SURE THAT LOG TRACKING IS TURNED ON.
  • Plesk can be configured to compress the log files
  • Plesk begins processing log files at 04:02 daily. It then sequentially processes all the logs for all the domains.

Are there any helper scripts for Urchin 5?

Urchin has created a number of Perl scripts that can be used to extend the functionality of Urchin 5 software. These scripts are hosted on the Urchin help site and can be found here:
http://www.google.com/support/urchin45/bin/answer.py?answer=32830&topic=7400

The following scripts are available from Urchin by Google for your use. These scripts are provided for advanced users for their use in customizing their Urchin experience – please note that we are unable to provide support for the installation or use of these scripts.

  • u5data_extractor.pl: Script to retrieve data from the urchin.cgi engine and print a text-based report, which can be emailed, converted to HTML, etc. http://download.urchin.com/support/u5data_extractor.pl
  • u5scan_historylog.pl: This script will parse the Urchin 5 scheduler history file for errors for a particular date and print a notification if any profile exits with a non-zero exit status. http://download.urchin.com/support/u5scan_historylog.pl
  • archive_udata.pl: This Perl script examines the specified Urchin reporting data directory and performs pruning and compressing operations based on the command line options specified. http://download.urchin.com/support/archive_udata.pl
  • weblog_rotate.pl: Rotates the specified logs and names them with yesterday's date. The script also restarts the web server with a specified command and optionally compresses old logs and removes them after a certain period. http://download.urchin.com/support/weblog_rotate.pl
  • split_logs.pl: Splits an Apache Extended Combined access log into individual logs based on the virtual host logged in the second field. http://download.urchin.com/support/split_logs.pl
  • purge_udata.pl: This Perl script examines the specified Urchin reporting data directory and performs pruning of monthly Urchin databases based on the command line options specified. http://download.urchin.com/support/purge_udata.pl
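
To illustrate the idea behind splitting a combined log by virtual host, here is a small Python sketch. It is not a port of split_logs.pl and does not reproduce its options; it only assumes, as described above, that the virtual host is the second whitespace-separated field of each line. The function name and sample records are hypothetical, and it groups lines in memory instead of writing files.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group log lines by the virtual host found in the second field."""
    logs = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        logs[fields[1]].append(line)
    return dict(logs)


if __name__ == "__main__":
    # Hypothetical sample records with the virtual host in field two.
    sample = [
        '10.0.0.1 www.example.com - [10/Oct/2006:13:55:36 -0700] "GET / HTTP/1.1" 200 512',
        '10.0.0.2 shop.example.com - [10/Oct/2006:13:55:40 -0700] "GET /cart HTTP/1.1" 200 2048',
        '10.0.0.3 www.example.com - [10/Oct/2006:13:56:02 -0700] "GET /about HTTP/1.1" 200 1024',
    ]
    for vhost, entries in split_by_vhost(sample).items():
        print(vhost, len(entries))
```

For real log volumes you would write each group to its own file as you go; the downloadable Perl script remains the reference for actual use.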

How does Urchin 5 licensing work?

The Urchin activation process is as follows:

When you install a serial code, your Urchin machine sends the serial code and key code to the Urchin licensing server. Urchin verifies these two values and returns a license key, enabling the customer to complete the installation.

  1. The ugetlicense utility automates the sending of the serial code and key code to the Urchin license center
  2. A unique license key is returned and written to the Urchin configuration. You can use uconf-export to export your Urchin configuration which contains this value.
  3. The key code is computed using:
    • Inode numbers of configuration database files (Unix systems)
    • System identifiers (Windows systems)
    • Note that the IP address is NOT used in the license key!
    • Also note that copying Urchin config databases or moving Urchin to another disk or partition will change the computed key code.
  4. Once activated no further network-based license authentication is done by any Urchin component.
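
The point about copying configuration databases on Unix follows from how inode numbers work: a copied file is a new file with its own inode. The Python sketch below demonstrates only that filesystem behavior with a throwaway file; it does not show how Urchin computes the key code, and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def copy_changes_inode():
    """Create a file, copy it, and report whether the inode numbers differ."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "config.db")  # hypothetical name
        with open(original, "w") as handle:
            handle.write("example configuration data")
        duplicate = os.path.join(workdir, "config_copy.db")
        # copy2 preserves contents and timestamps, but not the inode.
        shutil.copy2(original, duplicate)
        return os.stat(original).st_ino != os.stat(duplicate).st_ino


if __name__ == "__main__":
    print(copy_changes_inode())
```

On a typical Unix filesystem this reports that the inode numbers differ, even though the two files are byte-for-byte identical, which is why a copied or relocated configuration no longer matches the key code from the original activation.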