Python script for Visitor Paths

Description:

Last updated: April, 2010

download: Visitor Path analysis

This Python script is a log analyzer: it reads a collection of Apache (common format) log files and produces two types of statistics. (1) Visitor paths of the individual visitors (by IP). If you are using AWStats, the script can link to its DNS cache and will try to resolve each IP. (2) Global visitor paths: aggregated numbers on how many visitors went from page A to page B. In the setParameter section of the script you can turn off statistics of type 1 (they are very verbose and are generated in memory) and generate only statistics of type 2. Type 2 statistics are written as comma-separated values (CSV) and are easy to import into a spreadsheet application for further analysis (CSV example). The setParameter section contains several other parameters (e.g. the location of the log files) that you can set before running the script.
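To make the type 2 statistic concrete, here is a minimal sketch of how page A to page B transitions could be counted per IP and written out as CSV. It is not the code of visitorPaths.py: the names LOG_PATTERN, count_transitions and transitions_to_csv are hypothetical, and the sketch counts transitions (consecutive requests from the same IP) rather than distinct visitors.

```python
import csv
import io
import re
from collections import Counter

# Apache Common Log Format: host ident authuser [date] "request" status bytes
# (hypothetical pattern for this sketch; extra fields after "bytes" are ignored)
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def count_transitions(lines):
    """Count moves from one page to the next, tracked per IP address."""
    last_page = {}            # ip -> last page requested by that ip
    transitions = Counter()   # (from_page, to_page) -> number of moves
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue          # skip lines that are not in the expected format
        ip, page = match.group("ip"), match.group("page")
        previous = last_page.get(ip)
        if previous is not None and previous != page:
            transitions[(previous, page)] += 1
        last_page[ip] = page
    return transitions


def transitions_to_csv(transitions):
    """Render the counts as CSV text, sorted by (from_page, to_page)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["from_page", "to_page", "count"])
    for (src, dst), count in sorted(transitions.items()):
        writer.writerow([src, dst, count])
    return buffer.getvalue()
```

Passing an open log file (or any iterable of lines) to count_transitions and the result to transitions_to_csv yields text that can be saved with a .csv extension and opened in a spreadsheet.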

When you run the script with multiple log files, make sure that the log files are listed in the right order in the setParameter section. Each log file contains its oldest entries first and its newest entries last, so order the files from the oldest log file (first) to the most recent log file (last). The visitorPaths.py log file analyzer will then process the entries from oldest to newest, and the text-based output reflects that order. The screenshot below shows a sample of the output. For each IP/domain there is an ordered list of date/time, page visited and referer (search engine, other site, etc.). A dash (-) means there was no referer.
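If you are unsure which log file is the oldest, the ordering can be checked by looking at the timestamp of the first entry in each file. The following is a small sketch under that assumption; first_timestamp and order_oldest_first are hypothetical helpers, not part of visitorPaths.py, and they expect each log as a list of lines already read into memory.

```python
import re
from datetime import datetime

# Apache writes timestamps as [10/Apr/2010:10:00:00 +0200]
TIME_PATTERN = re.compile(r"\[([^\]]+)\]")
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def first_timestamp(lines):
    """Return the timestamp of the first parseable entry, or None."""
    for line in lines:
        match = TIME_PATTERN.search(line)
        if match is None:
            continue
        try:
            return datetime.strptime(match.group(1), TIME_FORMAT)
        except ValueError:
            continue  # bracketed text that is not a timestamp; keep looking
    return None


def order_oldest_first(logs):
    """Sort (name, lines) pairs by the timestamp of their first entry.

    Logs without any parseable timestamp are placed at the end.
    """
    keyed = [(first_timestamp(lines), name, lines) for name, lines in logs]
    dated = sorted(
        (item for item in keyed if item[0] is not None),
        key=lambda item: item[0],
    )
    undated = [item for item in keyed if item[0] is None]
    return [(name, lines) for _, name, lines in dated + undated]
```

The names returned by order_oldest_first, read top to bottom, give the order in which the files would be listed in the setParameter section.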

Visitor Paths Output Example.