
General Trucksess.com statistics

Web server statistics for Trucksess.com
Report last modified: 2013-11-06, Wed, 17:29:03Z


Methodology

By describing how the report is constructed, this section aims to make the presentation easier to understand. The details and commands given here can also help others configure parts of their own web statistics reports in a similar way. Note that not all commands used to produce the report are listed on this page. Commands too long for one line are shown across several lines, each continued with a trailing backslash character.

Programs used to generate the report

Analog version 5.32 for the Mac OS produces the web stats report. The Analog application shows web server usage by analyzing logfiles. It is free, open source, and available for many platforms. DNSTran 1.3 for Mac OS Classic automatically converts numerical IP addresses (128.91.113.25) to hostnames (shdh-dhcp-132.wharton.upenn.edu) in the logfiles. This translation greatly improves the Domain and Organization reports by distinguishing top-level domains from each other and listing most organizations by their name as opposed to an IP block.

Logfile format and DNS processing

The web server logs are in the combined format. Each logfile originally spanned a week; beginning in 2010-03, each logfile spans a month. A logfile is typically processed by DNSTran once, after it is complete. Looking up addresses only in new logs avoids reprocessing a growing number of old logfiles, but IP addresses that failed to return a hostname sometimes do so later on. Processing old logs with DNSTran multiple times would pick up these newly found hostnames. However, since this case is uncommon, logs are normally IP translated just once.

The DNSTran configuration settings that have the greatest impact on address translations are expire-good 365 and expire-failed 14. DNSTran keeps a cache file of IP addresses looked up so that it does not have to ask the network every time. If an IP address that has a hostname is not seen in a logfile for 365 days then DNSTran removes the IP address from the cache. IP addresses without hostnames are checked again after 14 days if they occur in more than one logfile. Full list of active DNSTran commands:

translate yes
compress yes
level 9
verbose yes
private no
offset 0
divisor 60
force-exit no
expire-good 365
expire-failed 14
retry-failed 4

Lines commented out in the mandatory configuration file

The mandatory configuration file, manconf.cfg, must exist in order for Analog to run successfully. Inside that file certain lines are commented out so that they do not affect the report:

#HOSTALIAS *.dial-access.att.net dial-access.att.net
#REFALIAS http://search.yahoo.com/* http://search.yahoo.com/
#REFALIAS http://members.aol.com/* http://users.aol.com/*

The intent of the HOSTALIAS commands in manconf.cfg is to merge the hostnames used by each user connection into one host. For example, proxy.aol.com is in this list since one visit from an AOL user often involves multiple servers with different hostnames. Observation of the hosts behind the commented-out HOSTALIAS command shows that dial-access.att.net hostnames are associated with one host per visitor, so grouping them together is not necessary.

The search.yahoo.com REFALIAS discarded the directory information, which also got rid of the query arguments needed for the Search word and Search query reports.

Avoiding the members.aol.com to users.aol.com referrer translation is mostly cosmetic, since nearly all of the AOL referrers in the logs are already members.aol.com.

The default configuration file analog.cfg is loaded without modification but later configuration files change these defaults.

Support files for presentation

Using a LANGFILE command enables customization of the textual elements inside the report. The United States 24-hour clock language file, us24.lng, formed the base for the changes. The major edits to the file change the date formatting to resemble ISO 8601 year-month-day and make the HTTP status code names match those given in RFC 2616. Even though adopting the RFC descriptions sometimes makes them more cryptic, the RFC names are the more commonly used ones, so a web search for a status code returns more matches.

Analog can use a stylesheet to enhance display of the report. The stylesheet for this report uses the same fonts as the rest of the website.

Sequencing of the reports

REPORTORDER sets the order of the individual reports produced by Analog. Overall, general and authoritative reports are towards the beginning while specific but fuzzy reports are at the end. The General summary is followed by the time reports, beginning with the largest unit, the Yearly report, down to the microscopic Five-minute report. The Daily summary goes before the Daily report and the other time summary reports precede the report for the same unit of time.

The Domain report comes right after the time reports as it is essentially the world report, totaling traffic by countries and other top-level domains. The Organization and Host reports naturally follow. Next are reports that summarize info about the files requested, such as the extension and file size. The Directory report precedes the Request report. The next three reports have to do with redirected requests and the three after that detail failed requests. In both classes of reports the referrer report is next to the Redirection and Failure reports to closely associate the data.

The remainder of the report shifts to pieces of data that tend to be less authoritative: user agents and referrers. Just a few operating systems cover all of the requests, so this report is first, followed by two browser reports to round out the data from the user agent field. Referrer reports are next and then the derived search reports. The last three displayed reports are the most esoteric. Website visitors don’t have to log in, so any given username is superfluous.

REPORTORDER x-1QmWdDhwH6475-oZSctzPirEklIKL-pbBsfnNyY-vRMujJ

Site-specific changes

PAGEINCLUDE *.php,/pr/*
DIRSUFFIX index.php
#(e.g.: http://www.trucksess.com/swat/journal/index.php ->\
 http://www.trucksess.com/swat/journal/)
REFALIAS http://www.trucksess.com*/index.php*\
 http://www.trucksess.com$1/$2
HTMLPAGEWIDTH 75

The first two commands accommodate PHP by counting it as a page for Analog and making index.php requests look like directories. By default .html files and directories are considered pages. Pages in the /pr/ directory do not have file extensions, and everything requested from that directory is a page, so all requests starting with /pr/ are counted as pages. Requests ending in index.php are shortened to end in a slash, like a directory request. REFALIAS does the same thing as DIRSUFFIX for referrers from within the site. The page width is greater than the default to make the time report graphs appear less flattened.
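
As a rough illustration of the page rules described above, the following Python sketch classifies request paths and shortens index.php requests. It is not Analog's implementation: the function names is_page and strip_dirsuffix are hypothetical, and the pattern list is a simplified assumption based only on the defaults and commands mentioned in this section.

```python
from fnmatch import fnmatchcase

# Assumed page patterns: the defaults named in the text (.html files and
# directories) plus the PAGEINCLUDE additions (*.php and /pr/*).
PAGE_PATTERNS = ["*.html", "*/", "*.php", "/pr/*"]
DIRSUFFIX = "index.php"


def strip_dirsuffix(path: str) -> str:
    """Shorten a request ending in /index.php so it ends in a slash."""
    if path.endswith("/" + DIRSUFFIX):
        return path[: -len(DIRSUFFIX)]
    return path


def is_page(path: str) -> bool:
    """Return True if the path matches any of the assumed page patterns."""
    return any(fnmatchcase(path, pattern) for pattern in PAGE_PATTERNS)
```

Under these assumptions, strip_dirsuffix("/swat/journal/index.php") gives /swat/journal/, and a request such as /pr/some-release (a made-up path) counts as a page even though it has no file extension.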

Time adjustments

Even though the logfiles are split by month, they begin and end at slightly varying times from month to month. They also do not end near 23:59 on the last day of the month but rather continue for a few hours into the first day of the month. To avoid displaying partial data for the current month the TO command only considers requests until the final day of last month, which properly handles how the logfiles are broken up.

The web server is physically located in the United Kingdom and hence maintains British time, UTC in the winter and +0100 in the summer. No adjustment is made for this configuration. Analog is run in the United States Eastern time zone (-0500 in the winter and -0400 in the summer) but the Program started line at the top of the report is adjusted by 300 minutes to be more in alignment with British time.

TO -00-0131 #up until last month
TIMEOFFSET +300 #for Eastern time: UTC in the winter,\
 +0100 in the summer
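
The arithmetic behind the 300-minute adjustment can be checked with a short Python sketch. It only models the clock shift described above; the sample dates and the names shifted_clock and utc_offset_of_result are illustrative assumptions, and fixed offsets stand in for real daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

TIMEOFFSET_MINUTES = 300  # value of the TIMEOFFSET command above

EASTERN_WINTER = timezone(timedelta(hours=-5))  # -0500
EASTERN_SUMMER = timezone(timedelta(hours=-4))  # -0400


def shifted_clock(local_time: datetime) -> datetime:
    """Return the wall-clock reading after adding TIMEOFFSET minutes."""
    shifted = local_time + timedelta(minutes=TIMEOFFSET_MINUTES)
    return shifted.replace(tzinfo=None)


def utc_offset_of_result(local_time: datetime) -> timedelta:
    """Offset from UTC that the shifted clock effectively represents."""
    return local_time.utcoffset() + timedelta(minutes=TIMEOFFSET_MINUTES)


# Sample instants (arbitrary dates chosen for illustration).
winter = datetime(2013, 1, 15, 12, 0, tzinfo=EASTERN_WINTER)
summer = datetime(2013, 7, 15, 12, 0, tzinfo=EASTERN_SUMMER)
```

With these inputs the winter result corresponds to an offset of zero (UTC) and the summer result to one hour ahead of UTC, matching the comment on the TIMEOFFSET line.
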

Items excluded from the report

Due to a log rotation problem there is no log of requests for 2002W38 (2002-09-16/2002-09-22).

Primary hosts used to develop and maintain the website are not in the report. Directories and referrers that are used for testing or internal purposes, or that are obsolete, are excluded from the entire report if they would have been listed in any particular report.

The InternetSeer.com robot is the only user agent or browser not included in the web stats. This robot monitors availability of the web server. To see if the server is online, the robot needs to make numerous requests throughout the day. Including this robot would greatly inflate the number of requests, so it is excluded from the results:

BROWEXCLUDE InternetSeer.com

Additional DNS resolution

While DNSTran converts most numerical IP addresses into hostnames, it cannot find a hostname for every address. An alternative method using HOSTALIAS to append the organization name to the address aims to keep the line [unresolved numerical addresses] in the Domain report under one percent of requests. This method of processing IP addresses without a hostname is similar to the method jdresolve uses. The jdresolve program utilizes recursive DNS resolution to find an organization name for the IP address by making an NS (name server) query with .in-addr.arpa appended to the reversed IP address.

For example given the IP 65.169.152.240, an NS query is made for 152.169.65.in-addr.arpa. Since no answer is returned, the first octet of the query is removed and another NS query is made, 169.65.in-addr.arpa. This query returns an NS record of ns1-auth.sprintlink.net, ns2-auth.sprintlink.net, and ns3-auth.sprintlink.net. (If this query didn’t return anything jdresolve would finally try 65.in-addr.arpa.) After removing the first part of the NS hostname it is appended to the original IP address, giving 65.169.152.240.sprintlink.net.
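
The walk through the reverse zones in this example can be sketched in Python. The sketch only builds the query names and the final hostname; it performs no DNS lookups, and the function names reverse_zone_queries and append_organization are hypothetical rather than part of jdresolve.

```python
def reverse_zone_queries(ip: str) -> list[str]:
    """List the in-addr.arpa names to try, most specific first.

    The last octet is dropped, the remaining octets are reversed, and
    one leading label is removed for each retry.
    """
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    network = octets[:3]  # the host octet is not part of the NS query
    return [
        ".".join(reversed(network[:count])) + ".in-addr.arpa"
        for count in range(3, 0, -1)
    ]


def append_organization(ip: str, ns_host: str) -> str:
    """Drop the first label of the NS hostname and append the rest to the IP."""
    _, _, organization = ns_host.partition(".")
    return f"{ip}.{organization}"
```

For 65.169.152.240 the query list is 152.169.65.in-addr.arpa, 169.65.in-addr.arpa, and 65.in-addr.arpa, and append_organization("65.169.152.240", "ns1-auth.sprintlink.net") yields 65.169.152.240.sprintlink.net.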

The method used for this report differs from what jdresolve does in a few ways. Most importantly the lookups are done manually and not by a script, so exceptions can be made for each lookup to improve relevance. The DNS query done with WhatRoute 1.7 PPC is of type Any instead of being restricted to NS. This setting can return additional information such as SOA (Start of Authority) and PTR (Domain name pointer) records. Earlier versions of jdresolve used SOA records instead of NS for recursive queries. If an IP address resolves normally that information would be in the PTR record. Sometimes this record is unavailable when DNSTran checks but is present later on when WhatRoute does a lookup. A recursive query can return a PTR record with an untruncated address so all recursive queries start with a full address, 240.152.169.65.in-addr.arpa in the example above.

Other differences include converting the dots within the IP address into dashes when formulating the hostname, 65-169-152-240.sprintlink.net. Analog has an easier time processing this formatting and can make a more accurate Organization report. When deciding what organization name to use, all answers to the query are considered and the most specific organization in the results is assigned to the IP. This practice usually favors general companies over network companies. The greatest number of relevant segments of a result are included in the organization name, including subdomains. For example, recursive DNS on 134.179.241.52 yields dec.state.ny.us as the organization name instead of state.ny.us or just ny.us.

Some query results are not specific enough to assign the address even to a country. The IP address 24.31.246.210 does not return recursive records until reaching the last query, 24.in-addr.arpa, and from that result the organization name would be arin.net. However, this organization is an Internet registry company or a network information center, which assigns blocks of IP addresses to other companies. As a result it is unlikely that the Internet registry company is the correct organization for the request, and IP addresses that return a network information center organization name are left as is.

Internet registry companies limited to one country, unlike arin.net for the Americas, can provide a single country code to append to the IP address. The IP 81.180.130.18 gives a preliminary organization name of rnc.ro, an Internet registry company. The name does end in .ro, which is enough to alias the IP as from Romania, 81-180-130-18.ro. Analog expects an organization name though, so to preserve the Organization report the first two octets of the IP address are added between the IP and country to form 81-180-130-18.81-180.ro. The exception to using the first two octets is if the IP address begins with something other than 24, 61 to 68, 80, 81, or 128 to 255. In that case only the first octet is the organization. This rule is the same as how Analog determines an organization for numerical IP addresses.
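
The octet rule and the resulting alias format can be expressed as a small Python sketch. The function names organization_octets and country_alias are hypothetical; the ranges are taken directly from the paragraph above.

```python
def organization_octets(ip: str) -> int:
    """Number of leading octets that form the organization part."""
    first = int(ip.split(".")[0])
    uses_two = (
        first == 24
        or 61 <= first <= 68
        or first in (80, 81)
        or 128 <= first <= 255
    )
    return 2 if uses_two else 1


def country_alias(ip: str, country_code: str) -> str:
    """Build a hostname of the form dashed-ip.organization.country."""
    octets = ip.split(".")
    dashed_ip = "-".join(octets)
    organization = "-".join(octets[: organization_octets(ip)])
    return f"{dashed_ip}.{organization}.{country_code}"
```

Here country_alias("81.180.130.18", "ro") returns 81-180-130-18.81-180.ro, the same form as the Romanian example above.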

The HOSTALIAS command applies to any IP whose first three octets are the same. A future IP that differs only by the last octet would also match the HOSTALIAS command. DNSTran looks up this IP before HOSTALIAS changes the IP address. However, HOSTALIAS takes effect before the recursive query in WhatRoute with the full address, e.g.: 240.152.169.65.in-addr.arpa. Therefore, an IP that matches an existing HOSTALIAS command could have received a hostname from DNSTran but not from WhatRoute. It has been rare for an IP to resolve in WhatRoute but not in DNSTran.

Example HOSTALIAS commands:

HOSTALIAS 65.169.152.* 65-169-152-*.sprintlink.net
HOSTALIAS 81.180.130.* 81-180-130-*.81-180.ro

Download a file of HOSTALIAS commands described in this section. This file is updated when the report is updated.

Host aggregation

Some robots are spread among numerous hostnames that only differ by number, which increases the number of Distinct hosts served and tends to bury these robots in the Host report. HOSTALIAS merges these robot hostnames into superhosts, thereby lowering the number of distinct hosts and catapulting most of these superhosts to the top of the Host report.

HOSTALIAS crawl*-public.alexa.com crawl-public.alexa.com
HOSTALIAS crawler*.googlebot.com crawler.googlebot.com
HOSTALIAS crawl*.googlebot.com crawl.googlebot.com
HOSTALIAS j*.inktomisearch.com j.inktomisearch.com
HOSTALIAS lj*.inktomisearch.com lj.inktomisearch.com
HOSTALIAS cgi*.archive.org cgi.archive.org
#to avoid the alias after it
HOSTALIAS iahost*.archive.org iahost*.archive.org
HOSTALIAS ia*.archive.org ia.archive.org

Virtual subdomains for swarthmore.edu hosts

The Organization report lists subdomains for some organizations; in the case of Swarthmore College the subdomains are broad educational distinctions. Most swarthmore.edu hostnames give the name of the building (d100.roberts.swarthmore.edu) but not whether it is a residence hall, an academic building, or something else. To add a higher level of detail, Swarthmore subdomains that visited the website are aliased with an additional virtual subdomain of dorm, public-area, or faculty-staff. Computer labs are under public-area.

Newly seen swarthmore.edu subdomains are collected under the virtual subdomain misc.swarthmore.edu. Subdomains are listed separately in the Organization report when they reach 0.1% of the requests. If Analog lists misc.swarthmore.edu, all of the subdomains currently within this group are assigned to dorm, public-area, or faculty-staff so that no subdomains are under misc.swarthmore.edu.

It is worth noting that grouping by building names would need a layer of aliasing anyway since a number of Swarthmore buildings span multiple subdomains, such as willets01.swarthmore.edu and willets02.swarthmore.edu, and would appear as two separate lines in the Organization report.

#(e.g.: d100.dana01.swarthmore.edu -> d100.dana01.dorm.swarthmore.edu)
HOSTALIAS REGEXP:(.+\.)(dana01|hallowell01|mary-lyon01|mertza|mertzb|\
newreshall|palmer|parrish-dorm01|parrish-dorm02|parrish-dorm03|\
pittenger|roberts|whartonab|whartoncd|whartonef|willets01|willets02|\
wortha|worth-dorm)(\.swarthmore\.edu) $1$2.dorm$3
HOSTALIAS REGEXP:(.+\.)(cornell|cs|engin|kohlberg-pub|martin-pub|\
mccabe-pub|public|remote|sccs|scdf-pub|wireless)(\.swarthmore\.edu)\
 $1$2.public-area$3
HOSTALIAS REGEXP:(.+\.)(beardsley-2|contract2|hicks|its|lang|mccabe|\
parrish01|parrish02|parrish03|scd|septa-data|sproul)\
(\.swarthmore\.edu) $1$2.faculty-staff$3
HOSTALIAS *.swarthmore.edu *.misc.swarthmore.edu
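
The effect of these aliases can be tried out with a Python sketch. It is an approximation under stated assumptions: the building lists are abbreviated, each pattern is assumed to match the whole hostname, and only the first matching rule is applied. The name add_virtual_subdomain is hypothetical.

```python
import re

# Abbreviated building lists; the full alternations are in the commands above.
RULES = [
    (re.compile(r"(.+\.)(dana01|roberts|willets01|willets02)(\.swarthmore\.edu)"),
     r"\1\2.dorm\3"),
    (re.compile(r"(.+\.)(cornell|cs|wireless)(\.swarthmore\.edu)"),
     r"\1\2.public-area\3"),
    (re.compile(r"(.+\.)(hicks|its|lang)(\.swarthmore\.edu)"),
     r"\1\2.faculty-staff\3"),
    # Catch-all, standing in for HOSTALIAS *.swarthmore.edu *.misc.swarthmore.edu
    (re.compile(r"(.+)(\.swarthmore\.edu)"), r"\1.misc\2"),
]


def add_virtual_subdomain(host: str) -> str:
    """Insert the virtual subdomain chosen by the first matching rule."""
    for pattern, template in RULES:
        match = pattern.fullmatch(host)
        if match:
            return match.expand(template)
    return host  # not a swarthmore.edu hostname
```

For example, d100.dana01.swarthmore.edu becomes d100.dana01.dorm.swarthmore.edu, while a hostname in a building not yet listed falls through to misc.
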

User agent rewriting

While the following aliases for user agents apply to the entire report, they were set up to make a particular report more accurate. Also, to help recognize commonly seen user agent strings, a second alias for each user agent restores the Browser report display to the original string.

Make Macintosh not be determined by the string “Turing Machine”:

BROWALIAS "*Turing Machine*" "$1Turing machine$2"
BROWREPALIAS "*Turing machine*" "$1Turing Machine$2"
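
The pairing of an alias with a reverse alias can be illustrated with a Python sketch of wildcard substitution. This is a simplified model and not Analog's matcher: it assumes case-sensitive matching of the whole string, greedy wildcards, and $1, $2, ... referring to the wildcards in order. The function name apply_alias and the sample user agent are hypothetical.

```python
import re


def apply_alias(pattern: str, replacement: str, value: str) -> str:
    """Apply one wildcard alias; return the value unchanged if it does not match."""
    # Each * becomes a capture group; everything else is matched literally.
    regex = "(.*)".join(re.escape(part) for part in pattern.split("*"))
    match = re.fullmatch(regex, value, flags=re.DOTALL)
    if match is None:
        return value
    # Replace $1, $2, ... with the text captured by the matching wildcard.
    return re.sub(
        r"\$(\d)",
        lambda ref: match.group(int(ref.group(1))),
        replacement,
    )


agent = "Mozilla/4.0 (compatible; Turing Machine)"
counted_as = apply_alias("*Turing Machine*", "$1Turing machine$2", agent)
displayed_as = apply_alias("*Turing machine*", "$1Turing Machine$2", counted_as)
```

In this model counted_as is the string used for classification and displayed_as is restored to the original user agent, which is the round trip the BROWALIAS and BROWREPALIAS pairs are meant to achieve.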

Make Netscape/6.2.1 become Mozilla/1.7.3; it is not clear why this user agent was typed as Netscape:

BROWALIAS "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3)\
 Gecko/20040910" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;\
 1.7.3) Gecko/20040910"
BROWREPALIAS "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; 1.7.3)\
 Gecko/20040910" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;\
 rv:1.7.3) Gecko/20040910"

Make OS unknown become Unknown Windows. These user agents appear when a type of proxy server requests images. All of the IPs associated with them were using Windows or appeared to be a robot, and between those two categories most of the requests came from Windows. For BorderManager, also make the actual agent name display uniquely in the Browser summary:

BROWALIAS "Mozilla/3.01 (compatible;)"\
 "Mozilla/3.01 (compatible; Windows)"
BROWREPALIAS "Mozilla/3.01 (compatible; Windows)"\
 "Mozilla/3.01 (compatible;)"

BROWALIAS "Mozilla/4.0 (compatible; BorderManager 3.0)"\
 "BorderManager 3.0 (compatible; BorderManager 3.0; Windows)"
BROWREPALIAS "BorderManager 3.0 (compatible; BorderManager 3.0;\
 Windows)" "Mozilla/4.0 (compatible; BorderManager 3.0)"

Make OS unknown become Windows 95:

BROWALIAS "Mozilla/4.5 (compatible; MSIE 5.0; Win 95)"\
 "Mozilla/4.5 (compatible; MSIE 5.0; Win95)"
BROWREPALIAS "Mozilla/4.5 (compatible; MSIE 5.0; Win95)"\
 "Mozilla/4.5 (compatible; MSIE 5.0; Win 95)"

Make OS unknown become Windows 32-bit:

BROWALIAS *Win32* $1Windows32$2
BROWREPALIAS *Windows32* $1Win32$2

Make OS unknown become Macintosh:

BROWALIAS "Mozilla/5.0 (000000000; 0; 000 000 00 0; 00)\
 AppleWebKit/124 (KHTML, like Gecko) Safari/125" "Mozilla/5.0\
 (Macintosh; 0; 000 000 00 0; 00) AppleWebKit/124 (KHTML, like Gecko)\
 Safari/125"
BROWREPALIAS "Mozilla/5.0 (Macintosh; 0; 000 000 00 0; 00)\
 AppleWebKit/124 (KHTML, like Gecko) Safari/125" "Mozilla/5.0\
 (000000000; 0; 000 000 00 0; 00) AppleWebKit/124 (KHTML, like Gecko)\
 Safari/125"

Make Unknown Windows become Windows ME:

BROWALIAS "Mozilla/5.0 (Windows; U; Win 9x 4.90;*"\
 "Mozilla/5.0 (Windows ME; U; Win 9x 4.90;*"
BROWREPALIAS "Mozilla/5.0 (Windows ME; U; Win 9x 4.90;*"\
 "Mozilla/5.0 (Windows; U; Win 9x 4.90;*"

Make the actual agent names display uniquely in the Browser summary; previously most of them were part of Netscape (compatible):

BROWALIAS "Mozilla/5.0 (compatible; Yahoo! Slurp;\
 http://help.yahoo.com/help/us/ysearch/slurp)"\
 "Slurp/Yahoo! (compatible; Yahoo! Slurp;\
 http://help.yahoo.com/help/us/ysearch/slurp)"
BROWREPALIAS "Slurp/Yahoo! (compatible; Yahoo! Slurp;\
 http://help.yahoo.com/help/us/ysearch/slurp)"\
 "Mozilla/5.0 (compatible; Yahoo! Slurp;\
 http://help.yahoo.com/help/us/ysearch/slurp)"

BROWALIAS "Mozilla/5.0 (Slurp/cat; slurp@inktomi.com;\
 http://www.inktomi.com/slurp.html)"\
 "Slurp/cat (Slurp/cat; slurp@inktomi.com;\
 http://www.inktomi.com/slurp.html)"
BROWREPALIAS "Slurp/cat (Slurp/cat; slurp@inktomi.com;\
 http://www.inktomi.com/slurp.html)"\
 "Mozilla/5.0 (Slurp/cat; slurp@inktomi.com;\
 http://www.inktomi.com/slurp.html)"

BROWALIAS "Mozilla/5.0 (Slurp/si; slurp@inktomi.com;\
 http://www.inktomi.com/slurp.html)"\
 "Slurp/si (Slurp/si; slurp@inktomi.com;\
 http://www.inktomi.com/slurp.html)"
BROWREPALIAS "Slurp/si (Slurp/si; slurp@inktomi.com;\
 http://www.inktomi.com/slurp.html)"\
 "Mozilla/5.0 (Slurp/si; slurp@inktomi.com;\
 http://www.inktomi.com/slurp.html)"

BROWALIAS "Mozilla/2.0 (compatible; Ask Jeeves/Teoma*)"\
 "Ask Jeeves/Teoma (compatible; Ask Jeeves/Teoma*)"
BROWREPALIAS "Ask Jeeves/Teoma (compatible; Ask Jeeves/Teoma*)"\
 "Mozilla/2.0 (compatible; Ask Jeeves/Teoma*)"

BROWALIAS "Mozilla/3.0 (compatible; Indy Library)" "Indy Library"
BROWREPALIAS "Indy Library" "Mozilla/3.0 (compatible; Indy Library)"

BROWALIAS "Mozilla/4.0 (compatible; Netcraft Web Server Survey)"\
 "Netcraft Web Server Survey"
BROWREPALIAS "Netcraft Web Server Survey"\
 "Mozilla/4.0 (compatible; Netcraft Web Server Survey)"

BROWALIAS "Mozilla/4.0 compatible ZyBorg/1.0*"\
 "ZyBorg/1.0 compatible ZyBorg/1.0*"
BROWREPALIAS "ZyBorg/1.0 compatible ZyBorg/1.0*"\
 "Mozilla/4.0 compatible ZyBorg/1.0*"

BROWALIAS "Mozilla/5.0 (compatible; Googlebot/2.1;\
 *http://www.google.com/bot.html)" "Googlebot/2.1 (compatible;\
 Googlebot/2.1; *http://www.google.com/bot.html)"
BROWREPALIAS "Googlebot/2.1 (compatible; Googlebot/2.1;\
 *http://www.google.com/bot.html)" "Mozilla/5.0 (compatible;\
 Googlebot/2.1; *http://www.google.com/bot.html)"

BROWALIAS "Mozilla/4.5 (compatible; OmniWeb/4.0.6; Mac_PowerPC)"\
 "OmniWeb/4.0.6 (compatible; OmniWeb/4.0.6; Mac_PowerPC)"
BROWREPALIAS "OmniWeb/4.0.6 (compatible; OmniWeb/4.0.6; Mac_PowerPC)"\
 "Mozilla/4.5 (compatible; OmniWeb/4.0.6; Mac_PowerPC)"

BROWALIAS "Mozilla/4.0 (compatible; Cerberian Drtrs *)"\
 "Cerberian Drtrs/* (compatible; Cerberian Drtrs *)"
BROWREPALIAS "Cerberian Drtrs/* (compatible; Cerberian Drtrs *)"\
 "Mozilla/4.0 (compatible; Cerberian Drtrs $2)"

BROWALIAS "Mozilla/4.0 (compatible; www.HostItCheap.com\
 Hosting Client-Agent)" "www.HostItCheap.com Hosting Client-Agent"
BROWREPALIAS "www.HostItCheap.com Hosting Client-Agent"\
 "Mozilla/4.0 (compatible; www.HostItCheap.com Hosting Client-Agent)"

BROWALIAS "Mozilla(IE Compatible)" "IE Compatible"
BROWREPALIAS "IE Compatible" "Mozilla(IE Compatible)"

BROWALIAS "Mozilla/4.0 (compatible; AvantGo 5.2; FreeBSD)"\
 "AvantGo 5.2 (compatible; AvantGo 5.2; FreeBSD)"
BROWREPALIAS "AvantGo 5.2 (compatible; AvantGo 5.2; FreeBSD)"\
 "Mozilla/4.0 (compatible; AvantGo 5.2; FreeBSD)"

BROWALIAS "Mozilla/2.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)"\
 "T-H-U-N-D-E-R-S-T-O-N-E"
BROWREPALIAS "T-H-U-N-D-E-R-S-T-O-N-E"\
 "Mozilla/2.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)"

BROWALIAS "Mozilla/4.0 (compatible; grub-client-*)"\
 "grub-client/* (compatible; grub-client-*)"
BROWREPALIAS "grub-client/* (compatible; grub-client-*)"\
 "Mozilla/4.0 (compatible; grub-client-$2)"

BROWALIAS "Mozilla/5.0 (compatible; BecomeBot/*;\
 +http://www.become.com/*.html)" "BecomeBot/$1 (compatible;\
 BecomeBot/$1; +http://www.become.com/$2.html)"
BROWREPALIAS "BecomeBot/* (compatible; BecomeBot/*;\
 +http://www.become.com/*.html)" "Mozilla/5.0 (compatible;\
 BecomeBot/$2; +http://www.become.com/$3.html)"

BROWALIAS "Mozilla/5.0 (compatible; BecomeBot/*; MSIE 6.0 compatible;\
 +http://www.become.com/*.html)" "BecomeBot/$1 (compatible;\
 BecomeBot/$1; 6.0 compatible; +http://www.become.com/$2.html)"
BROWREPALIAS "BecomeBot/* (compatible; BecomeBot/*; 6.0 compatible;\
 +http://www.become.com/*.html)" "Mozilla/5.0 (compatible;\
 BecomeBot/$2; MSIE 6.0 compatible; +http://www.become.com/$3.html)"

BROWALIAS "Mozilla/3.01 (compatible; NPT 0.0 beta)"\
 "NPT 0.0 (compatible; NPT 0.0 beta)"
BROWREPALIAS "NPT 0.0 (compatible; NPT 0.0 beta)"\
 "Mozilla/3.01 (compatible; NPT 0.0 beta)"

BROWALIAS "Mozilla/* (compatible; alpha/06; AmigaOS)"\
 "alpha/06 (compatible; alpha/06; AmigaOS; Mozilla/*)"
BROWREPALIAS "alpha/06 (compatible; alpha/06; AmigaOS; Mozilla/*)"\
 "Mozilla/* (compatible; alpha/06; AmigaOS)"

BROWALIAS "Mozilla/4.0 (compatible; alpha 06; AmigaOS)"\
 "alpha 06 (compatible; alpha 06; AmigaOS)"
BROWREPALIAS "alpha 06 (compatible; alpha 06; AmigaOS)"\
 "Mozilla/4.0 (compatible; alpha 06; AmigaOS)"

BROWALIAS "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; Girafabot;\
 girafabot at girafa dot com; http://www.girafa.com)" "Girafabot"
BROWREPALIAS "Girafabot" "Mozilla/4.0 (compatible; MSIE 5.0;\
 Windows NT; Girafabot; girafabot at girafa dot com;\
 http://www.girafa.com)"

BROWALIAS "Mozilla/4.0 (MobilePhone SCP-5500/US/1.0) NetFront/3.0\
 MMP/2.0 FAKE (compatible; Googlebot/2.1;\
 +http://www.google.com/bot.html)" "Googlebot/2.1 (MobilePhone\
 SCP-5500/US/1.0) NetFront/3.0 MMP/2.0 FAKE"
BROWREPALIAS "Googlebot/2.1 (MobilePhone SCP-5500/US/1.0) NetFront/3.0\
 MMP/2.0 FAKE" "Mozilla/4.0 (MobilePhone SCP-5500/US/1.0) NetFront/3.0\
 MMP/2.0 FAKE (compatible; Googlebot/2.1;\
 +http://www.google.com/bot.html)"

BROWALIAS "Mozilla/5.0 (* Firefox/*" "Firefox/$2 ($1 Firefox/$2"
BROWREPALIAS "Firefox/* (* Firefox/*" "Mozilla/5.0 ($2 Firefox/$3"

BROWALIAS "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US)\
 AppleWebKit/85 (KHTML, like Gecko) OmniWeb/v558.48" "OmniWeb/558.48\
 (Macintosh; U; PPC Mac OS X; en-US) AppleWebKit/85\
 (KHTML, like Gecko) OmniWeb/v558.48"
BROWREPALIAS "OmniWeb/558.48 (Macintosh; U; PPC Mac OS X; en-US)\
 AppleWebKit/85 (KHTML, like Gecko) OmniWeb/v558.48" "Mozilla/5.0\
 (Macintosh; U; PPC Mac OS X; en-US) AppleWebKit/85\
 (KHTML, like Gecko) OmniWeb/v558.48"

BROWALIAS "Mozilla/5.0 (* Firebird/*" "Firebird/$2 ($1 Firebird/$2"
BROWREPALIAS "Firebird/* (* Firebird/*" "Mozilla/5.0 ($2 Firebird/$3"

BROWALIAS "Mozilla/5.0 Amiga-AWeb/3.4.167SE Amiga AMIGA amiga"\
 "Amiga-AWeb/3.4.167SE Amiga-AWeb/3.4.167SE Amiga AMIGA amiga"
BROWREPALIAS "Amiga-AWeb/3.4.167SE Amiga-AWeb/3.4.167SE Amiga AMIGA\
 amiga" "Mozilla/5.0 Amiga-AWeb/3.4.167SE Amiga AMIGA amiga"

BROWALIAS "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/* *"\
 "Larbin/$1 (X11; U; Linux i686; en-US; rv:1.4) Larbin/$1 $2"
BROWREPALIAS "Larbin/* (X11; U; Linux i686; en-US; rv:1.4) Larbin/*"\
 "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/$2"

BROWALIAS "Mozilla/5.0 (X11; U; Linux i686;* Epiphany/*"\
 "Epiphany/$2 (X11; U; Linux i686;$1 Epiphany/$2"
BROWREPALIAS "Epiphany/* (X11; U; Linux i686;* Epiphany/*"\
 "Mozilla/5.0 (X11; U; Linux i686;$2 Epiphany/$3"

BROWALIAS "Mozilla/4.7 [en](Exabot@exava.com)" "Exabot"
BROWREPALIAS "Exabot" "Mozilla/4.7 [en](Exabot@exava.com)"

BROWALIAS "Mozilla/4.7 [en](BecomeBot@exava.com)" "BecomeBot"
BROWREPALIAS "BecomeBot" "Mozilla/4.7 [en](BecomeBot@exava.com)"

BROWALIAS "Mozilla/4.0 (MobilePhone PM-8200/US/1.0) NetFront/3.1\
 MMP/2.0" "NetFront/3.1 (MobilePhone PM-8200/US/1.0) NetFront/3.1\
 MMP/2.0"
BROWREPALIAS "NetFront/3.1 (MobilePhone PM-8200/US/1.0) NetFront/3.1\
 MMP/2.0" "Mozilla/4.0 (MobilePhone PM-8200/US/1.0) NetFront/3.1\
 MMP/2.0"

Clean up some browsers in the Browser summary:

BROWALIAS "NaverBot-1.0 (NHN Corp. / +82-2-3011-* / nhnbot@naver.com)"\
 "NaverBot/1.0 (NHN Corp. / +82-2-3011-* / nhnbot@naver.com)"
BROWREPALIAS "NaverBot/1.0 (NHN Corp. / +82-2-3011-* /\
 nhnbot@naver.com)" "NaverBot-1.0 (NHN Corp. / +82-2-3011-* /\
 nhnbot@naver.com)"

BROWALIAS "larbin_*" "larbin/*"
BROWREPALIAS "larbin/*" "larbin_*"

BROWALIAS "Scooter-ARS-1.1" "Scooter/ARS-1.1"
BROWREPALIAS "Scooter/ARS-1.1" "Scooter-ARS-1.1"

BROWALIAS "CrawlConvera0.1 *" "ConveraCrawler/0.1 CrawlConvera0.1 *"
BROWREPALIAS "ConveraCrawler/0.1 CrawlConvera0.1 *"\
 "CrawlConvera0.1 *"

Referrer domain and session merging

Some referrers come from domains that sometimes have a subdomain such as www and other times do not. Even if the website automatically redirects example.com to www.example.com or vice versa, the referrer may still have the original site name. REFALIAS merges the two site names in the stat report:

#collect canonical hostnames for referrers
REFALIAS http://www.fubini.swarthmore.edu/*\
 http://fubini.swarthmore.edu/*
REFALIAS http://www.sxtindustries.com/* http://sxtindustries.com/*
REFALIAS http://trucksess.com/* http://www.trucksess.com/*
REFALIAS http://www.iconsurf.com/* http://iconsurf.com/*

When session IDs are present in referrers, they can dilute referrers that only differ by the session ID value and prevent a listing in the Referrer report. REFALIAS drops the session ID argument and aggregates referrers of that type:

#remove Mytoken, which is a session ID
REFALIAS http://profile.myspace.com/index.cfm?\
fuseaction=user.view?rofile&friendID=805397&Mytoken=* http://\
profile.myspace.com/index.cfm?fuseaction=user.viewprofile&\
friendID=805397

Increasing the number of words and queries in the search reports

Search arguments supply the data for the Search word and Search query reports as well as the Internal search query report. Analog uses the SEARCHENGINE and INTSEARCHENGINE commands to determine what arguments are queries for referrers and requests. To include as many search engine queries as possible a large configuration file is loaded that contains numerous SEARCHENGINE commands. This report uses the SearchEngines.txt file from Mike Shor updated on 2004-09-06. Observation of popular referrers showed that search.yahoo.com queries were in some additional arguments and Google queries for two IP addresses were not being recognized:

SEARCHENGINE http://search.yahoo.com/* q,p,va,vp
SEARCHENGINE http://64.233.167.104/* q
SEARCHENGINE http://64.233.161.104/* q

INTSEARCHENGINE defines what arguments to consider as queries for requests within the website. By design there are few search arguments for this site, but the rtnavexpand value, full, is displayed in the Internal search query report. The rtnavexpand argument has a value equal to full when the “Show all pages” link is clicked on the right side of the page. The number 1 in the Internal search query report is for arguments that evaluate to true.

Even with a substantial list of SEARCHENGINE commands, Analog sometimes cannot see the query in the referrer string. Using REFALIAS brings in these additional queries for the Search word and query reports.

Break an encoded query argument (q) into a separate argument:

REFALIAS http://images.google.*/*q%3D*%26*\
 http://images.google.$1/$2q%3D$3%26$4&q=$3
REFREPALIAS http://images.google.*/*q%3D*%26*&q=*\
 http://images.google.$1/$2q%3D$3%26$4

REFALIAS http://www.google.*/imgres*q%3D*%26*\
 http://www.google.$1/imgres$2q%3D$3%26$4&q=$3
REFREPALIAS http://www.google.*/imgres*q%3D*%26*&q=*\
 http://www.google.$1/imgres$2q%3D$3%26$4

REFALIAS http://216.239.37.104/*q%3D*%26*\
 http://216.239.37.104/$1q%3D$2%26$3&q=$2
REFREPALIAS http://216.239.37.104/*q%3D*%26*&q=*\
 http://216.239.37.104/$1q%3D$2%26$3
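
The idea behind these aliases, copying a percent-encoded query out of the referrer into a plain q argument, can be sketched in Python. This is an approximation: the query is assumed to run from the first q%3D to the next %26, which may differ from how Analog's wildcards divide the string. The function name and the sample referrer are made up for illustration.

```python
import re

# q%3D is an encoded "q=" and %26 is an encoded "&".
ENCODED_Q = re.compile(r"q%3D(.*?)%26")


def expose_encoded_query(referrer: str) -> str:
    """Append the encoded query as a separate q argument, if one is present."""
    match = ENCODED_Q.search(referrer)
    if match is None:
        return referrer
    return f"{referrer}&q={match.group(1)}"


sample = (
    "http://images.google.com/imgres?imgurl=x.jpg"
    "&prev=/images%3Fq%3Dswarthmore%2Bcollege%26hl%3Den"
)
exposed = expose_encoded_query(sample)
```

The exposed referrer ends in &q=swarthmore%2Bcollege, which a SEARCHENGINE command looking at the q argument can then pick up. The matching REFREPALIAS commands strip the added argument again for display.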

Remove empty search arguments that block SEARCHENGINE from seeing the argument that has the actual query:

REFALIAS http://*google.*/*as_q=&*as_oq=&* http://$1google.$2/$3$4$5
REFALIAS http://*google.*/*as_q=&* http://$1google.$2/$3$4

REFALIAS http://*search.msn.*/*q=&* http://$1search.msn.$2/$3$4

Report-specific commands

Commands found in sections below only apply to a single report.

Time report histories

The time reports other than the summaries can show all or just some of the time periods. Here, only the Yearly and Quarterly reports list all time periods. The time reports below them are limited to a fixed number of rows:

MONTHROWS 12
WEEKROWS 52
DAYREPROWS 31
HOURREPROWS 24
QUARTERREPROWS 24
FIVEREPROWS 24

Hierarchical report commands

#domain report e.g.: googlebot.com under .com
SUBDOMAIN *.*
#list subdomains for some domains in the organization report
SUBORG *.googlebot.com,*.swarthmore.edu,*.comcast.net,*.aol.com,\
*.*.rr.com,*.bellsouth.net,*.pacbell.net,*.sc,*.*.cox.net,*.msn.com,\
*.yahoo.net,*.*.verizon.net
SUBDIR */*/*/*/*/*/*/*/*/*/* #expand up to ten directories deep
#browser summary detail down to minor versions, e.g.: Mozilla/1.2.1
SUBBROW */*.*
#show counts for w/in site referrers by top-level dirs in referring\
 site report
REFDIR http://www.trucksess.com/*

Improving the Request report

The Request report only includes pages (.html, directories, .php, and /pr/*) and does not contain a separate line for requests that have the query argument rtnavexpand=full. Both of these changes make the report easier to read.

REQINCLUDE pages
REQEXCLUDE *rtnavexpand=full*
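
The combined effect of these two commands can be approximated with the short Python sketch below. The pattern list for pages is taken from the description above (.html, directories, .php, and /pr/*). The function name, the way the query string is split off, and the sample requests are assumptions, and Analog's own matching rules may differ in detail.

```python
from fnmatch import fnmatchcase

# Assumed page definition, based on the prose: .html, directories, .php, /pr/*
PAGE_PATTERNS = ["*.html", "*/", "*.php", "/pr/*"]
EXCLUDE_PATTERNS = ["*rtnavexpand=full*"]


def in_request_report(request: str) -> bool:
    """Return True if a request would get its own line (hypothetical helper)."""
    path = request.split("?", 1)[0]  # test the page patterns without the query
    is_page = any(fnmatchcase(path, p) for p in PAGE_PATTERNS)
    is_excluded = any(fnmatchcase(request, p) for p in EXCLUDE_PATTERNS)
    return is_page and not is_excluded


for request in ["/swat/graphs/", "/swat/graphs/?rtnavexpand=full", "/images/logo.png"]:
    print(request, in_request_report(request))
```

Only the first sample request passes both checks: the second carries the excluded query argument and the third is not a page.
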

Display corrections in the Redirection report

These Redirection report aliases make the report show redirected requests for ‘Show (all | fewer) pages’ on index pages accurately. Because the DIRSUFFIX index.php command removes the index.php part of a request, the aliases put it back. As a side effect, they also add index.php to trucksess.com to www.trucksess.com redirections whose original request did not include that part.

#(e.g.: /swat/graphs/ -> /swat/graphs/index.php)\
 handle DIRSUFFIX index.php in the redirection report
REDIRALIAS */ */index.php
#(e.g.: /swat/graphs/?rtnavexpand=full ->\
 /swat/graphs/index.php?rtnavexpand=full)
REDIRALIAS REGEXP:(.*/)(\?.*) $1index.php$2
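
To make the effect of the two aliases concrete, here is a minimal Python sketch that applies the same rewrites to the example requests from the comments above. The function name redirect_alias is hypothetical, and matching the regular expression against the whole request is an assumption about how the alias is applied.

```python
import re


def redirect_alias(request: str) -> str:
    """Re-add index.php to a directory request (hypothetical helper)."""
    # Second alias: a directory request followed by a query string.
    match = re.fullmatch(r"(.*/)(\?.*)", request)
    if match:
        return f"{match.group(1)}index.php{match.group(2)}"
    # First alias: a bare directory request ending in "/".
    if request.endswith("/"):
        return request + "index.php"
    return request  # anything else is left as it is


print(redirect_alias("/swat/graphs/"))
# /swat/graphs/index.php
print(redirect_alias("/swat/graphs/?rtnavexpand=full"))
# /swat/graphs/index.php?rtnavexpand=full
```

The two cases cannot both apply to the same request, so their order does not matter in this sketch.
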

Robots in the OS report

The Known robots in the Operating system report are defined by the ROBOTINCLUDE command. Besides the list of robot inclusions found in manconf.cfg, a file of ROBOTINCLUDE commands from Jeremy Wadsack augments the Known robots. The file used for this report was produced on Sun Jul 3 06:47:08 2005.

Included below is a third set of ROBOTINCLUDE commands that reduces the OS unknown value in the Operating system report to less than 1% of requests. The criteria for classifying a user agent as a robot are based on request patterns and web searches for the user agent string. Most robots do not request images, or at least not in step with their page requests. Robots tend to ask for robots.txt, whereas people using web browsers usually do not. Other webmasters also post online asking whether a particular user agent is considered a robot. Weighing this information led to the decision that the following user agents are robots, so each has a ROBOTINCLUDE command:

ROBOTINCLUDE "Aport,\
AVSearch-3.0(indice2/greenlist-sinquery-1),\
augurfind V-1.8 beta,\
augurnfind V-1.8,\
BlackMoss-1.0/Moresearch.com,\
Bumblebee@relevare.com,\
CacheabilityEngine/1.30 <http://www.mnot.net/cacheability/>"
ROBOTINCLUDE "Cowbot-0.1.1 (NHN Corp. / +82-2-3011-1954 /\
 nhnbot@naver.com),\
curl*,\
Dattatec.com-hosting-Econonico-Menos-de-10-Dolares\
 security_at_dattatec_dot_com,\
Exalead NG/MimeLive Client*,\
Franklin Locator 1.8,\
Generic,\
Green Research? Inc.,\
ia_archiver*"
ROBOTINCLUDE "Java*,\
larbin*,\
libwww-perl*,\
Missigua Locator 1.9,\
Mozilla,\
Mozilla/3.0 (compatible),\
Indy Library,\
InternetSeer.com,\
Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 ),\
Mozilla/4.0 (compatible; MSIE 5.00; Windows 98"
ROBOTINCLUDE "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT),\
Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent,\
Girafabot,\
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2;\
 .NET CLR 1.1.4322),\
Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)"
ROBOTINCLUDE "Netcraft Web Server Survey,\
ZyBorg/1.0 compatible ZyBorg/1.0*,\
Mozilla/4.0,\
Mozilla/4.5 [en] (Win98; I),\
Mozilla/5.0,\
Mozilla/5.0 (X11; Linux i686; en-US; rv:1.0rc5; OBJR),\
MSIE 6.0,\
MyApp/0.1 libwww-perl/5.65"
ROBOTINCLUDE "*NaverBot/1.0*naver.com*,\
NCSA Beta 1\
 (http://vias.ncsa.uiuc.edu/viasarchivinginformation.html),\
NPBot*,\
*Nutch*,\
oBot,\
Program Shareware*,\
Schmozilla/v9.14 Platinum"
ROBOTINCLUDE "Sqworm/2.9.85-BETA (beta_release; 20011115-775;\
 i686-pc-linux-gnu),\
SurveyBot*,\
TranSGeniKBot*,\
User-Agent: NG/1.0,\
W3C-checklink/3.9.2 [3.17] libwww-perl/5.64,\
WEP Search 00"
ROBOTINCLUDE "Wotbox/alpha0.5.1 (bot@wotbox.com;\
 http://www.wotbox.com)   Java/1.4.1_02,\
Zao/0.2 (http://www.kototoi.org/zao/),\
Zeus 2.6,\
Gigabot*,\
Jetbot/1.0,\
Teleport Pro/1.29.1590,\
Pompos/1.3 http://dir.com/pompos.html,\
msnbot*"
ROBOTINCLUDE "NextGenSearchBot 1 (for information visit http://*),\
Cerberian Drtrs/Version-3.1-Build-* (compatible;\
 Cerberian Drtrs Version-3.1-Build-*),\
NPT 0.0 (compatible; NPT 0.0 beta),\
Faxobot/1.0,\
Exabot,\
BecomeBot*"
ROBOTINCLUDE "sna-0.0.1 mikeelliott@hotmail.com,\
NG/2.0,\
webcollage*,\
xChaos_Arachne/4.1.73;GPL (DOS x86;WATTCP/1.05; 800x600?HiColor;\
 www.arachne.cz),\
Dillo*,\
Arexx,\
appie 1.1 (www.walhello.com),\
Mozilla/4.0 (emulation; MPM BUG),\
AnoProxy 1.01"
ROBOTINCLUDE "Butch__2.1.1 (agdm79@mail.ru),\
grub-client/* (compatible; grub-client-*),\
ichiro/* (ichiro@nttr.co.jp),\
Avant Browser (http://www.avantbrowser.com),\
FavOrg,\
Mozilla/4.0 (compatible; DB Browse 4.3; DB OS 6.0),\
Epsilon/10.03"
ROBOTINCLUDE "DoCoMo/1.0/D503iS/c10,\
*MJ12bot/v*http://*majestic12.co.uk/*bot.php?*,\
amaya/5.1 libwww/5.3.1,\
DTAAgent,\
psycheclone,\
Snapbot/1.0"
ROBOTINCLUDE "voyager/1.0,\
192.comAgent,\
SumeetBot (Sumeet Bot;\
 http://64.124.122.252.webaroo.com/feedback.html),\
SBIder/SBIder-0.8.2-dev (http://www.sitesell.com/sbider.html),\
Mail.Ru/1.0"
ROBOTINCLUDE "Mozilla/5.0 (compatible; IDBot/1.0;\
 +http://www.id-search.org/bot.html),\
Mozilla/5.0 (compatible; LiteFinder/1.0;\
 +http://www.litefinder.net/about.html),\
Mozilla/5.0 (compatible; Gigamega.bot/1.0;\
 +http://www.gigamega.net/bot.html)"
ROBOTINCLUDE "Mozilla/5.0 (compatible; perl.bot/1.1;\
 mailto:supportgigamega@gmail.com),\
Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/),\
panscient.com,\
Mozilla/5.0 (compatible; discobot/1.?;\
 +http://discoveryengine.com/discobot.html)"
ROBOTINCLUDE "Mozilla/5.0 (compatible; WebDataCentreBot/1.0;\
 +http://WebDataCentre.com/),\
CCBot/1.0 (+http://www.commoncrawl.org/bot.html),\
Mozilla/5.0 (compatible; Charlotte/1.1;\
 http://www.searchme.com/support/)"
ROBOTINCLUDE "Yanga WorldSearch Bot v1.1/beta\
 (http://www.yanga.co.uk/),\
WinHttp,\
Mozilla/5.0 (compatible; MSIE 6.0; ) Gecko,\
*Yandex*/*.0*; *),\
Mozilla/5.0 (compatible; Purebot/1.1; +http://www.puritysearch.net/)"
ROBOTINCLUDE "Linguee Bot (http://www.linguee.com/bot;\
 bot@linguee.com),\
ShopWiki/1.0 ( +http://www.shopwiki.com/wiki/Help:Bot),\
Mozilla/5.0 (compatible; spbot/2.0.2;\
 +http://www.seoprofiler.com/bot/ ),\
BPImageWalker/2.0 (www.bdbrandprotect.com)"
ROBOTINCLUDE "swish-e http://swish-e.org/,\
hoge (co2h2onacl@gmail.com),\
TwengaBot-Discover (http://www.twenga.fr/bot-discover.html),\
Mozilla/5.0 (compatible; bingbot/2.0;\
 +http://www.bing.com/bingbot.htm),\
Mozilla/5.0 (compatible)"
ROBOTINCLUDE 'Internet Explorer 8.0; WinXP,\
Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com),\
Mozilla/5.0 (compatible; NerdByNature.Bot;\
 http://www.nerdbynature.net/bot)'

Robots with multiple versions have a * to include all of them. Some user agents are listed under the aliased names created for the Browser summary.
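
The request-pattern criteria described before the list (asking for robots.txt, requesting pages without their images) can be sketched in Python as a first-pass filter. Everything here is illustrative: the function name, the image suffixes, and the sample requests are assumptions, and the output is only a list of candidates to check by hand, not the procedure actually used for this report.

```python
from collections import defaultdict

IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")  # assumed image types


def likely_robots(requests):
    """Return user agents whose request pattern looks robot-like.

    requests is an iterable of (user_agent, path) pairs.
    """
    stats = defaultdict(lambda: {"robots_txt": False, "images": 0, "pages": 0})
    for agent, path in requests:
        entry = stats[agent]
        if path == "/robots.txt":
            entry["robots_txt"] = True
        elif path.lower().endswith(IMAGE_SUFFIXES):
            entry["images"] += 1
        else:
            entry["pages"] += 1
    # Flag agents that asked for robots.txt or fetched pages but never images.
    return sorted(
        agent
        for agent, entry in stats.items()
        if entry["robots_txt"] or (entry["pages"] > 0 and entry["images"] == 0)
    )


sample = [
    ("msnbot/1.0", "/robots.txt"),
    ("msnbot/1.0", "/swat/graphs/"),
    ("Mozilla/5.0 (Windows; Firefox)", "/swat/graphs/"),
    ("Mozilla/5.0 (Windows; Firefox)", "/images/logo.png"),
]
print(likely_robots(sample))  # ['msnbot/1.0']
```

A browser that happens to load a single page without images would also be flagged, which is why a web search for the user agent string remains part of the decision.
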

Analog documentation

Further information about using Analog is in the documentation.

Last modified: 2013-11-06, Wed, 17:29:03Z | First created: 2003-12-17
URI: http://www.trucksess.com/stats/website/general.php