Posts Tagged ‘Google Analytics’

Google Analytics Update

Wednesday, August 29th, 2012

Last year I wrote about taking apart a MySpace cookie. That posting included some discussion of the Google Analytics values found within the cookie. It was interesting, and I got some good feedback about the blog entry. Jim Meyer of the DoD Cyber Crime Center contacted me about further research his team had done on the Google Analytics values within cookies and a presentation they were preparing at the time for the 2012 DoD Cybercrime conference (if you saw the presentation at DoD, let me know how it went).

They were able to determine more information about the specific pieces of the Google analytics cookie placed on a user’s computer when they go to a webpage that contains Google Analytics.

The Google Analytics cookie collects, stores, and reports certain information about a user’s contact with a webpage that has the embedded Google Analytics JavaScript code. This includes:

  • Data that can determine if a user is a new or returning user
  • When that user last visited the website
  • How long the user stayed on the website
  • How often the user comes to the site, and
  • Whether the user came directly to the website, was referred to the site via another link, or located the site through the use of keywords.

Jim Meyer and his team used Google’s open source code page to help define several pieces of the code and what exactly each was doing when downloaded. Here is some of what they were able to determine. (The examples are the ones I used in my last posting, with a little more explanation about what everything means. I explained how I translated the dates and times in that posting.) For a complete review of their findings, contact Jim at the DoD Cyber Crime Center.

Example

Cookie:            __utma

102911388.576917061.1287093264.1287098574.1287177795.3

__utma This records information about the site visited and is updated each time you visit the site.
102911388 This is a hash of the domain the cookie was set for
576917061 This is a randomly generated number from the Google cookie server
1287093264 This is the actual time of the first visit to the server
576917061.1287093264 These two together make up the unique ID Google uses to track users. Reportedly, Google does not track by personal information or specific browser information.
1287098574 This is the time of the previous visit to the server
1287177795 This is the time of the most recent visit to the server
3 This is the number of times the site has been visited
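The field breakdown above can be sketched in a few lines of Python. This is only an illustration, assuming the six fields are always separated by periods and that the three time fields are Unix epoch seconds; the names `Utma` and `parse_utma` are my own and are not part of any Google tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Utma(NamedTuple):
    """Hypothetical container for the six __utma fields described above."""
    domain_hash: str
    visitor_id: str          # random number + first-visit time
    first_visit: datetime
    previous_visit: datetime
    last_visit: datetime
    visit_count: int


def parse_utma(value: str) -> Utma:
    """Split a __utma value on '.' and convert the epoch fields to UTC."""
    parts = value.strip().split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    domain_hash, random_id, first, previous, last, count = parts

    def to_utc(seconds: str) -> datetime:
        # Assumption: the time fields are seconds since 1970-01-01 UTC.
        return datetime.fromtimestamp(int(seconds), tz=timezone.utc)

    return Utma(
        domain_hash=domain_hash,
        visitor_id=f"{random_id}.{first}",
        first_visit=to_utc(first),
        previous_visit=to_utc(previous),
        last_visit=to_utc(last),
        visit_count=int(count),
    )


utma = parse_utma("102911388.576917061.1287093264.1287098574.1287177795.3")
print(utma.visitor_id, utma.first_visit.isoformat(), utma.visit_count)
```

The times come out in UTC, so they need to be shifted to the examiner's time zone before being compared with locally reported times.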

 Example

Cookie:            __utmz

102911388.1287093264.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none) 

__utmz This cookie stores how you got to this site.
102911388 Domain hash
1287093264 Timestamp of when the cookie was last set
1 Number of sessions at this time
1 Number of different sources the visitor has used to get to the site
utmcsr Source: the last website used to access the current website
=(direct) This means I went directly to the website. A visit from a Google search would show the search engine here, and a visit from a referring link would show the site the link came from.
|utmccn=(direct) Campaign name. AdWords campaign words can be found here.
|utmcmd=(none) Medium used to reach the site: “(none)” for a direct visit, “organic” for a search, “referral” for a referring link. When search terms are recorded, they appear in a separate utmctr field.
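Here is a rough Python sketch of pulling the __utmz value apart along the lines described above. It assumes four leading period-delimited fields followed by pipe-delimited key=value pairs; `parse_utmz` and the dictionary keys are names I made up for illustration.

```python
from datetime import datetime, timezone


def parse_utmz(value: str) -> dict:
    """Split a __utmz value into its leading fields and campaign pairs."""
    # Split on the first four periods only: source values such as a
    # referring domain can themselves contain periods.
    domain_hash, set_at, sessions, sources, campaign = value.strip().split(".", 4)
    pairs = dict(
        item.split("=", 1) for item in campaign.split("|") if "=" in item
    )
    return {
        "domain_hash": domain_hash,
        # Assumption: the timestamp is seconds since 1970-01-01 UTC.
        "set_at": datetime.fromtimestamp(int(set_at), tz=timezone.utc),
        "session_count": int(sessions),
        "source_count": int(sources),
        "campaign": pairs,
    }


utmz = parse_utmz(
    "102911388.1287093264.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"
)
print(utmz["campaign"])
```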

 Example

Cookie:            __utmb

102911388.0.10.1287177795 

__utmb This is the session cookie which is only good for 30 minutes.
102911388 This is a hash of the domain the cookie was set for
0 Number of pages viewed
10 meaning unknown
1287177795 The last time the page was visited

Remember, though, that all of this can be different if the system deletes the cookies or the user runs an application that cleans the cookies out. Also, it is all relative: the values depend on system and user behavior and on when and how many times the user has visited a particular site.

You can also find out more about the cookies at http://code.google.com/apis/analytics/docs/concepts/gaConceptsCookies.html#cookiesSet

Google Analytics can set four main cookies on the user’s machine:

__utma Unique Visitors
__utmb Session Tracking
__utmc Session Tracking
__utmz Traffic Sources

Optional cookies set by Google Analytics:

__utmv Custom Value
__utmx Website Optimizer

Google Analytics creates varying expiration times for its cookies: 

__utma The information on unique user detection expires after 2 years
__utmz The information on traffic source tracking expires after 6 months
__utmv The information on “Custom Tracking” expires after 2 years
__utmx The information on the “Website Optimizer” expires after 2 years
__utmb The information about a current visit expires 30 minutes after the last pageview on the domain
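These lifetimes give a quick way to estimate when a cookie should expire from the time it was last set. The sketch below is an assumption-laden illustration: I am treating 2 years as 730 days and 6 months as 15,768,000 seconds (182.5 days), and the `LIFETIMES` table and `expected_expiry` function are hypothetical names, not anything published by Google.

```python
from datetime import datetime, timedelta, timezone

# Assumed lifetimes; the exact durations are my reading of
# "2 years", "6 months" and "30 minutes", not confirmed values.
LIFETIMES = {
    "__utma": timedelta(days=730),
    "__utmz": timedelta(seconds=15_768_000),
    "__utmb": timedelta(minutes=30),
}


def expected_expiry(cookie_name: str, last_set_epoch: int) -> datetime:
    """Return the estimated UTC expiry for a cookie last set at the given epoch time."""
    last_set = datetime.fromtimestamp(last_set_epoch, tz=timezone.utc)
    return last_set + LIFETIMES[cookie_name]


# Using the timestamps from the example cookies above.
print(expected_expiry("__utma", 1287177795).isoformat())
print(expected_expiry("__utmz", 1287093264).isoformat())
print(expected_expiry("__utmb", 1287177795).isoformat())
```

If the expiry stored with the cookie on disk differs noticeably from the estimate, that may be worth a closer look, but the estimate is only as good as the assumed durations.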

The original code schema written by Urchin was called UTM (Urchin Traffic Monitor) JavaScript code. It was designed to be compatible with existing cookie usage, and all the UTM cookie names begin with “__utm” to prevent any naming conflicts.

Tracking the Urchin- from an investigative point of view

Okay, so here is some additional new material on Google Analytics that comes from examining the source code of a webpage. What is the Urchin? Google purchased a company called Urchin, which had a technology for doing traffic analysis. The cookies still refer to the technology by Urchin’s original names.

When examining a live webpage that contains Google analytics code embedded in the website you will come across code that looks similar to this:

<script type="text/javascript"><!--
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
// --></script>
<script type="text/javascript"><!--
try {
var pageTracker = _gat._getTracker("UA-9689708-5");
pageTracker._trackPageview();
} catch(err) {}
// --></script>

Search the source code for “getTracker” and you will find the following line: var pageTracker = _gat._getTracker("UA-9689708-5"); which contains the website’s assigned Google Analytics account number, “UA-9689708-5”. So what does this mean, and how can it be of value to me when I am investigating a website? Let’s identify what the assigned number means:

UA Stands for “Urchin Analytics” (the name of the company Google purchased to obtain the technology)
9689708 Google Analytics account number assigned by Google
5 Website profile number
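If you have saved the page source, pulling out the ID and splitting it into its parts can be automated. The following is a minimal sketch that assumes the ID always takes the form UA-digits-digits; the function name `find_ga_ids` is my own invention.

```python
import re

# Assumed pattern: "UA-", an account number, "-", a profile number.
UA_PATTERN = re.compile(r"\bUA-(\d+)-(\d+)\b")


def find_ga_ids(page_source: str) -> list:
    """Return each distinct Google Analytics ID found in the page source."""
    found = []
    for match in UA_PATTERN.finditer(page_source):
        entry = {
            "id": match.group(0),
            "account": match.group(1),
            "profile": match.group(2),
        }
        if entry not in found:  # keep first-seen order, skip repeats
            found.append(entry)
    return found


source = 'var pageTracker = _gat._getTracker("UA-9689708-5");'
print(find_ga_ids(source))
```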

How can I use this Google analytics number in an investigation? First you can go to http://www.ewhois.com/ to run the UA # and identify the company/person assigned the number.

The response you will get is something similar to this:

[Screenshot: google analytics]

Then run the Google Analytics number through Reverseinternet.com:

[Screenshot: urchin]

This is of a little more investigative use in that it shows the domains that use the same Google Analytics Id, the Internet Protocol addresses assigned to those domains, and the DNS servers the domains use.

Using Reverseinternet.com allows you to identify any webpage where this Google Analytics Id has been embedded in the source code.  This can be of investigative value if the target has used the same Id on more than one webpage they control or monitor. Why would this occur? Google allows the user to monitor data from multiple sites from a single control panel.
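If you have already collected the page source from several sites of interest, you can do a similar comparison offline. This is a hedged sketch only: the domains below are placeholders, `group_by_account` is a hypothetical helper, and it matches on the account number portion of the Id so that different profile numbers under one account are still linked.

```python
import re
from collections import defaultdict


def group_by_account(pages: dict) -> dict:
    """Map each Google Analytics account number to the domains that embed it.

    `pages` maps a domain name to that page's saved HTML source.
    Only accounts seen on more than one domain are returned.
    """
    domains_by_account = defaultdict(set)
    for domain, source in pages.items():
        for account in re.findall(r"\bUA-(\d+)-\d+\b", source):
            domains_by_account[account].add(domain)
    return {
        account: sorted(domains)
        for account, domains in domains_by_account.items()
        if len(domains) > 1
    }


# Placeholder data for illustration; not real investigative results.
saved_pages = {
    "example.com": 'var pageTracker = _gat._getTracker("UA-9689708-5");',
    "example.org": 'var pageTracker = _gat._getTracker("UA-9689708-2");',
    "example.net": 'var pageTracker = _gat._getTracker("UA-1111111-1");',
}
print(group_by_account(saved_pages))
```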

So how does Google analytics work?

Google is probably a better place to find this out. You can go to http://code.google.com/apis/analytics/docs/concepts/gaConceptsOverview.html for a complete overview of how it works.

In short, the Google Analytics JavaScript code embedded in the webpage you visit collects information from the following sources when you connect to a webpage:

  • The HTTP request of the visitor’s browser
  • Browser/system information from the visitor
  • And it sends a cookie to the visiting system

All of this gives the webpage owner the ability to track persons going to their webpage. From an investigative point of view there is a certain amount of exposure due to the browser tracking that occurs and the fact that a cookie is placed on your investigative system. But there is the possibility from examining the page source code to tie the website through the Google Analytics Id to other webpages of interest.

Dissecting a MySpace cookie

Wednesday, May 18th, 2011

I previously looked at the MySpace source code and, as an aside, I decided to look at the MySpace cookie placed on my computer through Internet Explorer. I need to spend some more time with it, but I found one tidbit of interest. Here are the contents of that cookie:

MSCulture
IP=76.232.69.187&IPCulture=en-US&PreferredCulture=en-US&Country=VVM%3D&ForcedExpiration=0&timeZone=-7&USRLOC=QXJlYUNvZGU9Nzc1JkNpdHk9UmVubyZDb3VudHJ5Q29kZT1VUyZDb3
VudHJ5TmFtZT1Vbml0ZWQgU3RhdGVzJkRtYUNvZGU9ODExJkxhdGl0dWRlPTM
5LjU1NDUmTG9uZ2l0dWRlPS0xMTkuODA2MiZQb3N0YWxDb2RlPSZSZWdpb25
OYW1lPU5WJkxvY2F0aW9uSWQ9MA

myspace.com/
1600
1450779520
30110255
767532288
30108847*
SessionDDF2
WecgMpqrHOI4tePW304hLLYkIoD8e+hqZQakpBfhu0bf+3YNd9a3gLJAKgrhd57+klMP1U9u
DlEKYfXnDvXE8w==
myspace.com/
1536
2677308160
31578165
1536619600
30108650
*__utma
102911388.576917061.1287093264.1287098574.1287177795.3
myspace.com/
1600
522347392
30255698
765392288
30108847
*
__utmz
102911388.1287093264.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
myspace.com/
1600
428951552
30145363
1564109600
30108650
*__utmb
102911388.0.10.1287177795
myspace.com/
1600
1584863104
30108851
765392288
30108847
*
__unam
7639673-12bb1c67c3e-6a4aaea5-1
myspace.com/
1600
3491813376
30163644
781442288
30108847
*

Here is an interesting part in Base64:

QXJlYUNvZGU9Nzc1JkNpdHk9UmVubyZDb3VudHJ5Q29kZT1VUyZDb3VudHJ5
TmFtZT1Vbml0ZWQgU3RhdGVzJkRtYUNvZGU9ODExJkxhdGl0dWRlPTM5LjU1NDUmT
G9uZ2l0dWRlPS0xMTkuODA2MiZQb3N0YWxDb2RlPSZSZWdpb25OYW1lPU5W
JkxvY2F0aW9uSWQ9MA

Here is the Base 64 Translation:

USRLOC=AreaCode=775&City=Reno&CountryCode=US&CountryName=United States&DmaCode=811&Latitude=39.5545&Longitude=-119.8062&PostalCode=&RegionName=NV&LocationId=0
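The translation can be reproduced with Python's standard library. This is a small sketch, assuming the value is plain Base64 with its trailing padding stripped and that the decoded text is a query-string style list of key=value pairs; `decode_usrloc` is just a name I picked.

```python
import base64
from urllib.parse import parse_qsl

# The USRLOC value from the cookie, joined from the four lines shown above.
USRLOC = (
    "QXJlYUNvZGU9Nzc1JkNpdHk9UmVubyZDb3VudHJ5Q29kZT1VUyZDb3VudHJ5"
    "TmFtZT1Vbml0ZWQgU3RhdGVzJkRtYUNvZGU9ODExJkxhdGl0dWRlPTM5LjU1NDUmT"
    "G9uZ2l0dWRlPS0xMTkuODA2MiZQb3N0YWxDb2RlPSZSZWdpb25OYW1lPU5W"
    "JkxvY2F0aW9uSWQ9MA"
)


def decode_usrloc(encoded: str) -> dict:
    """Decode the Base64 value and split it into key/value pairs."""
    # Restore the '=' padding that Base64 needs to reach a multiple of 4.
    padded = encoded + "=" * (-len(encoded) % 4)
    text = base64.b64decode(padded).decode("utf-8")
    # keep_blank_values keeps empty fields such as PostalCode.
    return dict(parse_qsl(text, keep_blank_values=True))


location = decode_usrloc(USRLOC)
print(location["City"], location["RegionName"], location["Latitude"], location["Longitude"])
```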

The investigator should be aware that the latitude and longitude are generally based on the IP address geolocation. Again, this is something you are revealing to the website when you visit it. The website automatically geolocates the IP address for general marketing purposes. As an investigator you need to be aware that you are exposing this information to the websites you surf. I’ll comment more on geolocation in another post.

Not that we did not all know that companies use tracking codes to identify us, but here is the type of information that might be on a suspect’s system if you go looking for it in his cookies. It also shows how much MySpace tracks and collects about you during an investigation when you go to a suspect’s MySpace page. I found a nice article at http://helpful.knobs-dials.com/index.php/Utma,_utmb,_utmz_cookies describing some of the cookie’s contents.

“The cookies named __utma through __utmz are part of Google Analytics, originally by the urchin tracking module, also by the newer ga.js. These cookies track usage on sites that use Google Analytics.”

The article goes on to describe the various pieces of the cookie.

__utma tracks each user’s amount of visits, first, last visit.
__utmz tracks where a visitor came from (search engine, search keyword, link)
__utmb and __utmc are used to track when a visit starts and approximately ends (c expires quickly).
__utmv is used for user-custom variables in Analytics
__utmk – digest hashes of utm values
__utmx is used by Website Optimizer, when it is being used

Another good description of the Google Analytics cookies and their contents can be found at MoreVisibility (a marketing website). There are many other sites that collect similar information, such as Netcraft, Alexa, and WMtips (each of these can be accessed from our free Internet Investigators Toolbar).

The __utma cookie appears to be a string with six fields, delimited by a “.”. The last field is a single integer which records the number of sessions during the cookie lifetime.

Here are the various pieces of the cookie with the date and times translated:

Cookie Code Section / Date and Time Translation*

MSCulture
  myspace.com/ 1600 1450779520 30110255 767532288 30108847
  1450779520,30110255 = Fri, 22 October 2010 13:23:15 -0800
  767532288,30108847 = Fri, 15 October 2010 13:23:15 -0800

SessionDDF2
  WecgMpqrHOI4tePW304hLLYkIoD8e+hqZQakpBfhu0bf+3YNd9a3gLJAKgrhd57+klMP1U9uDlEKYfXnDvXE8w==
  myspace.com/ 1536 2677308160 31578165 1536619600 30108650
  2677308160,31578165 = Mon, 14 October 2030 13:54:22 -0800
  1536619600,30108650 = Thu, 14 October 2010 13:54:21 -0800

__utma
  102911388.576917061.1287093264.1287098574.1287177795.3
  myspace.com/ 1600 522347392 30255698 765392288 30108847
  522347392,30255698 = Sun, 14 October 2012 13:23:15 -0800
  765392288,30108847 = Fri, 15 October 2010 13:23:15 -0800

__utmz
  102911388.1287093264.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
  myspace.com/ 1600 428951552 30145363 1564109600 30108650
  428951552,30145363 = Fri, 15 April 2011 01:54:24 -0800
  1564109600,30108650 = Thu, 14 October 2010 13:54:24 -0800

__utmb
  102911388.0.10.1287177795
  myspace.com/ 1600 1584863104 30108851 765392288 30108847
  1584863104,30108851 = Fri, 15 October 2010 13:53:15 -0800
  765392288,30108847 = Fri, 15 October 2010 13:23:15 -0800

__unam
  7639673-12bb1c67c3e-6a4aaea5-1
  myspace.com/ 1600 3491813376 30163644 781442288 30108847
  3491813376,30163644 = Thu, 14 July 2011 23:00:00 -0800
  781442288,30108847 = Fri, 15 October 2010 13:23:16 -0800

*Decoding of the dates and times are thanks to the free “Dcode” tool by Digital Detective.
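For those who want to check a value by hand, the number pairs can also be converted with a few lines of Python. This sketch assumes each pair is the low and high 32-bit halves of a Windows FILETIME (100-nanosecond intervals since January 1, 1601 UTC), which appears to be how Internet Explorer stores a cookie's expiration and creation times; `filetime_to_datetime` is my own helper name and is not part of Dcode.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch for Windows FILETIME values.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(low: int, high: int) -> datetime:
    """Combine the low/high halves and convert to a UTC datetime."""
    ticks = (high << 32) | low          # 100-nanosecond intervals
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# The second pair stored with the __utma cookie above.
created = filetime_to_datetime(765392288, 30108847)
print(created.isoformat())

# Shift to the -0800 offset used in the translations above.
print(created.astimezone(timezone(timedelta(hours=-8))).isoformat())
```

Note that the order matters: the first number in each pair is the low half and the second is the high half.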

Todd Shipley is Vere Software’s president and CEO.