Excluding spam & ghost referrals in Google Analytics - part 2 of our ghost hunt!

Get more accurate data! Learn how to create filters and segments in Google Analytics that keep spam referrals from showing up as visits.

  • Charlotta Eriksson
  • 31 October 2016

Exclude identified spam

Once you have concluded that you are affected by spam referrals, it is time to get rid of them. You do this by excluding specific links you have identified, by excluding the bots and spiders already known to Google, or by setting up a valid hostname filter, depending on which type of spam you are looking at.

Make sure you have set up the right views first, so that you can test your changes before making them permanent and always have access to unfiltered data. Filters do not work retroactively: you cannot change data that has already been collected, which is why it is important to always keep an unfiltered view.

Setup for views in Google Analytics

The recommendation, when setting up your account in Google Analytics, is to create three views for each property: ”All Website Data”, ”All Website Data + Filters” and ”Test View”. How you name them is up to you; what each of them contains is more important.

  • All Website Data: This view should be left unfiltered and untouched. Here you can always go back to all of the data that has been collected, without risking anything getting lost.
  • Test View: This is the place for testing new filters, to see how your traffic is affected before you add them to your main view. Testing first lets you make sure you are not filtering out visitors that you would otherwise expect to show up, and leaves you with more reliable data.
  • All Website Data + Filters: This is your main view, the one you should use to read your data day to day. You add your already tested filters here.

Handle spam traffic

1. Known robots and spiders

Google maintains a predefined list of known bots and spiders that you are free to use. All you have to do to exclude them from your data is tick the box ‘Exclude all hits from known bots and spiders’, which can be found under View Settings in admin mode. This will not solve all of your troubles with spam referrals, but it does help. It only removes these hits from your reports; it does not block the crawlers themselves, so the ones that are not out to harm you, such as search engine spiders, can still visit your site and let you rank in search engines.

2. Spam referrals

The next step is to add filters for the referrals you have identified as unwelcome traffic.

  1. In admin mode, click on Filters under the view you would like to work with (start with Test View) and then click on +Add Filter.
  2. Give your filter a name so that you know what it is meant to accomplish.
  3. Under filter type, choose Custom and Exclude, as we want to exclude the referrals that are not actually real visits.
  4. As Filter Field, choose Campaign Source.
  5. In the text field Filter Pattern, fill in the links you want to exclude. Here you need to make use of Regex, short for regular expressions. Regex is a way of describing patterns in text, and Google Analytics uses it to recognise the links that should be excluded from collection. In Regex, different symbols have different meanings, and you will need a couple of them to combine your links into one string of text. For example, say that the link you would like to exclude looks something like this:
  • www.fake-traffic.com

Start by typing in the link as it looks, then add a backslash \ before every dot and a pipe symbol | between each link you add. The backslash makes the dot match a literal dot instead of any character, and the pipe means ‘or’. See the example below:

  • www\.fake-traffic\.com|spam-visitor\.se
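If you keep your spam links in a list, the escaping and joining can be scripted instead of typed by hand. The following is a minimal Python sketch, not part of Google Analytics itself; the function name `build_exclude_pattern` is made up for illustration, and the local check assumes that the filter matches anywhere in the source and ignores case.

```python
import re


def build_exclude_pattern(domains):
    """Escape the dots in each link and join the links with a pipe symbol."""
    escaped = [domain.replace(".", r"\.") for domain in domains]
    return "|".join(escaped)


pattern = build_exclude_pattern(["www.fake-traffic.com", "spam-visitor.se"])
print(pattern)  # www\.fake-traffic\.com|spam-visitor\.se

# Rough local check of which sources the pattern would catch.
# Assumption: partial, case-insensitive matching.
regex = re.compile(pattern, re.IGNORECASE)
for source in ["www.fake-traffic.com", "spam-visitor.se", "google.com"]:
    print(source, "excluded" if regex.search(source) else "kept")
```

The printed pattern can be pasted straight into the Filter Pattern field.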

Because the Filter Pattern field only accepts a limited number of characters, add as many links as fit and then create more filters if needed. Name your filters so that you know where to add new links next time, for example Spam bots 1, Spam bots 2 and so on.
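Splitting a long list of links across several filters can also be sketched in code. The example below is an illustration only: the 255-character limit is an assumption, so check what the field accepts in your own account, and `split_into_filters` is a hypothetical helper name. The demo uses a much smaller limit to make the split visible.

```python
MAX_PATTERN_LENGTH = 255  # assumed limit; verify it in your own account


def split_into_filters(domains, max_length=MAX_PATTERN_LENGTH):
    """Group escaped links into pipe-joined patterns that each fit the limit."""
    patterns = []
    current = ""
    for domain in domains:
        escaped = domain.replace(".", r"\.")
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) > max_length and current:
            # The current filter is full: store it and start a new one.
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


domains = ["www.fake-traffic.com", "spam-visitor.se", "another-spammer.net"]
filters = split_into_filters(domains, max_length=40)  # small limit for the demo
for number, pattern in enumerate(filters, start=1):
    print(f"Spam bots {number}: {pattern}")
```

A single link that is longer than the limit on its own still ends up in a pattern of its own, so it would need to be shortened by hand.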

If you want to read more about Regex and try it out, click here: https://regex101.com/

3. Ghost referrers

If you have identified some of your referral links as ghost referrers, placing them in an exclude filter will not be enough to keep them out of your data. Instead of excluding links of this kind, you can choose to include only the hostnames that are actually valid, i.e. yours. Your valid hostnames are the places where you have implemented Google Analytics, so hits reporting any other hostname should be ghost referrers. There are, however, a couple of known exceptions, including:

  • translate.googleusercontent.com
  • webcache.googleusercontent.com

These are used by Google’s translation tool and by Google’s cached versions of the pages on your site (snapshots taken by Google as backup). Spammers update their techniques regularly to try to bypass your setup, so be aware of this if you choose to add these.

To add a segment that only includes valid hostnames, go to admin mode, click on Segments in the View column and choose +New Segment. Name your segment so that you know what it contains, and then click on Conditions under Advanced. Choose Sessions and Include, to segment on sessions rather than users. In the dropdown that shows Ad Content, choose Hostname, and in the dropdown that shows contains, choose matches regex.

In the text field, fill in the hostnames you want to include and separate them with Regex symbols, as you did previously when adding filters for spam referrals. Do not add any spaces between the hostnames, and be careful with uppercase and lowercase characters.
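Before saving the segment, the hostname pattern can be tried out locally. This is a rough Python sketch under stated assumptions: `www.example.com` and `shop.example.com` are placeholder hostnames, the `^(` and `)$` anchors are an addition of this example so that only complete hostnames match, and the check is case sensitive to mirror the caution above. How Google Analytics itself treats case and partial matches is not confirmed here.

```python
import re

# Placeholder hostnames; replace them with the places where your tracking runs.
valid_hostnames = [
    "www.example.com",
    "shop.example.com",
    "translate.googleusercontent.com",
    "webcache.googleusercontent.com",
]

# Escape the dots, join with pipes, and anchor the whole expression.
hostname_pattern = (
    "^(" + "|".join(h.replace(".", r"\.") for h in valid_hostnames) + ")$"
)
hostname_regex = re.compile(hostname_pattern)


def is_valid_hostname(hostname):
    """Return True if the hostname is one of the valid ones, matched exactly."""
    return hostname_regex.search(hostname) is not None


for hostname in ["www.example.com", "(not set)", "www.example.com.spam.xyz"]:
    print(hostname, is_valid_hostname(hostname))
```

Without the anchors, a made-up hostname that merely contains one of your valid hostnames would also pass the check.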

When your filters and segments have been tested in your Test View and you have checked that they include and exclude traffic as expected, add them to your main view (All Website Data + Filters). Filters can be added in admin mode by clicking on All Filters in the Account column.

How often?

It is recommended that you check your account for new spam once a week, as new spam referrals are created regularly. Best practice is to filter out unwelcome traffic as soon as possible, so that spam referrals do not clog your data. Remember to set the date range in Google Analytics so that you only see data from the date you last made changes and added filters. That way you do not have to work through the same data twice. One good strategy is to add new filters on the last day of each month, so that you know those particular bots are removed from the start of a fresh month.
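The weekly check can be made quicker by listing only the referrers that your existing filters do not cover yet. The sketch below assumes you have copied the referral sources for the period into a list by hand or from an export; the sample values and the helper name `uncovered_referrers` are hypothetical, and matching is assumed to ignore case. It only shows what is left to review; deciding what counts as spam is still up to you.

```python
import re

# Patterns copied from your existing filters (Spam bots 1, Spam bots 2, ...).
existing_patterns = [r"www\.fake-traffic\.com|spam-visitor\.se"]

# Hypothetical referral sources seen since the last check.
referrers_this_week = ["www.fake-traffic.com", "google.com", "new-spam-site.xyz"]


def uncovered_referrers(referrers, patterns):
    """Return the referrers that none of the existing patterns match."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [r for r in referrers if not any(c.search(r) for c in compiled)]


to_review = uncovered_referrers(referrers_this_week, existing_patterns)
print(to_review)  # ['google.com', 'new-spam-site.xyz']
```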

Hint!

Add annotations with descriptions when you add new filters, make changes or activate events that affect how your data is collected. This way you can always go back and see how your traffic has been affected and why.