Add custom fields to the EPiServer Search index with EPiServer 7

If you have an EPiServer 7 site with basic search requirements that don’t warrant the full-blown EPiServer Find search engine, you can get a long way with EPiServer Search and some customized indexing.

  • Ted Nyberg
  • 2 April 2013

Prerequisites

This post assumes you have EPiServer Search properly set up for your website, i.e. the indexing service is active and you can reach its WCF endpoint at http://yoursite/IndexingService/IndexingService.svc (browsing to it should return “Endpoint not found”).

If you browse to the update method at http://yoursite/IndexingService/IndexingService.svc/update/?accesskey=[your access key] you should get “Method not allowed”. Oh, and for the sake of clarity: EPiServer Search is not EPiServer Find. :)

Concept

If your site has EPiServer Search enabled, all pages and VPP files are automatically indexed with a set of default fields. The idea here is to add custom fields with additional content to the Lucene search index whenever an item is indexed, and then include those custom fields when searching for content.

Glossary of terms

Content is actual EPiServer content, i.e. anything that implements IContent (such as PageData for pages).

Document is a Lucene search index document, i.e. the actual data being added to the search index.

Customize how pages are indexed

Wire up an event to add custom fields to the Lucene search index

First we create an initializable module that subscribes to the static DocumentAdding event of the IndexingService class (in the EPiServer.Search.IndexingService namespace). The event fires whenever a Lucene document is about to be added to the index. For add and update operations the event arguments expose a Document property containing the document being indexed; more on that later.

using System;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.Search.IndexingService;

// Runs after EPiServer's own web initialization module
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class SearchInitialization : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        // DocumentAdding is a static event, so we subscribe once at startup...
        IndexingService.DocumentAdding += CustomizeIndexing;
    }

    void CustomizeIndexing(object sender, EventArgs e)
    {
        // Add custom fields to search document
    }

    public void Preload(string[] parameters)
    {
    }

    public void Uninitialize(InitializationEngine context)
    {
        // ...and detach the handler again when the module is uninitialized
        IndexingService.DocumentAdding -= CustomizeIndexing;
    }
}
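Because DocumentAdding is static, a subscribed handler stays attached for as long as the application runs, which is why Uninitialize detaches it. The sketch below uses a hypothetical FakeIndexingService (not an EPiServer class) purely to illustrate how static-event subscriptions behave: every += adds another invocation, so a handler attached twice runs twice per document, and only matching -= calls stop it.

```csharp
using System;

// Hypothetical stand-in for a class exposing a static event,
// the way IndexingService exposes DocumentAdding. Not an EPiServer type.
public static class FakeIndexingService
{
    public static event EventHandler DocumentAdding;

    // Simulates the service raising the event for a single document.
    public static void RaiseDocumentAdding()
    {
        DocumentAdding?.Invoke(null, EventArgs.Empty);
    }
}

// Counts how many times its handler is invoked.
public class HandlerCounter
{
    public int Calls { get; private set; }

    public void OnDocumentAdding(object sender, EventArgs e)
    {
        Calls++;
    }
}
```

If a module were ever initialized twice without being uninitialized in between, the handler would run once per subscription for every indexed document; pairing each += with a -= in Uninitialize avoids that.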

Index custom fields for a page

We add the custom fields in the event handler attached to the DocumentAdding event. First we check the type of the event arguments: if they’re of type AddUpdateEventArgs, we know that a document is being added or updated.

Once we know that something is being indexed, we check whether the document represents a VPP file. In this example we simply skip those, as we don’t need to index any additional content for files.

Next we get the actual content being indexed. If it’s an EPiServer page, we add a custom field containing the name of the page’s parent page. Obviously this is just for the sake of this example; you can probably come up with more useful things to index. :)

void CustomizeIndexing(object sender, EventArgs e)
{
    var addUpdateEventArgs = e as AddUpdateEventArgs;

    if (addUpdateEventArgs == null)
    {
        return; // Document is not being added/updated
    }

    // Get the document being indexed
    var document = addUpdateEventArgs.Document;

    if (document.IsUnifiedFileDocument())
    {
        return; // We don't customize VPP file indexing
    }

    var content = document.GetContent<IContent>();

    var page = content as PageData;

    if (page == null || PageReference.IsNullOrEmpty(page.ParentLink))
    {
        return;
    }

    // We want to be able to search by the name of the parent page
    var parentName = ServiceLocator.Current.GetInstance<IContentRepository>().Get<PageData>(page.ParentLink).PageName;

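    var parentNameField = new Field("MY_CUSTOM_FIELD", parentName, Field.Store.NO, Field.Index.ANALYZED);

    // A search hit on parent name is more important than average, so we boost this field only
    // (document.SetBoost would boost every field of the document, not just this one)
    parentNameField.SetBoost(1.5f);

    document.Add(parentNameField);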
}

Get the content being indexed

To add custom fields we usually need the actual IContent being indexed. The handler above relies on two extension methods: GetContent, which loads the content associated with a specific document, and IsUnifiedFileDocument, which checks whether a document represents a VPP file:

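using System;
using EPiServer;
using EPiServer.Core;
using EPiServer.ServiceLocation;
using Lucene.Net.Documents;

public static class DocumentExtensions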
{
    public static T GetContent<T>(this Document document) where T : IContent
    {
        // EPiServer Search adds a field called 'EPISERVER_SEARCH_ID' which contains the content GUID
        const string fieldName = "EPISERVER_SEARCH_ID";

        var fieldValue = document.Get(fieldName);

        if (string.IsNullOrWhiteSpace(fieldValue))
        {
            throw new NotSupportedException(string.Format("Specified document did not have a '{0}' field value", fieldName));
        }

        var fieldValueFragments = fieldValue.Split('|'); // Field value is either 'GUID|language' or just a GUID

        Guid contentGuid;

        if (!Guid.TryParse(fieldValueFragments[0], out contentGuid))
        {
            throw new NotSupportedException("Expected first part of ID field to be a valid GUID");
        }

        return ServiceLocator.Current.GetInstance<IContentRepository>().Get<T>(contentGuid);
    }

    public static bool IsUnifiedFileDocument(this Document document)
    {
        var underlyingTypes = document.Get("EPISERVER_SEARCH_TYPE");

        return !string.IsNullOrWhiteSpace(underlyingTypes) && underlyingTypes.Contains("UnifiedFile");
    }
}
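The parsing logic in these extension methods can be tried out without a Lucene Document or an EPiServer site. The sketch below is a hypothetical, dependency-free variant that works on the raw field values: it returns false instead of throwing, and it also hands back the optional language fragment that GetContent ignores. The field value formats ('GUID|language' or just a GUID, and a type value containing "UnifiedFile") are the ones assumed by the code above.

```csharp
using System;

// Hypothetical helpers mirroring DocumentExtensions, but operating on raw field values.
public static class SearchFieldParsing
{
    // Parses an EPISERVER_SEARCH_ID value of the form "GUID" or "GUID|language".
    // Returns false if the value is empty or the first fragment is not a valid GUID.
    public static bool TryParseSearchId(string fieldValue, out Guid contentGuid, out string language)
    {
        contentGuid = Guid.Empty;
        language = null;

        if (string.IsNullOrWhiteSpace(fieldValue))
        {
            return false;
        }

        var fragments = fieldValue.Split('|');

        if (!Guid.TryParse(fragments[0], out contentGuid))
        {
            return false;
        }

        // The language fragment is optional
        if (fragments.Length > 1 && fragments[1].Length > 0)
        {
            language = fragments[1];
        }

        return true;
    }

    // Same check as IsUnifiedFileDocument: the EPISERVER_SEARCH_TYPE value mentions UnifiedFile.
    public static bool IsUnifiedFileType(string searchTypeValue)
    {
        return !string.IsNullOrWhiteSpace(searchTypeValue) && searchTypeValue.Contains("UnifiedFile");
    }
}
```

Returning false rather than throwing may suit an indexing event handler better, since the handler can then skip a document it doesn’t understand instead of failing.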

Searching for content

Basic full text search

A basic search for pages in EPiServer (searching the default index fields only) can be done like this…

var foundPages = Search<PageData>("find this");

…with the Search method looking like this:

public IEnumerable<T> Search<T>(string keywords) where T : IContent
{
    // We'll combine several queries and all must match
    var query = new GroupQuery(LuceneOperator.AND);

    // Only search for content of type T
    query.QueryExpressions.Add(new ContentQuery<T>());

    query.QueryExpressions.Add(new FieldQuery(keywords));

    // Only search for content the current user has permission to read
    var accessQuery = new AccessControlListQuery();
    accessQuery.AddAclForUser(PrincipalInfo.Current, HttpContext.Current);
    query.QueryExpressions.Add(accessQuery);

    var searchHandler = ServiceLocator.Current.GetInstance<SearchHandler>();

    // Perform search
    var results = searchHandler.GetSearchResults(query, 1, int.MaxValue);

    var contentSearchHandler = ServiceLocator.Current.GetInstance<ContentSearchHandler>();

    // Convert search result to pages
    return results.IndexResponseItems.Select(contentSearchHandler.GetContent<T>);
}

Include custom fields in search

So far we’ve added a custom field to the search index. The next step is to include that custom field when searching. To do this we first create a GroupQuery and add a ContentQuery to limit the content type(s) we want to search for, just like in the first example.

Next we add an inner GroupQuery with one query per field, combined with the OR operator so that a keyword hit in any one of the fields is enough:

public IEnumerable<T> Search<T>(string keywords) where T : IContent
{
    // We'll combine several queries and all must match (AND condition)
    var query = new GroupQuery(LuceneOperator.AND);

    // Only search for content of type T
    query.QueryExpressions.Add(new ContentQuery<T>());

    // Search for keywords in any of the fields specified below (OR condition)
    var keywordsQuery = new GroupQuery(LuceneOperator.OR);

    // Search in default field
    keywordsQuery.QueryExpressions.Add(new FieldQuery(keywords));
    
    // Search in custom field and boost the importance of any hits
    keywordsQuery.QueryExpressions.Add(new CustomFieldQuery(keywords, "MY_CUSTOM_FIELD", 2.0f));
    
    query.QueryExpressions.Add(keywordsQuery);

    // The access control list query will remove any pages the user doesn't have read access to
    var accessQuery = new AccessControlListQuery();
    accessQuery.AddAclForUser(PrincipalInfo.Current, HttpContext.Current);
    query.QueryExpressions.Add(accessQuery);

    var searchHandler = ServiceLocator.Current.GetInstance<SearchHandler>();

    // Perform search
    var results = searchHandler.GetSearchResults(query, 1, int.MaxValue);

    var contentSearchHandler = ServiceLocator.Current.GetInstance<ContentSearchHandler>();

    // Convert search result to pages
    return results.IndexResponseItems.Select(contentSearchHandler.GetContent<T>);
}
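The nesting of AND and OR groups is easier to see as a Lucene query string. EPiServer’s GroupQuery builds that string for you and its exact output may be formatted differently, so the hypothetical helper below only illustrates the shape of the query. TYPE_CLAUSE, ACL_CLAUSE and DEFAULT_FIELD are placeholders for whatever ContentQuery, AccessControlListQuery and FieldQuery actually produce.

```csharp
using System;

// Hypothetical helper: joins clauses with a Lucene boolean operator (AND/OR)
// and wraps the result in parentheses, similar in spirit to a GroupQuery.
public static class QuerySketch
{
    public static string Group(string luceneOperator, params string[] clauses)
    {
        if (clauses == null || clauses.Length == 0)
        {
            throw new ArgumentException("At least one clause is required", nameof(clauses));
        }

        return "(" + string.Join(" " + luceneOperator + " ", clauses) + ")";
    }
}
```

Usage, mirroring the Search method above:

```csharp
var keywordsQuery = QuerySketch.Group("OR", "DEFAULT_FIELD:(find this)", "MY_CUSTOM_FIELD:(find this)^2");
var query = QuerySketch.Group("AND", "TYPE_CLAUSE", keywordsQuery, "ACL_CLAUSE");
// (TYPE_CLAUSE AND (DEFAULT_FIELD:(find this) OR MY_CUSTOM_FIELD:(find this)^2) AND ACL_CLAUSE)
```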

Adding a query for a custom field

The QueryExpressions property is in fact a collection of IQueryExpression objects. The FieldQuery class, which ships with EPiServer, only lets you query the default fields. To query custom fields we add a CustomFieldQuery class, which lets you specify the name of the field to search in, with or without term boosting:

using System.Globalization;
using EPiServer.Search.Queries;
using EPiServer.Search.Queries.Lucene;

public class CustomFieldQuery : IQueryExpression
{
    public CustomFieldQuery(string queryExpression, string fieldName)
    {
        Expression = queryExpression;
        Field = fieldName;
        Boost = null;
    }

    public CustomFieldQuery(string queryExpression, string fieldName, float boost)
    {
        Expression = queryExpression;
        Field = fieldName;
        Boost = boost;
    }

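    public string GetQueryExpression()
    {
        // Produces e.g. MY_CUSTOM_FIELD:(keywords)^2 - the boost goes outside the parentheses
        // so it applies to the whole group rather than only to the last term.
        // InvariantCulture guarantees a '.' decimal separator regardless of the current culture.
        return string.Format("{0}:({1}){2}",
            Field,
            LuceneHelpers.EscapeParenthesis(Expression),
            Boost.HasValue ? string.Concat("^", Boost.Value.ToString(CultureInfo.InvariantCulture)) : string.Empty);
    }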

    public string Field { get; set; }

    public string Expression { get; set; }

    public float? Boost { get; set; }
}
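The string produced by GetQueryExpression can be checked without EPiServer. The sketch below is a hypothetical, dependency-free formatter; its EscapeParentheses method is an assumption about what LuceneHelpers.EscapeParenthesis is for (backslash-escaping parentheses in user input), not a copy of its implementation. It places the boost after the closing parenthesis, which in Lucene’s query syntax boosts the whole field group, and formats the boost with the invariant culture.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for CustomFieldQuery.GetQueryExpression.
public static class FieldQueryFormatter
{
    // Assumption: backslash-escapes parentheses so user input can't close the field group early.
    public static string EscapeParentheses(string expression)
    {
        return expression.Replace("(", "\\(").Replace(")", "\\)");
    }

    // Formats field:(expression) with an optional ^boost applied to the whole group.
    public static string Format(string field, string expression, float? boost)
    {
        var boostSuffix = boost.HasValue
            ? "^" + boost.Value.ToString(CultureInfo.InvariantCulture)
            : string.Empty;

        return string.Format(CultureInfo.InvariantCulture, "{0}:({1}){2}",
            field, EscapeParentheses(expression), boostSuffix);
    }
}
```

For example, Format("MY_CUSTOM_FIELD", "find this", 2.0f) yields MY_CUSTOM_FIELD:(find this)^2, whereas placing the boost inside the parentheses would boost only the term "this".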