SearchExtensions: Ranked search results for IQueryable search terms

by John Nye

16 Jan
2014

I have recently updated my search extensions project to enable ranked search results. This lets a user search for a term within a property and also order the results by relevance, measured as the number of hits.

Full source code can be found here: https://github.com/ninjanye/searchextensions

The SearchExtensions NuGet package is also available by running the following:

PM> Install-Package NinjaNye.SearchExtensions

The Goal

The thought behind a ranked search is to enable users to easily search their data collections and determine which results are more relevant than others.

How to use it

A ranked search is called in the same way as a regular search:

var result = queryableData.RankedSearch(x => x.Property, "searchTerm");

This produces the following SQL when used with a SQL data provider. Notice that all the searching and ranking is done in SQL (not in memory). The divisor of 10 is the length of the search term:

SELECT
    [Project1].[C1] AS [C1],
    [Project1].[Property] AS [Property]
    ...
FROM ( SELECT
        [Extent1].[Property] AS [Property],
        ...
        (( CAST(LEN([Extent1].[Property]) AS int)) - ( CAST(LEN(REPLACE([Extent1].[Property], N'searchTerm', N'')) AS int))) / 10 AS [C1]
    FROM [dbo].[Table] AS [Extent1]
    WHERE [Extent1].[Property] LIKE N'%searchTerm%'
) AS [Project1]

How it was built (Expression Trees)

So here is the implementation. Firstly, to represent my ranked result I have the following interface

public interface IRanked<out T>
{
    int Hits { get; }
    T Item { get; }
}

... with the following concrete class

internal class Ranked<T> : IRanked<T>
{
    public int Hits { get; set; }
    public T Item { get; set; }
}    

The RankedSearch extension method

public static class RankedSearchExtensions
{
    public static IQueryable<IRanked<T>> RankedSearch<T>(this IQueryable<T> source, 
                                            Expression<Func<T, string>> stringProperty, 
                                            string searchTerm)
    {
        var parameterExpression = stringProperty.Parameters[0];
        var hitCountExpression = CalculateHitCount(stringProperty, searchTerm);
        var rankedInitExpression = ConstructRankedResult<T>(hitCountExpression, 
                                                            parameterExpression);

        var selectExpression = 
               Expression.Lambda<Func<T, Ranked<T>>>(rankedInitExpression, parameterExpression);

        return source.Search(stringProperty, searchTerm)
                     .Select(selectExpression);
    }

The first thing this method does is call CalculateHitCount which creates an expression that represents counting the number of times a search term occurs. I am using the following method to count occurrences so that this can be used by all providers, specifically SQL.

Note: Always write down the code you are trying to build to help visualize the expression tree:

x => (x.Name.Length - x.Name.Replace([searchTerm], "").Length) / [searchTerm].Length;

In terms of building the above as an expression tree, this was accomplished as follows:

private static BinaryExpression CalculateHitCount<T>(Expression<Func<T, string>> stringProperty, 
                                                     string searchTerm)
{
    Expression searchTermExpression = Expression.Constant(searchTerm);

    // Store term length to work out how many search terms were found
    Expression searchTermLengthExpression = Expression.Constant(searchTerm.Length);

    // Empty string expression to replace search terms with
    Expression emptyStringExpression = Expression.Constant("");        
    PropertyInfo stringLengthProperty = typeof (string).GetProperty("Length");

    //Calculate the length of property
    var lengthExpression = Expression.Property(stringProperty.Body, stringLengthProperty);

    // Replace searchTerm with empty string in property                                                     
    MethodInfo replaceMethod = typeof(string).GetMethod("Replace", 
                                                new[] {typeof (string), typeof (string)});
    var replaceExpression = Expression.Call(stringProperty.Body, replaceMethod, 
                                            searchTermExpression, emptyStringExpression);

    // Calculate length of replaced string
    var replacedLengthExpression = Expression.Property(replaceExpression, stringLengthProperty);

    // Calculate the difference between the property and the replaced property
    var charDiffExpression = Expression.Subtract(lengthExpression, replacedLengthExpression);

    // Divide the character difference by the number of characters in the
    // search term to get the amount of occurrences 
    return Expression.Divide(charDiffExpression, searchTermLengthExpression);
}
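As a quick sanity check, an expression built this way can be compiled and run against in-memory strings. The sketch below repeats the same construction inside a hypothetical helper, BuildHitCount (not part of the library), which wraps the result in a lambda so it can be compiled. The class name and sample strings are assumptions made for illustration only.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class HitCountCheck
{
    // Hypothetical helper: builds
    // x => (x.Prop.Length - x.Prop.Replace(term, "").Length) / term.Length
    public static Expression<Func<T, int>> BuildHitCount<T>(
        Expression<Func<T, string>> stringProperty, string searchTerm)
    {
        Expression term = Expression.Constant(searchTerm);
        Expression termLength = Expression.Constant(searchTerm.Length);
        Expression empty = Expression.Constant("");
        PropertyInfo lengthProperty = typeof(string).GetProperty("Length");
        MethodInfo replaceMethod = typeof(string).GetMethod(
            "Replace", new[] { typeof(string), typeof(string) });

        // Length of the original property value
        var length = Expression.Property(stringProperty.Body, lengthProperty);
        // Length of the value once every occurrence of the term is removed
        var replaced = Expression.Call(stringProperty.Body, replaceMethod, term, empty);
        var replacedLength = Expression.Property(replaced, lengthProperty);
        // Characters removed, divided by the term length, gives the hit count
        var hits = Expression.Divide(
            Expression.Subtract(length, replacedLength), termLength);

        return Expression.Lambda<Func<T, int>>(hits, stringProperty.Parameters[0]);
    }

    public static void Main()
    {
        Func<string, int> countHits = BuildHitCount<string>(s => s, "an").Compile();
        Console.WriteLine(countHits("banana bandana")); // prints 4
        Console.WriteLine(countHits("cherry"));         // prints 0
    }
}
```

Two caveats apply when running in memory. string.Replace is case sensitive, whereas SQL REPLACE follows the column collation, so counts can differ between providers. An empty search term makes the divisor zero, so callers would want to guard against that before building the expression.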

The second part of a RankedSearch is to initialize a Ranked search result that holds the hit count and the original item. We already have the hit count expression from the method above. We now need to build an expression tree that uses the hit count to build a ranked result.

The equivalent lambda I want to build is as follows:

x => new Ranked<T>{ Hits = [hitCountExpression], Item = x}

This is represented as the following expression tree. It is fairly simple, as it is simply initializing our ranked result:

private static Expression ConstructRankedResult<T>(Expression hitCountExpression, 
                                                   ParameterExpression parameterExpression)
{
    var rankedType = typeof (Ranked<T>);
    // Construct the object
    var rankedCtor = Expression.New(rankedType);

    // Assign hitCount to Hits property
    var hitProperty = rankedType.GetProperty("Hits");
    var hitValueAssignment = Expression.Bind(hitProperty, hitCountExpression);

    //Assign record to Item property
    var itemProperty = rankedType.GetProperty("Item");
    var itemValueAssignment = Expression.Bind(itemProperty, parameterExpression);

    // Initialize Ranked object with property assignments
    return Expression.MemberInit(rankedCtor, hitValueAssignment, itemValueAssignment);
}
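To see the member-init expression in action without a database, it can be run through LINQ to Objects. This is a minimal sketch under some assumptions: Ranked<T> is redeclared as a public class so the example stands alone, BuildSelector is a hypothetical helper rather than a library method, the hit count is written as a plain lambda for the fixed term "an", and a simple Where(... Contains ...) stands in for the library's Search call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Ranked<T>
{
    public int Hits { get; set; }
    public T Item { get; set; }
}

public static class RankedResultDemo
{
    // Hypothetical helper: builds x => new Ranked<T> { Hits = <hitCount>, Item = x }
    public static Expression<Func<T, Ranked<T>>> BuildSelector<T>(
        Expression<Func<T, int>> hitCount)
    {
        var parameter = hitCount.Parameters[0];
        var rankedType = typeof(Ranked<T>);
        var init = Expression.MemberInit(
            Expression.New(rankedType),
            Expression.Bind(rankedType.GetProperty("Hits"), hitCount.Body),
            Expression.Bind(rankedType.GetProperty("Item"), parameter));
        return Expression.Lambda<Func<T, Ranked<T>>>(init, parameter);
    }

    public static void Main()
    {
        // Assumed hit count for the term "an" (term length 2)
        Expression<Func<string, int>> hitCount =
            s => (s.Length - s.Replace("an", "").Length) / 2;

        var data = new List<string> { "cherry", "banana", "mango" }.AsQueryable();

        // Stand-in for Search: keep only items containing the term,
        // project them to ranked results, then order by hit count
        var ranked = data.Where(s => s.Contains("an"))
                         .Select(BuildSelector(hitCount))
                         .OrderByDescending(r => r.Hits);

        foreach (var r in ranked)
            Console.WriteLine($"{r.Item}: {r.Hits}");
        // prints "banana: 2" then "mango: 1"
    }
}
```

Because the projection only attaches a hit count, any ordering is left to the caller, here a descending sort on Hits.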

Get in touch

I'm not entirely happy with the method name RankedSearch, as it suggests the result is ordered by default. This is not the case: the user can order the results as they see fit. RankedSearch simply provides an occurrence (hit) count of the search term. If you have a suggestion for a better method name, please get in touch via the comments below, Twitter, or by emailing me using the link in the header.

I am currently implementing the RankedSearch feature for use with multiple properties and multiple search terms (a future post, no doubt) but if you have any ideas as to future features or enhancements, then, again, please get in touch using the normal channels.

Comments 5

Paul Inglis says: 3164 days ago

Hi John, love the search extensions. The ranked one really saved me a lot of time! I've got one little bug though, wondering if you can help me?

I'm searching over 4 fields: Name, Role, Goal and Reason (all text fields). If I populate each field with the word "testing" and then search for the term "testing" I get an error of :

"The cast to value type 'Int32' failed because the materialized value is null. Either the result type's generic parameter or the query must use a nullable type."

But if I reduce it to "testin" then I get results back fine.

My LINQ query is as follows:

        var results = query.Search(
                             p => p.Name, 
                             p => p.Role, 
                             p => p.Goal,
                             p => p.Reason).Containing(term.Split(' '))
                      .ToRanked()
                      .OrderByDescending(r => r.Hits);

When I then evaluate to process the list of items I get the above error.

Thanks,

Paul

John says: 3160 days ago

Hi Paul,

Thanks for getting in touch. Glad to hear you like SearchExtensions. I'll look into your issue this evening, as it sounds like there may be a bug somewhere.

Could I possibly ask you to create an issue on the project's GitHub page? I'll begin work immediately, but GitHub issues are better for tracking progress and receiving updates.

I'll update you as soon as I have any findings.

Cheers John

John says: 3160 days ago

Hi Paul,

If possible, could you also include the stack trace in the issue? Hopefully that will help me identify the root cause of the issue a little quicker. If you are unable to create an issue, just add the details here and I'll create the issue over on GitHub.

Thanks again

Collin says: 2528 days ago

Hi John, the search is great and now I've discovered ranked search. I'm struggling because all of my other code expects IQueryable(Of MyType) and not IQueryable(Of IRanked(Of MyType)), and I'm having trouble converting back out. I realize the perf hit I might take, but my subsets are small.

Thanks, -Collin

John says: 2520 days ago

Hi Collin,

You should be able to do something like

var converted = ranked.Select(r => r.Item);

Apologies if this is slightly off as it's from memory and typed out on my phone (away from an IDE)

Hope that helps.


25 Apr
2024