Search result highlighting and truncation in C#

By John Avis · February 3, 2016

Isn't it nice how search engines show a brief section of content with your search terms highlighted in search results?

I wanted to create something similar for a website I am working on.

It needs to handle highlighting of individual keywords, or highlighting of a search phrase.

If there are multiple matches, it needs to show a small section of the content for each match.

It gets a little complicated when there are multiple matches for parts of the same word. For example, searches for "and" and "sand" both match part of the word "sandwich".
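
One way to picture the overlap problem is to collect the start and end position of every keyword occurrence, sort the ranges, and merge any that overlap or touch. The following is a minimal sketch of that idea, separate from the class listed later in this post. The names MatchRanges and Find are made up for the illustration, and it assumes C# 7 tuples are available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchRanges
{
    // Returns sorted, merged (Start, End) ranges for all keyword occurrences.
    // End is exclusive. Hypothetical helper, for illustration only.
    public static List<(int Start, int End)> Find(string text, IEnumerable<string> keywords)
    {
        var found = new List<(int Start, int End)>();
        foreach (string keyword in keywords)
        {
            if (string.IsNullOrEmpty(keyword)) continue;

            int start = text.IndexOf(keyword, 0, StringComparison.OrdinalIgnoreCase);
            while (start != -1)
            {
                found.Add((start, start + keyword.Length));
                start = text.IndexOf(keyword, start + 1, StringComparison.OrdinalIgnoreCase);
            }
        }

        // Walk the ranges in order and fold each one into the previous range
        // when they overlap or touch.
        var merged = new List<(int Start, int End)>();
        foreach (var range in found.OrderBy(r => r.Start))
        {
            int last = merged.Count - 1;
            if (last >= 0 && range.Start <= merged[last].End)
            {
                merged[last] = (merged[last].Start, Math.Max(merged[last].End, range.End));
            }
            else
            {
                merged.Add(range);
            }
        }
        return merged;
    }
}
```

For the text "a sandwich" and the keywords "and" and "sand", MatchRanges.Find returns a single range from 2 to 6, which covers "sand", so the word is highlighted once rather than twice.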

Generating snippets of content is also complicated when multiple matches are close together. You don't want to repeat any content across snippets, so the system needs to combine nearby matches into a single snippet.
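
The close-together case can be sketched in the same way. Each match gets a window of surrounding text, and windows that would overlap are joined so nothing is repeated. This is a simplified illustration under my own assumptions: SnippetWindows and Build are hypothetical names, and the sketch cuts at fixed character offsets, whereas the class later in this post also looks for sentence and word boundaries.

```csharp
using System;
using System.Collections.Generic;

public static class SnippetWindows
{
    // matches must be sorted by Start and must not overlap; End is exclusive.
    // context is the number of characters to keep on each side of a match.
    // Hypothetical helper, for illustration only.
    public static List<(int Start, int End)> Build(
        IReadOnlyList<(int Start, int End)> matches, int textLength, int context)
    {
        var windows = new List<(int Start, int End)>();
        foreach (var match in matches)
        {
            // Clamp the window to the bounds of the text.
            int start = Math.Max(0, match.Start - context);
            int end = Math.Min(textLength, match.End + context);

            int last = windows.Count - 1;
            if (last >= 0 && start <= windows[last].End)
            {
                // This window overlaps the previous one, so extend it instead.
                windows[last] = (windows[last].Start, Math.Max(windows[last].End, end));
            }
            else
            {
                windows.Add((start, end));
            }
        }
        return windows;
    }
}
```

With a context of 10 characters and a text length of 300, matches at 10 to 15, 20 to 25 and 200 to 205 give two windows, 0 to 35 and 190 to 215. The first two matches share a snippet because their windows overlap.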

Based on these requirements I came up with the following class and methods. There are two overloads of the HitHighlighter method, one for phrase matching and one for individual keywords.

They take the following parameters:

content: a string with the text that will have the hits highlighted. This string must not contain HTML, and should not contain line breaks.
matchPhrase: a boolean value that indicates whether phrase matches should be highlighted rather than individual keywords
phrase: the search phrase
keywords: a string array of keywords (if matchPhrase is true, this must contain one value)

There are a couple of constants in there which can be changed to suit your requirements:

length: the maximum number of characters before and after each match that will be included in the snippet of content around the match
truncateText: the text that will be inserted when content has been truncated
highlightStartTag: what will be inserted into the content to indicate the start of a match (by default, the HTML bold tag <b>)
highlightEndTag: what will be inserted into the content to indicate the end of a match
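
To show what the start and end tags do, here is a small sketch that wraps a set of character ranges in a pair of tags. It is not the code from the class below. TagInserter and Wrap are hypothetical names, the ranges are assumed to be sorted and non-overlapping, and the text is assumed to contain no HTML.

```csharp
using System.Collections.Generic;
using System.Text;

public static class TagInserter
{
    // Wraps each (Start, End) range of text in startTag and endTag.
    // Ranges must be sorted by Start and must not overlap; End is exclusive.
    // Hypothetical helper, for illustration only.
    public static string Wrap(string text, IEnumerable<(int Start, int End)> ranges,
        string startTag = "<b>", string endTag = "</b>")
    {
        var sb = new StringBuilder();
        int position = 0;
        foreach (var range in ranges)
        {
            // Copy the text before the match, then the tagged match itself.
            sb.Append(text, position, range.Start - position);
            sb.Append(startTag);
            sb.Append(text, range.Start, range.End - range.Start);
            sb.Append(endTag);
            position = range.End;
        }
        // Copy whatever follows the last match.
        sb.Append(text, position, text.Length - position);
        return sb.ToString();
    }
}
```

Calling TagInserter.Wrap on "a sandwich" with the single range 2 to 6 returns "a <b>sand</b>wich". Building the output left to right avoids having to track how much each inserted tag shifts the later positions.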

using System;
using System.Collections.Generic;
using System.Linq;

public class SearchHelpers
{
    // Phrase overload: highlights every occurrence of the whole phrase.
    public static string HitHighlighter(string content, string phrase)
    {
        return HitHighlighter(content, new string[] { phrase }, true);
    }

    // Keyword overload. When matchPhrase is true, keywords[0] is treated as the phrase.
    public static string HitHighlighter(string content, string[] keywords, bool matchPhrase = false)
    {
        const int length = 100;
        const string truncateText = "...";
        const string highlightStartTag = "<b>";
        const string highlightEndTag = "</b>";

        // Matches in the full content; these decide which parts of the content are kept.
        List<HighlightMatch> matches = FindMatches(content, keywords, matchPhrase);

        string result = content;

        // Truncate so that only the relevant parts of the content are shown.
        // A position p in content corresponds to position (p - decrement) in result.
        int decrement = 0;

        if (matches.Any())
        {
            // Text before the first match.
            if (matches[0].StartPos > length)
            {
                int start = result.LastIndexOf(".", matches[0].StartPos);

                if (start == -1 || start <= matches[0].StartPos - (length + 1))
                {
                    // No sentence start within reach: begin at the first word
                    // inside the last `length` characters before the match.
                    start = matches[0].StartPos - (length + 1);

                    while (start < matches[0].StartPos)
                    {
                        if (result.Substring(start, 1) == " ")
                        {
                            start++;
                            break;
                        }

                        start++;
                    }
                }
                else
                {
                    // Begin just after the period that ends the previous sentence.
                    start++;

                    if (result.Substring(start, 1) == " ")
                    {
                        start++;
                    }
                }

                result = truncateText + result.Substring(start);

                decrement = start - truncateText.Length;
            }

            // Text between consecutive matches that are far apart.
            for (int i = 1; i < matches.Count; i++)
            {
                int prevEnd = matches[i - 1].EndPos - decrement;
                int nextStart = matches[i].StartPos - decrement;

                if ((nextStart - prevEnd) > (length * 2))
                {
                    // Where the snippet after the previous match ends.
                    int start = result.IndexOf(".", prevEnd);

                    if (start == -1 || start >= prevEnd + (length + 1))
                    {
                        // No sentence end within reach: cut at the last space
                        // inside the first `length` characters after the match.
                        start = prevEnd + (length + 1);

                        while (start > prevEnd)
                        {
                            if (result.Substring(start, 1) == " ")
                            {
                                break;
                            }

                            start--;
                        }
                    }
                    else
                    {
                        // Keep the period that ends the sentence.
                        start++;

                        if (result.Substring(start, 1) == " ")
                        {
                            start++;
                        }
                    }

                    // Where the snippet before the next match begins.
                    int end = nextStart - (length + 1);

                    while (end < nextStart)
                    {
                        if (result.Substring(end, 1) == " ")
                        {
                            end++;
                            break;
                        }

                        end++;
                    }

                    // Guard against the two cut points crossing.
                    if (end < start)
                    {
                        end = start;
                    }

                    result = result.Substring(0, start) + truncateText + result.Substring(end);

                    decrement += (end - start) - truncateText.Length;
                }
            }

            // Text after the last match.
            {
                int lastEnd = matches[matches.Count - 1].EndPos - decrement;
                int start = result.IndexOf(".", lastEnd);

                if (start == -1 || start >= lastEnd + (length + 1))
                {
                    if (result.Length - lastEnd > length + 1)
                    {
                        // Cut at the last space inside the first `length` characters after the match.
                        start = lastEnd + (length + 1);

                        while (start > lastEnd)
                        {
                            if (result.Substring(start, 1) == " ")
                            {
                                break;
                            }

                            start--;
                        }
                    }
                    else
                    {
                        // The remaining text is short enough to keep in full.
                        start = result.Length;
                    }
                }

                if (start < result.Length)
                {
                    result = result.Substring(0, start) + truncateText;
                }
            }
        }

        // Positions have shifted, so find the matches again in the truncated text.
        List<HighlightMatch> matches2 = FindMatches(result, keywords, matchPhrase);

        // Insert the tags. increment tracks how far the tags inserted so far
        // have shifted the positions of the remaining matches.
        int increment = 0;

        foreach (HighlightMatch match in matches2)
        {
            result = result.Substring(0, match.StartPos + increment) + highlightStartTag + result.Substring(match.StartPos + increment, match.EndPos - match.StartPos) + highlightEndTag + result.Substring(match.EndPos + increment);
            increment += highlightStartTag.Length + highlightEndTag.Length;
        }

        return result;
    }

    // Finds every occurrence of the phrase (keywords[0]) or of each keyword in text,
    // ignoring case, and returns the matches sorted by position with overlapping
    // or touching matches merged into one.
    private static List<HighlightMatch> FindMatches(string text, string[] keywords, bool matchPhrase)
    {
        IEnumerable<string> terms = matchPhrase ? keywords.Take(1) : keywords;

        List<HighlightMatch> found = new List<HighlightMatch>();

        foreach (string term in terms)
        {
            // An empty term would match at every position.
            if (string.IsNullOrEmpty(term))
            {
                continue;
            }

            int start = text.IndexOf(term, 0, StringComparison.OrdinalIgnoreCase);

            while (start != -1)
            {
                found.Add(new HighlightMatch(start, start + term.Length));

                start = text.IndexOf(term, start + 1, StringComparison.OrdinalIgnoreCase);
            }
        }

        List<HighlightMatch> merged = new List<HighlightMatch>();

        foreach (HighlightMatch match in found.OrderBy(a => a.StartPos))
        {
            HighlightMatch last = merged.Count > 0 ? merged[merged.Count - 1] : null;

            if (last != null && match.StartPos <= last.EndPos)
            {
                // Overlaps or touches the previous match, e.g. "and" inside "sand".
                if (match.EndPos > last.EndPos)
                {
                    last.EndPos = match.EndPos;
                }
            }
            else
            {
                merged.Add(match);
            }
        }

        return merged;
    }

    protected class HighlightMatch
    {
        public HighlightMatch(int startPos, int endPos)
        {
            this.StartPos = startPos;
            this.EndPos = endPos;
        }

        public int StartPos { get; set; }
        public int EndPos { get; set; }
    }
}

Example using some Bible verses:

string content = @"In the beginning God created the heavens and the earth. Now the earth was formless and empty, darkness was over the surface of the deep, and the Spirit of God was hovering over the waters. And God said, ""Let there be light,"" and there was light. God saw that the light was good, and he separated the light from the darkness. God called the light ""day,"" and the darkness he called ""night."" And there was evening, and there was morning-the first day. And God said, ""Let there be a vault between the waters to separate water from water."" So God made the vault and separated the water under the vault from the water above it. And it was so. God called the vault ""sky."" And there was evening, and there was morning-the second day. And God said, ""Let the water under the sky be gathered to one place, and let dry ground appear."" And it was so. 10 God called the dry ground ""land,"" and the gathered waters he called ""seas."" And God saw that it was good. Then God said, ""Let the land produce vegetation: seed-bearing plants and trees on the land that bear fruit with seed in it, according to their various kinds."" And it was so.  The land produced vegetation: plants bearing seed according to their kinds and trees bearing fruit with seed in it according to their kinds. And God saw that it was good. And there was evening, and there was morning-the third day. And God said, ""Let there be lights in the vault of the sky to separate the day from the night, and let them serve as signs to mark sacred times, and days and years, and let them be lights in the vault of the sky to give light on the earth."" And it was so. God made two great lights-the greater light to govern the day and the lesser light to govern the night. He also made the stars. God set them in the vault of the sky to give light on the earth, to govern the day and the night, and to separate light from darkness. And God saw that it was good. 
And there was evening, and there was morning-the fourth day. And God said, ""Let the water teem with living creatures, and let birds fly above the earth across the vault of the sky."" So God created the great creatures of the sea and every living thing with which the water teems and that moves about in it, according to their kinds, and every winged bird according to its kind. And God saw that it was good. God blessed them and said, ""Be fruitful and increase in number and fill the water in the seas, and let the birds increase on the earth."" And there was evening, and there was morning-the fifth day. And God said, ""Let the land produce living creatures according to their kinds: the livestock, the creatures that move along the ground, and the wild animals, each according to its kind."" And it was so. God made the wild animals according to their kinds, the livestock according to their kinds, and all the creatures that move along the ground according to their kinds. And God saw that it was good. Then God said, ""Let us make mankind in our image, in our likeness, so that they may rule over the fish in the sea and the birds in the sky, over the livestock and all the wild animals, and over all the creatures that move along the ground."" So God created mankind in his own image, in the image of God he created them; male and female he created them. God blessed them and said to them, ""Be fruitful and increase in number; fill the earth and subdue it. Rule over the fish in the sea and the birds in the sky and over every living creature that moves on the ground."" Then God said, ""I give you every seed-bearing plant on the face of the whole earth and every tree that has fruit with seed in it. They will be yours for food. And to all the beasts of the earth and all the birds in the sky and all the creatures that move along the ground-everything that has the breath of life in it-I give every green plant for food."" And it was so. 
God saw all that he had made, and it was very good. And there was evening, and there was morning-the sixth day.";

Response.Write(SearchHelpers.HitHighlighter(content, new string[] { "evening", "morning" }));

This produces the following result:

[Image: result of search result hit highlighting]

UPDATE 13 April 2017: I've created an example system for how this can work. See the project on GitHub.

About me
John Avis ...mostly about web development and programming, with a little bit of anything else related to the Internet, computers and technology.
