# Calculating string difference values with the Levenshtein fuzzy string algorithm (part 2: implementation in t-sql).

In this second post i will write about the Levenshtein algorithm and how to implement it in t-sql (in part 1 i have explained the implementation in c#). With this algorithm it is possible to compare two strings and calculate the a difference value of them. Lets asume that we have a table in a database with words in it.

Now if the user overtakes a word we want to suggest the words out of the table words that is the closest. So we need the Levenshtein algorithm to calculate the levenshtein difference between the overtaken word and all words in the table words. Then we take the words with the minimal levenshtein distance and give it the user to choose out of that words if he wants. For example if the user overtakes ‘lay’ the words with the minimal levenshtein distance are ‘lady’ and ‘lacy’. So this are the words we would suggest to the user. To acomplish that we need a function that overtakes the source and the target string and calculates the levenshtein distance.

```CREATE FUNCTION dbo.CalculateLevenshteinValue(@source nvarchar(max), @target nvarchar(max))
RETURNS int
BEGIN
declare @sourceLength int, @targetLength int

set @sourceLength = len(@source);
set @targetLength = len(@target);

declare @array table ( i int, j int, value int);
declare @i int, @j int, @result int

set @i = 0;
set @j = 0;

WHILE @i <= @sourceLength
BEGIN
insert @array( i, j, [value])
values( @i, 0, @i)
SET @i = @i + 1;
END
WHILE @j <= @targetLength
BEGIN
insert @array ( i, j, [value])
values ( 0, @j, @j)
SET @j = @j + 1;
END

set @i = 1;
set @j = 1;

declare @equal int, @minInsert int, @minDelete int,
declare @minInterchange int, @equalValue int, @insertValue int
declare @sourceChar nvarchar(1), @targetChar nvarchar(1)

WHILE @i <= @sourceLength
BEGIN
WHILE @j <= @targetLength
BEGIN
SET @sourceChar = substring(@source, @i, 1);
SET @targetChar = substring(@target, @j, 1);
SET @equal = 2;
if(@sourceChar = @targetChar)
begin
SET @equal = 0;
end
select @minInsert = value + 1
from @array a where a.i = @i - 1 and a.j = @j
select @minDelete = value + 1
from @array a
where a.i = @i and a.j = @j - 1
select @minInterchange = [value] + @equal
from @array a
where a.i = @i - 1 and a.j = @j - 1
select @insertValue =
dbo.ScalarMinValue(dbo.ScalarMinValue(@minInsert, @minDelete), @minInterchange)
insert @array ( i, j, [value])
values ( @i, @j, @insertValue)
SET @j = @j + 1;
END
SET @i = @i + 1;
SET @j = 1
END
select @result = [value]
from @array a
where a.i = @sourceLength and a.j = @targetLength
RETURN @result;
END;
CREATE FUNCTION dbo.ScalarMinValue
(@Value1 int, @Value2 int)
RETURNS int
BEGIN
declare @return int
Select @return = Case When @Value1 < @Value2 Then @Value1 Else @Value2 End
return @return
END```

The calculation is very simmular to the calculation in c# but because the sql server does not support arrays i am using a temptable to store the calculated values. With this function that calculates the difference between two words we can write the stored procedure that takes the source string and select the target strings out of the Words table with the smallest levenshtein distance.

```CREATE procedure LevenshteinCalculator
@source nvarchar(256)
as
select dbo.CalculateLevenshteinValue(@source,Word) Levenshtein, Word
from Words wout
where dbo.CalculateLevenshteinValue(@source,Word) = (
select MIN(m.Levenshtein)
from
(select dbo.CalculateLevenshteinValue(@source,Word) Levenshtein, Word
from Words w) m)
go```

So if we execute the CalculateLevenshteinValue procedure with ‘lay’ we get the following result.

`exec LevenshteinCalculator @source='lay'`

So lacy and lady are very close to lay and so we could assume that the user has made typing error and suggest him the two words .
(part1) (part2)

### 3 Comments on “Calculating string difference values with the Levenshtein fuzzy string algorithm (part 2: implementation in t-sql).”

1. Hi! thanks ver much for the sql example. I’m working in an normalize street address application and using Levenshtein work better because I can normalize almost any street address. I’ve just one problem, it’s a very slowly query. My “words” table has 100.000 records, so I search my source word, looking for minimum Levenshtein value and this is realy slowly. Could yo tell me if you’ve any idea for get some solution about that?

• netmatze says:

Hello Patricio,

I know the performace problem you have, i would try to split up the words table (for example in tables AWords, BWords, CWords … ZWords)
and change the stored procedure so that it takes the first letter and only tries to find the words in the corresponding table.
I know that that is no very good solution but i have no other i am sorry.

Greetings
netmatze

2. capgem834588 says:

Reblogged this on Capgem testing.

This site uses Akismet to reduce spam. Learn how your comment data is processed.