![Logo](/icon.png)
gittech. site
for different kinds of informations and explorations.
An early-exit Levenshtein implementation
Early-Exit Levenshtein
![PkgGoDev](https://pkg.go.dev/badge/github.com/eaxis/levenshtein)
A Go package to calculate the Levenshtein Distance with optional early-exit optimization for comparing large texts for similarity.
This library is based on agnivade's implementation, and works the same way if no threshold is provided.
When a threshold is set, the library stops calculating the distance as soon as the distance exceeds the threshold and returns threshold + 1
instead of calculating the remaining distance, saving significant CPU time when comparing long strings.
The library also nests the following features/limitations:
- The library is fully capable of working with non-ascii strings. But the strings are not normalized.
- As a performance optimization, the library can handle strings only up to 65536 characters (runes).
Motivation
I created this library because I needed to compare thousands of posts for duplicates in my side project. The process was disappointingly slow and only got worse as more posts were added. By using an early-exit approach, Iโve achieved over 100x speedup in my certain case. You can find detailed benchmarks below.
Install
go get github.com/eaxis/levenshtein
A simple example
package main
import (
"fmt"
"github.com/eaxis/levenshtein"
)
func main() {
s1 := "kitten"
s2 := "sitting"
distance := levenshtein.ComputeDistance(s1, s2)
fmt.Printf("The distance is %d.\n", distance) // The distance is 3.
}
An example with early-exit optimization
package main
import (
"fmt"
"github.com/eaxis/levenshtein"
)
func main() {
similarityThreshold := 10
// The Levenstein distance between these strings is 47.
// Since the similarityThreshold is 10, the function will stop calculating the distance at 10 and return 11.
// Which means the distance is greater than the similarityThreshold.
s1 := "these strings are completely different and have nothing in common"
s2 := "calculating the full distance is just a waste of time"
distance := levenshtein.ComputeDistance(s1, s2, similarityThreshold)
fmt.Printf("The distance is at least %d.\n", distance) // The distance is at least 11.
if distance <= similarityThreshold {
fmt.Println("The strings are similar.")
} else {
fmt.Println("The strings are not similar.") // this will be printed
}
}
Benchmarks
Comparisons with other libraries (short strings with threshold = 2)
BenchmarkCompetitorsWithThreshold/ASCII_short/eaxis-12 ~61 ns/op
BenchmarkCompetitorsWithThreshold/ASCII_short/agniva-12 ~90 ns/op
BenchmarkCompetitorsWithThreshold/ASCII_short/arbovm-12 ~221 ns/op
BenchmarkCompetitorsWithThreshold/ASCII_short/dgryski-12 ~220 ns/op
Comparisons with other libraries (long strings with threshold = 10)
BenchmarkCompetitorsWithThreshold/ASCII_long/eaxis-12 ~1995 ns/op
BenchmarkCompetitorsWithThreshold/ASCII_long/agniva-12 ~634766 ns/op
BenchmarkCompetitorsWithThreshold/ASCII_long/arbovm-12 ~917835 ns/op
BenchmarkCompetitorsWithThreshold/ASCII_long/dgryski-12 ~921690 ns/op
Comparisons with other libraries (short strings)
BenchmarkCompetitors/ASCII_short/eaxis-12 ~111 ns/op
BenchmarkCompetitors/ASCII_short/agniva-12 ~91 ns/op
BenchmarkCompetitors/ASCII_short/arbovm-12 ~219 ns/op
BenchmarkCompetitors/ASCII_short/dgryski-12 ~223 ns/op
Comparisons with other libraries (long strings)
BenchmarkCompetitors/ASCII_long/eaxis-12 ~726370 ns/op
BenchmarkCompetitors/ASCII_long/agniva-12 ~633286 ns/op
BenchmarkCompetitors/ASCII_long/arbovm-12 ~900986 ns/op
BenchmarkCompetitors/ASCII_long/dgryski-12 ~912527 ns/op
Earn $100 Fast: AI + Notion Templates
Do you want to make extra money quickly? This guide shows you how to create and sell Notion templates step by step. Perfect for beginners or anyone looking for an easy way to start earning online.
Why Download This Guide?
- Start Making Money Fast: Follow a simple process to create templates people want and will buy.
- Save Time with AI: Learn to use tools like ChatGPT to design and improve templates.
- Join a Growing Market: More people are using Notion every day, and they need templates to save time and stay organized.
Includes Helpful Tools:
- ChatGPT Prompts PDF: Ready-made prompts to spark ideas and create templates faster.
- Checklist PDF: Stay on track as you work.
Whatโs Inside?
- Clear Steps to Follow: Learn everything from idea to sale.
- How to Find Popular Ideas: Research trends and needs.
- Using AI to Create: Tips for improving templates with AI tools.
- Making Templates User-Friendly: Simple tips for better design.
- Selling Your Templates: Advice on sharing and selling on platforms like Gumroad or Etsy.
- Fixing Common Problems: Solutions for issues like low sales or tricky designs.
Who Is This For?
- Anyone who wants to make extra money online.
- People who love using Notion and want to share their ideas.
- Creators looking for a simple way to start selling digital products.
Get your copy now and start making money today!
๐ฐ Want to Earn 40% Commission?
Join our affiliate program and start making money by promoting well crafter prodicts! Earn 40% on every sale you refer.
๐ Sign up as an affiliate here: Become an Affiliate