A simple in-memory search for collections (Vec, HashMap, BTreeMap, etc.)
and key-value stores. Features autocompletion and fuzzy matching.
There are many incredible search engines available for Rust. Many seem to
require compiling a separate server binary. I wanted something simple and
light-weight - an easy-to-use crate that could conveniently search structs and
collections within my own binary. So, I made indicium.
While indicium was made with web apps in mind, it is an in-memory search and
it does not scale indefinitely or to cloud size (i.e. Facebook or Google size).
Even in such an environment, it would still be a convenient way of searching
large lists (such as currencies, languages, countries, etc.) It's also great for
applications where there is an anticipated scale limit (i.e. searching a list of
company assets, list of users in a corporate intranet, etc.)
Indicium easily can handle millions of records without breaking a sweat thanks to Rust's BTreeMap. This crate is primarily limited by available memory. However, depending on the nature your data-set and if there are keywords that are repeated many times, performance may begin to degrade at a point.
Configure the dependencies in your project's Cargo.toml file:
[dependencies]
indicium = "0.6"-
Release notes are available on GitHub.
-
The full change log is available on GitHub.
For our Quick Start Guide example, we will be searching inside of the
following struct:
struct MyStruct {
title: String,
year: u16,
body: String,
}To begin, we must make our record indexable. We'll do this by implementing the
Indexable trait for our struct. The idea is to return a String for every
field that we would like to be indexed. Example:
use indicium::simple::Indexable;
impl Indexable for MyStruct {
fn strings(&self) -> Vec<String> {
vec![
self.title.clone(),
self.year.to_string(),
self.body.clone(),
]
}
}Don't forget that you may make numbers, numeric identifiers, enums, and other
types in your struct (or other complex types) indexable by converting them to
a String and including them in the returned Vec<String>.
To index an existing collection, we can iterate over the collection. For each record, we will insert it into the search index. This should look something like these two examples:
use indicium::simple::SearchIndex;
let my_vec: Vec<MyStruct> = Vec::new();
// In the case of a `Vec` collection, we use the index as our key. A
// `Vec` index is a `usize` type. Therefore we will instantiate
// `SearchIndex` as `SearchIndex<usize>`.
let mut search_index: SearchIndex<usize> = SearchIndex::default();
my_vec
.iter()
.enumerate()
.for_each(|(index, element)|
search_index.insert(&index, element)
);use std::collections::HashMap;
use indicium::simple::SearchIndex;
let my_hash_map: HashMap<String, MyStruct> = HashMap::new();
// In the case of a `HashMap` collection, we use the hash map's key as
// the `SearchIndex` key. In our hypothetical example, we will use
// MyStruct's `title` as a the key which is a `String` type. Therefore
// we will instantiate `HashMap<K, V>` as HashMap<String, MyStruct> and
// `SearchIndex<K>` as `SearchIndex<String>`.
let mut search_index: SearchIndex<String> = SearchIndex::default();
my_hash_map
.iter()
.for_each(|(key, value)|
search_index.insert(key, value)
);As long as the Indexable trait was implemented for your value type, the above
examples will index a previously populated Vec or HashMap. However, the
preferred method for large collections is to insert into the SearchIndex as
you insert into your collection (Vec, HashMap, etc.)
It's recommended to wrap your target collection (your Vec, HashMap, etc.)
and this SearchIndex together in a new struct type. Then, implement the
insert, replace, remove, etc. methods for this new struct type that will
update both the collection and search index. This will ensure that both your
collection and index are always synchronized.
Once the index has been populated, you can use the search and autocomplete
methods.
The search method will return keys as the search results. Each resulting
key can then be used to retrieve the full record from its collection.
Basic usage:
let mut search_index: SearchIndex<usize> = SearchIndex::default();
search_index.insert(&0, &"Harold Godwinson");
search_index.insert(&1, &"Edgar Ætheling");
search_index.insert(&2, &"William the Conqueror");
search_index.insert(&3, &"William Rufus");
search_index.insert(&4, &"Henry Beauclerc");
let resulting_keys: Vec<&usize> = search_index.search("William");
assert_eq!(resulting_keys, vec![&2, &3]);
// Demonstrating fuzzy matching:
let resulting_keys: Vec<&usize> = search_index.search("Harry");
assert_eq!(resulting_keys, vec![&0]);Search only supports exact keyword matches. For Live searches, fuzzy matching
is only applied to the last keyword. Consider providing the autocomplete
feature to your users to help them build their search as they type.
The autocomplete method will provide several autocompletion options for the
last keyword in the supplied string.
Basic usage:
let mut search_index: SearchIndex<usize> =
SearchIndexBuilder::default()
.autocomplete_type(&AutocompleteType::Global)
.build();
search_index.insert(&0, &"apple");
search_index.insert(&1, &"ball");
search_index.insert(&3, &"bird");
search_index.insert(&4, &"birthday");
search_index.insert(&5, &"red");
let autocomplete_options: Vec<String> =
search_index.autocomplete("a very big bi");
assert_eq!(
autocomplete_options,
vec!["a very big bird", "a very big birthday"]
);
// Demonstrating fuzzy matching:
let autocomplete_options: Vec<String> =
search_index.autocomplete("a very big birf");
assert_eq!(
autocomplete_options,
vec!["a very big bird", "a very big birthday"]
);With the exception of SearchType::Or which uses relevance scoring, search
results are returned in the order defined by your key type's Ord
implementation. This means you can control result ordering by using a custom key
type instead of simple integers or strings.
For example, if you want popular items to appear first in search results, you can create a key that sorts by popularity:
use std::cmp::Ordering;
use std::hash::{Hash, Hasher};
/// A search key that sorts results by popularity (highest first).
#[derive(Debug, Clone)]
pub struct RankedKey {
pub id: String,
pub popularity: u32,
}
impl Ord for RankedKey {
fn cmp(&self, other: &Self) -> Ordering {
// Higher popularity comes first
other.popularity.cmp(&self.popularity)
}
}
impl PartialOrd for RankedKey {
fn partial_cmp(&self, other: &Self) -> Option {
Some(self.cmp(other))
}
}
// For equality and lookups, we only compare the id, not popularity. This lets
// us find records in a HashMap using just the id.
impl PartialEq for RankedKey {
fn eq(&self, other: &Self) -> bool {
self.id == other.id
}
}
impl Eq for RankedKey {}
impl Hash for RankedKey {
fn hash(&self, state: &mut H) {
self.id.hash(state);
}
}Now when you search, results automatically come back sorted by popularity. No extra sorting step needed.
This pattern works for any ranking signal: recency, relevance scores, manual
sort order, etc. Just change what Ord compares.
After searching, you'll typically fetch the full record from your collection:
let results = search_index.search("dawn");
for key in results {
if let Some(record) = my_records.get(key) {
println!("{:?}", record);
}
}Because the Eq and Hash traits that we implemented in the example only look
at the id field (not popularity), you can still use search result keys
directly for HashMap lookups.
This crate is passively maintained. This crate does what it's expected to do and does it pretty well, in my opinion. Frequent updates are not expected.
