In an earlier post, How to Build an Autocomplete Feature with Elasticsearch, we showed how to build a basic autocomplete that searches across all documents in the index. That works well for generic autocomplete, but it is not enough when, for example, your index holds many product categories and suggestions should stay within one of them. In this post we'll explore context-based autocompletion, which lets you filter and boost suggestions by category and geo point. Let's get started!
The major limitation of the basic completion suggester is that it considers every document in the index. Often, however, you want to serve suggestions restricted to certain categories or criteria. For example, you may want to suggest film titles filtered by director, or boost film titles based on their genre.
You can add context mappings to the completion field of your index to implement suggestion filtering and boosting. Elasticsearch allows defining multiple contexts with unique names and types. The context types currently supported are category and geo.
Examples in this tutorial were tested in the following environment:
In this tutorial, we will implement context-based autocompletion for a film index. Films in the index will be filtered by genre and director. The first thing we need to do is create an index mapping with the corresponding context suggester settings. Here is what our index mapping looks like:
curl -X PUT "localhost:9200/films" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "_doc": {
      "properties": {
        "suggest": {
          "type": "completion",
          "contexts": [
            { "name": "director", "type": "category", "path": "director" },
            { "name": "genre", "type": "category", "path": "genre" }
          ]
        },
        "title": { "type": "keyword" },
        "director": { "type": "keyword" },
        "genre": { "type": "keyword" }
      }
    }
  }
}'
As you can see, we've defined two contexts (genre and director) for our film index. Each context has the category type, which lets us associate one or more categories with suggestions at index time. We've also specified the path for each context. This setting names the document field from which the categories should be read.
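To make the role of path more concrete, here is a minimal Python sketch that approximates which categories end up attached to a suggestion. It is a local simulation, not Elasticsearch code: the helper names as_list and resolve_contexts and the sample document are hypothetical, and the merge rule (explicit categories combined with the values read from the path field) is an assumption based on the documented behavior of category contexts.

```python
import json

# Context definitions copied from the "films" mapping above.
CONTEXTS = [
    {"name": "director", "type": "category", "path": "director"},
    {"name": "genre", "type": "category", "path": "genre"},
]


def as_list(value):
    """Normalize a missing, scalar, or list value to a new list."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def resolve_contexts(doc, contexts=CONTEXTS, field="suggest"):
    """Approximate the categories attached to a document's suggestions.

    Explicit categories under doc[field]["contexts"] are combined with the
    values read from each context's `path` field, when that field is present.
    This merge rule is an assumption, not Elasticsearch's implementation.
    """
    suggest = doc.get(field, {})
    explicit = suggest.get("contexts", {}) if isinstance(suggest, dict) else {}
    resolved = {}
    for ctx in contexts:
        values = as_list(explicit.get(ctx["name"]))
        for value in as_list(doc.get(ctx.get("path"))):
            if value not in values:
                values.append(value)
        resolved[ctx["name"]] = values
    return resolved


# Hypothetical document that relies only on the `path` fields.
doc = {
    "suggest": {"input": ["Raising Arizona"]},
    "title": "Raising Arizona",
    "director": "Coen Brothers",
    "genre": ["comedy"],
}
print(json.dumps(resolve_contexts(doc)))
```

For the sample document this prints {"director": ["Coen Brothers"], "genre": ["comedy"]}, even though the suggest field itself carries no contexts.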
Next, let's put some documents in our film index. We've selected a few films in the comedy and horror genres to illustrate different contexts:
Document #1
curl -X PUT "localhost:9200/films/_doc/1" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "input": ["The Ladykillers", "Burn After Reading", "Hail, Caesar!"],
    "contexts": {
      "genre": ["comedy"],
      "director": ["Coen Brothers"]
    }
  }
}'
Document #2
curl -X PUT "localhost:9200/films/_doc/2" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "input": ["The Blues Brothers", "Spies Like Us"],
    "contexts": {
      "genre": ["comedy"],
      "director": ["John Landis"]
    }
  }
}'
Document #3
curl -X PUT "localhost:9200/films/_doc/3" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "input": ["Biloxi Blues", "The Birdcage"],
    "contexts": {
      "genre": ["comedy"],
      "director": ["Mike Nichols"]
    }
  }
}'
Document #4
curl -X PUT "localhost:9200/films/_doc/4" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "input": ["The Texas Chainsaw", "Spontaneous Combustion"],
    "contexts": {
      "genre": ["horror"],
      "director": ["Tobe Hooper"]
    }
  }
}'
Document #5
curl -X PUT "localhost:9200/films/_doc/5" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "input": ["City of the Living Dead", "The Beyond"],
    "contexts": {
      "genre": ["horror"],
      "director": ["Lucio Fulci"]
    }
  }
}'
Now, let's post a query that autocompletes the search input "Sp" based on the "comedy" context. This query should return all "comedy" films whose titles start with "Sp".
curl -X POST "localhost:9200/films/_search?pretty" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "film_suggestion": {
      "prefix": "Sp",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": {
          "genre": ["comedy"]
        }
      }
    }
  }
}'
Note: The _suggest endpoint was deprecated in Elasticsearch 5.0 in favor of using suggesters with the _search endpoint.
The response to the query above should contain a suggest section that looks something like this:
"suggest": {
  "film_suggestion": [
    {
      "text": "Sp",
      "offset": 0,
      "length": 2,
      "options": [
        {
          "text": "Spies Like Us",
          "_index": "films",
          "_type": "_doc",
          "_id": "2",
          "_score": 1,
          "_source": {
            "suggest": {
              "input": ["The Blues Brothers", "Spies Like Us"],
              "contexts": {
                "genre": ["comedy"],
                "director": ["John Landis"]
              }
            }
          },
          "contexts": {
            "genre": ["comedy"]
          }
        }
      ]
    }
  ]
}
As you can see, the response JSON has an options array that contains all matching suggestions. Along with the common fields, each returned suggestion has a _score field that determines how high the suggestion ranks in the options list. The query returned only Spies Like Us, directed by John Landis, although the search prefix "Sp" also matches another film in the index (Spontaneous Combustion). Because the query filters on the "comedy" category, only films in that category are returned. Similarly, if we specify two directors as contexts, the query will match the search prefix only against films by those directors:
curl -X POST "localhost:9200/films/_search?pretty" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "film_suggestion": {
      "prefix": "The",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": {
          "director": ["Coen Brothers", "Mike Nichols"]
        }
      }
    }
  }
}'
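The filtering behavior of the two queries above can be approximated locally with a few lines of Python. This is only a sketch of the semantics for a single category context, not how Elasticsearch implements it: the function name filter_suggestions is hypothetical, and prefix matching is simplified to a case-insensitive startswith check.

```python
# Suggestion inputs and categories copied from the five film documents above.
FILMS = [
    {"id": "1", "input": ["The Ladykillers", "Burn After Reading", "Hail, Caesar!"],
     "contexts": {"genre": ["comedy"], "director": ["Coen Brothers"]}},
    {"id": "2", "input": ["The Blues Brothers", "Spies Like Us"],
     "contexts": {"genre": ["comedy"], "director": ["John Landis"]}},
    {"id": "3", "input": ["Biloxi Blues", "The Birdcage"],
     "contexts": {"genre": ["comedy"], "director": ["Mike Nichols"]}},
    {"id": "4", "input": ["The Texas Chainsaw", "Spontaneous Combustion"],
     "contexts": {"genre": ["horror"], "director": ["Tobe Hooper"]}},
    {"id": "5", "input": ["City of the Living Dead", "The Beyond"],
     "contexts": {"genre": ["horror"], "director": ["Lucio Fulci"]}},
]


def filter_suggestions(prefix, context_name, categories, films=FILMS):
    """Return (id, text) pairs for inputs that start with `prefix` and belong
    to a document tagged with at least one of `categories`."""
    wanted = set(categories)
    needle = prefix.lower()
    hits = []
    for film in films:
        # Skip documents that share no category with the requested ones.
        if not wanted & set(film["contexts"].get(context_name, [])):
            continue
        for text in film["input"]:
            if text.lower().startswith(needle):
                hits.append((film["id"], text))
    return hits


print(filter_suggestions("Sp", "genre", ["comedy"]))
print(filter_suggestions("The", "director", ["Coen Brothers", "Mike Nichols"]))
```

The first call prints [('2', 'Spies Like Us')], matching the response shown earlier; the second prints [('1', 'The Ladykillers'), ('3', 'The Birdcage')]. Listing several categories for one context acts as an OR over those categories.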
Context suggesters also support boosting certain categories higher than others. In the following example, we filter film suggestions by both the "horror" and "comedy" categories but boost matching "comedy" films higher.
Note: The boost parameter is a factor by which the score of the suggestion should be boosted. The score is calculated by multiplying the boost with the suggestion weight.
curl -X POST "localhost:9200/films/_search?pretty" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "film_suggestion": {
      "prefix": "The",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": {
          "genre": [
            { "context": "horror" },
            { "context": "comedy", "boost": 2 }
          ]
        }
      }
    }
  }
}'
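As a rough illustration of the scoring rule from the note above (score = boost × weight), the following Python sketch ranks the film titles that start with "The". It is a simulation under stated assumptions rather than Elasticsearch's actual ranking code: the name boosted_suggestions is hypothetical, every suggestion is assumed to have a weight of 1 because the documents do not set one, "horror" is assumed to get a boost of 1 because the query does not set one, and ties are broken alphabetically purely for readability.

```python
# (suggestion text, genre, weight) for the indexed inputs that start with "The".
# A weight of 1 is assumed because the documents above do not set one.
SUGGESTIONS = [
    ("The Ladykillers", "comedy", 1),
    ("The Blues Brothers", "comedy", 1),
    ("The Birdcage", "comedy", 1),
    ("The Texas Chainsaw", "horror", 1),
    ("The Beyond", "horror", 1),
]


def boosted_suggestions(prefix, boosts, suggestions=SUGGESTIONS):
    """Keep suggestions whose genre appears in `boosts`, score each one as
    weight * boost, and sort by descending score (ties alphabetically)."""
    needle = prefix.lower()
    hits = []
    for text, genre, weight in suggestions:
        if genre in boosts and text.lower().startswith(needle):
            hits.append((text, weight * boosts[genre]))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# "horror" has no explicit boost in the query, so a boost of 1 is assumed.
for text, score in boosted_suggestions("The", {"horror": 1, "comedy": 2}):
    print(score, text)
```

Under these assumptions the three comedies score 2 and are listed before the two horror films, which score 1.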
Geo contexts allow associating geo points or geohashes with suggestions at index time. If a geo context is defined, suggestions within a certain distance from a specified geo location can be filtered and boosted. You can set that distance using the precision parameter. It defines the precision of the geohash to be indexed and can be specified as a distance (5m, 10km, etc.) or as a raw geohash precision (1..12). Precision values and their corresponding distances are listed in the table below (for example, precision 1 corresponds to roughly ±2500 km):
Precision    Distance (km)
1            ± 2500
2            ± 630
3            ± 78
4            ± 20
5            ± 2.4
6            ± 0.61
7            ± 0.076
8            ± 0.019
9            ± 0.0024
10           ± 0.00060
11           ± 0.000074
So, for example, a precision of 4 allows a deviation of roughly 20 kilometers from the specified location.
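To see what a raw geohash precision means in practice, here is a minimal Python sketch of the standard geohash encoding: longitude and latitude ranges are halved alternately, and every five bits become one base-32 character, so a precision of 4 keeps the first four characters. The function name geohash is our own, and the code is an illustration of the public algorithm, not Elasticsearch's implementation. The coordinates are the ones used for the place documents and the query later in this tutorial.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision):
    """Encode a point as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        bits <<= 1
        if value >= mid:
            bits |= 1
            rng[0] = mid  # keep the upper half of the range
        else:
            rng[1] = mid  # keep the lower half of the range
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash(42.4494803, -79.3863353, 4))  # place document 1
print(geohash(42.5594803, -79.4863353, 4))  # place document 2
print(geohash(45.8594803, -80.4863353, 4))  # place document 3
print(geohash(42.5494803, -79.5863353, 4))  # query location
```

The first two calls print dpx9 and dpx6, the same cells that appear in the contexts of the query response further down. The query location also falls in dpx6, while dpx9 is an adjacent cell; as far as the documented default goes, a geo context query also takes neighbouring cells at the indexed precision into account, which would explain why both documents are returned. The third document lands in a cell with a different prefix altogether.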
To illustrate how geo contexts work, let's first define a new mapping:
curl -X PUT "localhost:9200/place" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "_doc": {
      "properties": {
        "suggest": {
          "type": "completion",
          "contexts": [
            { "name": "place_type", "type": "category" },
            { "name": "location", "type": "geo", "precision": 4 }
          ]
        }
      }
    }
  }
}'
Next, let's index some documents.
Document #1
curl -X PUT "localhost:9200/place/_doc/1" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "input": "starbucks",
    "contexts": {
      "location": [
        { "lat": 42.4494803, "lon": -79.3863353 }
      ]
    }
  }
}'
Document #2
curl -X PUT "localhost:9200/place/_doc/2" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "input": "starbucks",
    "contexts": {
      "location": [
        { "lat": 42.5594803, "lon": -79.4863353 }
      ]
    }
  }
}'
Document #3
curl -X PUT "localhost:9200/place/_doc/3" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "input": "starbucks",
    "contexts": {
      "location": [
        { "lat": 45.8594803, "lon": -80.4863353 }
      ]
    }
  }
}'
Let's verify that context-based autocompletion is working:
curl -X POST "localhost:9200/place/_search?pretty" -H "Content-Type: application/json" -d '
{
  "suggest": {
    "place_suggestion": {
      "prefix": "starbucks",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": {
          "location": { "lat": 42.5494803, "lon": -79.5863353 }
        }
      }
    }
  }
}'
The response should contain two options, because the third location is too far from the one specified in the query (recall that we used precision 4, which is roughly 20 km):
"suggest": {
  "place_suggestion": [
    {
      "text": "starbucks",
      "offset": 0,
      "length": 9,
      "options": [
        {
          "text": "starbucks",
          "_index": "place",
          "_type": "_doc",
          "_id": "1",
          "_score": 1.0,
          "_source": {
            "suggest": {
              "input": "starbucks",
              "contexts": {
                "location": [
                  { "lat": 42.4494803, "lon": -79.3863353 }
                ]
              }
            }
          },
          "contexts": {
            "location": ["dpx9"]
          }
        },
        {
          "text": "starbucks",
          "_index": "place",
          "_type": "_doc",
          "_id": "2",
          "_score": 1.0,
          "_source": {
            "suggest": {
              "input": "starbucks",
              "contexts": {
                "location": [
                  { "lat": 42.5594803, "lon": -79.4863353 }
                ]
              }
            }
          },
          "contexts": {
            "location": ["dpx6"]
          }
        }
      ]
    }
  ]
}
All things considered, Elasticsearch context suggesters can significantly enhance the autocomplete functionality of your applications. Filtering suggestions by category enriches your users' search experience by returning more relevant results and potentially reducing search time. As a bonus, the geo context feature, provided out of the box, makes Elasticsearch a great fit for location-based and geo data apps in which results should be filtered by location.
Stay tuned for our next tutorials to learn more about other tools and solutions that can be implemented using the Elasticsearch Search API.