Good indexes are vital to get good performance on your N1QL queries. As in SQL, but slightly different, you need to index what you are going to search for. In this article, we are going to go through how to create indexes that will lightning quick and won’t leave you hanging as you scale up.
Make sure to check out the other articles in the series as well:
- NoSQL with Couchbase: Setting everything up with Docker
- NoSQL with Couchbase: Getting started
- NoSQL with Couchbase: Working with JSON
- NoSQL with Couchbase: Querying with N1QL
- NoSQL with Couchbase: Creating Indexes that Scale
Indexes have changed slightly from Couchbase version 4.x to 5.x, we are going to focus on version 5.x and onwards. In the latest article, we created a primary index to quickly get started. The primary index is what you might be familiar with in SQL when querying non-primary keys. It lets you query any fields that you want, but it is not quick, think of it as a full-table scan. Let’s fix that.
An error that many users do while creating index is that they create separate indexes for each field that they want to, this is going to trigger an IntersectScan which is not what you are looking for. How do you know if it’s an IntersectScan? In Couchbase 5.0 you press the explain button while issuing a query.
As we can see in the picture, it uses two different indexes and intersects them in order to obtain the result. Instead, if we query for exactly the fields that are in the index we perform something called an IndexScan, or IndexScan2 as it’s named in Couchbase for some reason.
This is what we are looking for in order to have a good and performant index that will scale. When an index contains all the fields that we query for, no extra steps are required to fetch the data. When we achieve that criteria, we have what is called a covering index for that specific query, which is what you should aim for if it’s important that the query is as quick as possible.
The query below has a covering index that supports it.
CREATE INDEX `person_company`
ON `thecuriousdev-demo`(`name`, `company`)
WHERE type = "tcd:person";
WHERE company = "starbucks"
AND type = "tcd:person";
Of course, sometimes it’s not very effective to run queries with perfect covering indexes, what if we want to return the whole document? There is no way that we are going to index every field in the document if we are looking for a good performance. It’s definitely fine to run a SELECT * and removing the name from the index. The only gain in having a perfect covering index is that the indexer doesn’t need to go and ask the data service for the full document, the loss isn’t big at all. But if you can have a covering index without indexing too many extra fields just for the SELECT statement, then definitely, go for it!
Sometimes performance isn’t the most important factor though, having a covering index on every query in your system can be a pain to maintain, especially with Memory-optimized indexes and a relatively young system where new fields are added that should be possible to query. For the business critical queries speed is crucial, but for other stuff such as inventory queries and such, perhaps it is not necessary, and instead, you could save yourself some trouble.
Adaptive indexes are new in Couchbase 5.0 and quite good for this exact use-case. The primary index gave us the opportunity to run ad-hoc queries, but as primary index keeps every type of document in the index it is going to be quite costly and slow. With an adaptive index, we can instead choose to index N number of fields in a document so that we can query a lot of fields in it.
Let’s say that we want to be able to make it possible to query all fields in our Person documents. A person has a name, an age, and a favourite food. An adaptive index for it could look like the following.
CREATE INDEX `person_adaptive`
ON `thecuriousdev-demo`(DISTINCT PAIRS(self))
WHERE type = "tcd:person"
This will cause all documents of type tcd:person to have all their fields in the index. Which means we can run any ad-hoc query on the document, a couple of examples follows.
// List persons above 18 years old
WHERE age > 18
AND type = "tcd:person"
// Check how many people that like tacos in each company
SELECT company, count(*)
WHERE favouriteFood = "tacos"
AND type = "tcd:person"
GROUP BY company
// Check how many people that are under 18 yeards old with the name Viktor
SELECT name, count(*)
WHERE age < 18
AND type = "tcd:person"
GROUP BY name
All of these queries are using the same adaptive index. Please do keep in mind though, that adaptive can consume a lot of memory since all fields are being indexed in the documents in the index, but they are definitely cheaper and faster than having a primary index. So if you are looking to run ad-hoc queries, they should be your go-to index type.
However, even though they seem useful, for most use-cases I would not recommend using them. At least not if you are going to have more than a couple million documents in these indexes. Striving for covering index is going to be a lot quicker and save yourself a significant amount of memory. If you would like to support these ad-hoc queries anyway even after my warning. I highly recommend that you create the index with proper WHERE clauses that will filter out as many documents as possible.
We have gone through how to create good indexes that will make it possible for your application to scale. Queries on these indexes are actually impressively fast on the latest Couchbase 5.0 version. If you tried a previous Couchbase version and thought it was quick already then, you will probably be shocked how quick it is now. And this is one of the main reasons why I am such a big fan of Couchbase and document-based databases. You get the incredible speed of key-value fetches, plus, Couchbase lets you run the same powerful queries as you are used to in the SQL world.