THE CURIOUS DEVELOPER

Categories: database

NoSQL with Couchbase: Creating Indexes that Scale

Good indexes are vital to get good performance on your N1QL queries. As in SQL, but slightly different, you need to index what you are going to search for. In this article, we are going to go through how to create indexes that will lightning quick and won’t leave you hanging as you scale up.

Make sure to check out the other articles in the series as well:

NoSQL with Couchbase: Setting everything up with Docker
NoSQL with Couchbase: Getting started
NoSQL with Couchbase: Working with JSON
NoSQL with Couchbase: Querying with N1QL
NoSQL with Couchbase: Creating Indexes that Scale

Indexes have changed slightly from Couchbase version 4.x to 5.x, we are going to focus on version 5.x and onwards. In the latest article, we created a primary index to quickly get started. The primary index is what you might be familiar with in SQL when querying non-primary keys. It lets you query any fields that you want, but it is not quick, think of it as a full-table scan. Let’s fix that.

Covering Indexes

An error that many users do while creating index is that they create separate indexes for each field that they want to, this is going to trigger an IntersectScan which is not what you are looking for. How do you know if it’s an IntersectScan? In Couchbase 5.0 you press the explain button while issuing a query.

As we can see in the picture, it uses two different indexes and intersects them in order to obtain the result. Instead, if we query for exactly the fields that are in the index we perform something called an IndexScan, or IndexScan2 as it’s named in Couchbase for some reason.

This is what we are looking for in order to have a good and performant index that will scale. When an index contains all the fields that we query for, no extra steps are required to fetch the data. When we achieve that criteria, we have what is called a covering index for that specific query, which is what you should aim for if it’s important that the query is as quick as possible.

The query below has a covering index that supports it.


CREATE INDEX `person_company` 
ON `thecuriousdev-demo`(`name`, `company`) 
WHERE type = "tcd:person";

SELECT name 
FROM `thecuriousdev-demo` 
WHERE company = "starbucks" 
AND type = "tcd:person";

Of course, sometimes it’s not very effective to run queries with perfect covering indexes, what if we want to return the whole document? There is no way that we are going to index every field in the document if we are looking for a good performance. It’s definitely fine to run a SELECT * and removing the name from the index. The only gain in having a perfect covering index is that the indexer doesn’t need to go and ask the data service for the full document, the loss isn’t big at all. But if you can have a covering index without indexing too many extra fields just for the SELECT statement, then definitely, go for it!

Adaptive Indexes

Sometimes performance isn’t the most important factor though, having a covering index on every query in your system can be a pain to maintain, especially with Memory-optimized indexes and a relatively young system where new fields are added that should be possible to query. For the business critical queries speed is crucial, but for other stuff such as inventory queries and such, perhaps it is not necessary, and instead, you could save yourself some trouble.

Adaptive indexes are new in Couchbase 5.0 and quite good for this exact use-case. The primary index gave us the opportunity to run ad-hoc queries, but as primary index keeps every type of document in the index it is going to be quite costly and slow. With an adaptive index, we can instead choose to index N number of fields in a document so that we can query a lot of fields in it.

Let’s say that we want to be able to make it possible to query all fields in our Person documents. A person has a name, an age, and a favourite food. An adaptive index for it could look like the following.


CREATE INDEX `person_adaptive` 
ON `thecuriousdev-demo`(DISTINCT PAIRS(self)) 
WHERE type = "tcd:person"

This will cause all documents of type tcd:person to have all their fields in the index. Which means we can run any ad-hoc query on the document, a couple of examples follows.


// List persons above 18 years old
SELECT * 
FROM `thecuriousdev-demo`
WHERE age > 18
AND type = "tcd:person"

// Check how many people that like tacos in each company
SELECT company, count(*)
FROM `thecuriousdev-demo`
WHERE favouriteFood = "tacos"
AND type = "tcd:person"
GROUP BY company

// Check how many people that are under 18 yeards old with the name Viktor
SELECT name, count(*)
FROM `thecuriousdev-demo`
WHERE age

All of these queries are using the same adaptive index. Please do keep in mind though, that adaptive can consume a lot of memory since all fields are being indexed in the documents in the index, but they are definitely cheaper and faster than having a primary index. So if you are looking to run ad-hoc queries, they should be your go-to index type.

However, even though they seem useful, for most use-cases I would not recommend using them. At least not if you are going to have more than a couple million documents in these indexes. Striving for covering index is going to be a lot quicker and save yourself a significant amount of memory. If you would like to support these ad-hoc queries anyway even after my warning. I highly recommend that you create the index with proper WHERE clauses that will filter out as many documents as possible.

Final words

We have gone through how to create good indexes that will make it possible for your application to scale. Queries on these indexes are actually impressively fast on the latest Couchbase 5.0 version. If you tried a previous Couchbase version and thought it was quick already then, you will probably be shocked how quick it is now. And this is one of the main reasons why I am such a big fan of Couchbase and document-based databases. You get the incredible speed of key-value fetches, plus, Couchbase lets you run the same powerful queries as you are used to in the SQL world.

snieking

Next Code Style and How to Enforce it »

Previous « NoSQL with Couchbase: Querying with N1QL

Published by

snieking

Tags: couchbasedocumentindexesn1qlnosqlscale

3 years ago

Couchbase Ansible Role
Deploying Couchbase has never been easier than before by using Ansible. I've written an Ansible…
NoSQL with Couchbase: Querying with N1QL
Even though the main purpose of document-based databases is key-value operations, Couchbase still supports more…
NoSQL with Couchbase: Working with JSON
Welcome to the third episode of the Couchbase series. In Couchbase everything (except for counters)…

Handle Stream Exceptions with an Attempt

Streams has become a very popular way to process a collection of elements. But a…

2 years ago

deployment
development

Deploying Spring Boot in Pivotal Cloud Foundry

A lot of focus on my previous blogs has been on how to build micro…

2 years ago

python

Working with High-Quality Reference Genomes

Learn how to work with high-quality reference genomes in this article by Tiago Antao, a…

2 years ago

java

Garbage Collection in JDK 12 and onward

Garbage collection is one of the key concepts of Java programming and up to now…

2 years ago

python

Understanding Convolution

Learn about convolution in this article by Sandipan Dey, a data scientist with a wide…

2 years ago

java

Lombok Builder with Jackson

Lombok comes with a very convenient way of creating immutable objects with the builder pattern.…

2 years ago

NoSQL with Couchbase: Creating Indexes that Scale

Covering Indexes

Adaptive Indexes

Final words

Related Post

Recent Posts

Handle Stream Exceptions with an Attempt

Deploying Spring Boot in Pivotal Cloud Foundry

Working with High-Quality Reference Genomes

Garbage Collection in JDK 12 and onward

Understanding Convolution

Lombok Builder with Jackson