Why You Rarely Need ElasticSearch When You Have PostgreSQL
ElasticSearch is a popular open-source search engine that provides fast and scalable full-text search capabilities. It is often used as a complement to relational databases, such as PostgreSQL, to enable complex queries and analytics on large and dynamic datasets. However, ElasticSearch also comes with some drawbacks and challenges, such as data duplication, synchronization issues, security risks, and operational overhead. In this post, I will argue that you rarely need ElasticSearch when you have PostgreSQL, as PostgreSQL already offers powerful and flexible full-text search features that can meet most of your search needs.
What is Full-Text Search?
Full-text search (FTS) is the process of finding relevant documents or records that match a given query based on the content of the documents or records. Unlike simple keyword search, FTS can handle natural language queries, synonyms, stemming, ranking, relevance scoring, and other advanced features that improve the quality and accuracy of the search results.
How Does PostgreSQL Support Full-Text Search?
PostgreSQL is a robust and versatile relational database management system (RDBMS) that supports many advanced features, including FTS. PostgreSQL has a built-in FTS module that allows you to create, query, and manage full-text indexes on any text-based columns in your tables. PostgreSQL’s FTS module supports the following features:
- Text normalization: PostgreSQL can convert text into a standard form that is suitable for FTS, such as lowercasing, removing punctuation, splitting words, and applying stemming and lemmatization algorithms.
- Text analysis: PostgreSQL can analyze text using predefined or custom dictionaries that define how to tokenize, classify, and transform words. For example, you can use dictionaries to remove stop words (such as “the” or “and”), map synonyms (such as “car” or “automobile”), or handle morphological variations (such as “run” or “running”).
- Text representation: PostgreSQL can represent text as a tsvector data type, which is a sorted list of distinct lexemes (normalized words) with optional positional information. A tsvector can be stored as a column in a table or generated dynamically from a text column using a to_tsvector function.
- Text search: PostgreSQL can perform FTS queries using a tsquery data type, which is a Boolean expression of lexemes that can include operators (such as & for AND, | for OR, ! for NOT) and modifiers (such as :* for prefix matching or <-> for phrase matching). A tsquery can be constructed from a text query using a to_tsquery function or a plainto_tsquery function (which adds implicit AND operators between words). PostgreSQL can compare a tsquery with a tsvector using an @@ operator, which returns true if the tsvector matches the tsquery.
- Text ranking: PostgreSQL can rank the results of FTS queries using various ranking functions, such as ts_rank or ts_rank_cd, which assign a score to each matching document based on the frequency and position of the query terms in the document. You can also customize the ranking functions by providing weighting parameters that reflect the importance of different document parts (such as title, abstract, or body).
How Does PostgreSQL Compare with ElasticSearch?
ElasticSearch is a distributed and schema-less search engine that is based on Apache Lucene, a Java library that provides FTS capabilities. ElasticSearch can index and search any kind of data, such as text, numbers, dates, geospatial data, structured data, or nested data. ElasticSearch supports the following features:
- Distributed architecture: ElasticSearch can scale horizontally by distributing data across multiple nodes in a cluster. Each node can store one or more shards (partitions) of an index, which can be replicated for fault tolerance and load balancing. ElasticSearch also provides mechanisms for cluster management, node discovery, shard allocation, and cluster health monitoring.
- Schema-less design: ElasticSearch does not require you to define a schema for your data before indexing it. Instead, it infers the data type and structure from the JSON documents that you send to it. You can also change the mapping (the internal representation of the data) of an index at any time without reindexing the data.
- Powerful query language: ElasticSearch supports a rich and expressive query language called Query DSL (Domain Specific Language), which allows you to construct complex queries using JSON syntax. You can use Query DSL to perform various types of queries, such as match queries (for full-text search), term queries (for exact value matching), range queries (for numeric or date ranges), geo queries (for geospatial data), nested queries (for nested data), aggregations (for analytics), and more.
- Flexible analysis: ElasticSearch allows you to customize how your data is analyzed before indexing and searching it. You can define analyzers (which consist of character filters, tokenizers, and token filters) that control how text is normalized, tokenized, and transformed. You can also use built-in or custom plugins that provide additional analysis features, such as language detection, synonym expansion, stemming, lemmatization, etc.
At first glance, ElasticSearch seems to offer more features and flexibility than PostgreSQL for FTS. However, PostgreSQL also has some advantages and benefits over ElasticSearch, such as:
- Data consistency: PostgreSQL guarantees ACID (Atomicity, Consistency, Isolation, Durability) properties for transactions, which means that you can perform multiple operations on your data as a single unit and ensure that your data is always in a valid and consistent state. ElasticSearch, on the other hand, does not support transactions and only provides eventual consistency, which means that your data may not be immediately visible or updated across all nodes in the cluster.
- Data integrity: PostgreSQL enforces data integrity constraints, such as primary keys, foreign keys, unique constraints, check constraints, etc., which prevent you from inserting or updating invalid or duplicate data. ElasticSearch does not support any data integrity constraints and relies on the application logic to validate and deduplicate the data.
- Data security: PostgreSQL provides various security features, such as authentication, authorization, encryption, auditing, etc., which allow you to control who can access and manipulate your data. ElasticSearch does not have any built-in security features and requires external plugins or services (such as X-Pack or Search Guard) to enable security functionalities.
- Data versatility: PostgreSQL supports many data types and functions that are not available in ElasticSearch, such as arrays, JSONB (binary JSON), XML, hstore (key-value pairs), geometric types, window functions, common table expressions, etc. These data types and functions can enhance the expressiveness and performance of your queries and applications.
- Operational simplicity: PostgreSQL is a mature and stable RDBMS that has been in development for over 30 years. It has a large and active community that provides documentation, support, and tools for PostgreSQL users and developers. PostgreSQL is also easy to install, configure, and maintain. ElasticSearch is a relatively new and complex system that requires more expertise and resources to operate. It has a steep learning curve and a fast release cycle that may introduce breaking changes or bugs.
When Do You Need ElasticSearch?
Given the advantages and benefits of PostgreSQL for FTS, you may wonder when do you actually need ElasticSearch? The answer is: it depends on your use case and requirements. There is no definitive rule or guideline that can tell you when to use one or the other. However, here are some general scenarios where ElasticSearch may be a better choice than PostgreSQL:
- You have very large and dynamic datasets: If you have to deal with terabytes or petabytes of data that are constantly changing or growing, you may benefit from the scalability and performance of ElasticSearch. ElasticSearch can handle high volumes of data ingestion and query throughput by distributing the load across multiple nodes in a cluster. PostgreSQL can also scale horizontally by using replication or sharding techniques, but it may require more effort and complexity to set up and manage.
- You have very complex and diverse queries: If you have to perform sophisticated queries that involve multiple types of data, such as text, numbers, dates, geospatial data, nested data, etc., you may benefit from the flexibility and power of ElasticSearch’s Query DSL. ElasticSearch can handle various kinds of queries with ease and efficiency by using its inverted index structure and various analysis features. PostgreSQL can also perform complex queries by using its FTS module and other data types and functions, but it may require more SQL skills and optimization techniques to achieve the same results.
- You have very specific analysis needs: If you have to customize how your data is analyzed before indexing and searching it according to your domain or language needs, you may benefit from the extensibility and configurability of ElasticSearch’s analysis framework. ElasticSearch allows you to define your own analyzers or use plugins that provide additional analysis features for your data. PostgreSQL also allows you to customize its FTS module by using dictionaries or extensions that provide different text analysis options for your data.
Here are some query examples for both PostgreSQL and ElasticSearch for FTS, using the same dataset of blog posts with title, body, and tags fields:
- To find blog posts that contain the words “PostgreSQL” and “ElasticSearch” in the title or body, you can use the following queries:
1
2
3
4
| -- PostgreSQL
SELECT title, body, tags FROM blog_posts
WHERE to_tsvector('english', title || ' ' || body)
@@ to_tsquery('english', 'PostgreSQL & ElasticSearch');
|
1
2
3
4
5
6
7
8
9
| // ElasticSearch
GET /blog_posts/_search
{
"query": {
"match": {
"_all": "PostgreSQL ElasticSearch"
}
}
}
|
- To find blog posts that have the tag “database” or “search engine”, you can use the following queries:
1
2
3
| -- PostgreSQL
SELECT title, body, tags FROM blog_posts
WHERE tags @> ARRAY['database'] OR tags @> ARRAY['search engine'];
|
1
2
3
4
5
6
7
8
9
| // ElasticSearch
GET /blog_posts/_search
{
"query": {
"terms": {
"tags": ["database", "search engine"]
}
}
}
|
- To find blog posts that have the phrase “full-text search” in the title or body, and rank them by relevance, you can use the following queries:
1
2
3
4
5
| -- PostgreSQL
SELECT title, body, tags, ts_rank(to_tsvector('english', title || ' ' || body),
to_tsquery('english', 'full-text <-> search')) AS rank FROM blog_posts
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'full-text <-> search')
ORDER BY rank DESC;
|
1
2
3
4
5
6
7
8
9
10
11
12
| // ElasticSearch
GET /blog_posts/_search
{
"query": {
"match_phrase": {
"_all": "full-text search"
}
},
"sort": [
"_score"
]
}
|
These are just some simple examples of how to use PostgreSQL and ElasticSearch for FTS. For more details and features, please refer to their official documentation .
Conclusion
- In this post, I have explained why you rarely need ElasticSearch when you have PostgreSQL for FTS. I have compared the features and capabilities of both systems for FTS and highlighted the advantages and benefits of PostgreSQL over ElasticSearch. I have also discussed some scenarios where ElasticSearch may be a better choice than PostgreSQL for FTS.
- I hope this post has helped you understand the trade-offs and considerations involved in choosing between PostgreSQL and ElasticSearch for FTS. Ultimately, the best solution depends on your specific use case and requirements. Therefore, I encourage you to experiment with both systems and see which one works better for you.
Full Text Search - PostgreSQL Documentation
- Elasticsearch: The Definitive Guide