Pipe Search Versus Where – Splunk Commands Difference Solved

What are the differences between pipe Search versus Where commands

There is plenty of information available online and Splunk documentation about pipe search versus where Splunk commands:

| where
| search

Decided to concentrate it in one article, including performance test.

Both commands behavior is similar – filter the results of the main search. We will explain both and see what is the difference.

The bottom line: “search” command is much faster on execution and less server “resource hungry” than “where”. With both commands you can do the same, but only “where” command supports field to field comparison. Meaning, for example, you can filter the results where “fieldA” values are bigger than “fieldB” values:

| where fieldA > fieldB

In addition, did a performance comparison of using only the main search command against using main search with pipe search command. Which resulted in no performance difference. Example:

index=main
Account="service-*"

against

index=main
| search Account="service-*"

If you are interested in more Splunk Enterprise search performance optimization, check the Splunk Documentation:
* Splunk Documentation – Quick Optimization Tips
* Splunk Documentation – Write Better Search Queries

Splunk Index Example

Here is an example of an index (that we are going to query) with user login events, which has several groups of users. Two main groups are:

user-*
service-*

The first group is for human user accounts and the second is for service accounts. Each one of them has several sub-groups:

user-sql-*
user-system-*
user-citrix-*

service-sql-*
service-system-*
service-citrix-*

An example of a full user group of ten users:

user-sql-lucyj
user-sql-annk
user-sql-guyb
user-sql-alexp
user-sql-jakeq
user-sql-mikec
user-sql-cerriep
user-sql-nickm
user-sql-katez

Each group can follow this example. The above just showing how the index is structured to understand the string manipulation in the following queries. The actual test index is much more complex.

Let us see which command works faster and has less load on our Splunk test server.

You can check the response of your queries in Splunk “Job Manager”:

Top panel [Activity] => [Jobs]


Pipe Search Versus Where – The Test

The queries

Executed a query that search for all the service accounts (which contain “sql” in them) in the main index for certain 2 hours with Logon Type 3 events. First time I piped to “search” command and the second time to “where” command.

1. Pipe “search” (| search) query:

index=main earliest=01/26/2021:01:00:00 latest=01/26/2021:03:00:00
logtype=3
| eval AccountLow=lower(Account)
| search Account="service*" Account="*sql*"
| fields domain LowAccount Account logtype
| stats values count(logtype) by AccountLow

2. Pipe “where” (| where) query:

index=main earliest=01/26/2021:01:00:00 latest=01/26/2021:03:00:00
logtype=3
| eval AccountLow=lower(Account)
| where (like(AccountLow,"service%")) AND (like(AccountLow,"%sql%"))
| fields domain LowAccount Account logtype
| stats values count(logtype) by AccountLow

The Result Example

Example result of both queries:

AccountLow            | values(Account)       | values(domain) | values(logtype) | count(logtype)
--------------------------------------------------------------------------------------------
service-sql-append    | service-sql-append    | test_domain    | 3               | 512
                      | SERVICE-SQL-APPEND    |                |                 |
--------------------------------------------------------------------------------------------
service-sql-provision | service-sql-provision | test_domain    | 3               | 438

The actual test index was much more complex, this is just an example.

Pipe Search Versus Where – Difference Points

Query Execution Speed and Server Load

The most important differences are server load and command execution speed in the Job Manager.

“Search” command statistics:
Size: 35 MB
Runtime: 11 seconds
Events: 20425

“Where” command statistics:
Size: 191 MB
Runtime: 45 seconds
Events: 20425

We can see that “where” command was much excessive on the server resources: 191 MB (where) against 35 MB (search). In addition, “where” command took much more time to execute: 45 seconds (where) against 11 seconds (search). After that I did several more executions of the same queries with similar results.
The queries above were with period of 2 hours. Conducted another search with the same queries, but within a day range.

“Search” command statistics:
Size: 258 MB
Runtime: 15 seconds
Events: 187596

“Where” command statistics:
Size: 182 MB
Runtime: 8:11 minutes
Events: 187596

Again “search” command was much faster, but took more space on the server.

Syntax – Quotes

The quotes in “search” command, like “service*” or “*sql*”, are not really needed, since I do not use any minor segmenters. I use quotes anyway for self-preference in more organized queries for any string objects that I use (does not matter if it is Splunk or other languages).

While passing string objects in “where” command you need to use the quotes, like in “service%” or “%sql%”. If you do not use the quotes this will be processed as the field name and not as a string value.
For example, if you want to filter results only where the value of “fieldA” equals to the value of “fieldB”:

| where fieldA=fieldB

This is the main processing difference between the commands. Since, you can do the same with both “search” and “where” commands, but only with “where” you can do field to field comparison.

Syntax – Letter Case Sensitivity

Since I output all the results to “stats values” command by “Account” field, there can be service account names with lowercase letters and uppercase letters, like “service-*” and “SERVICE-*”. This can result in entries of “service-*” accounts with count of 3 and another entry of “SERVICE-*” accounts with count of 5. But we do not actually care if the account name is upper or lowercase, we just want the total count of this particular account. Meaning, we require a total number of 8 count of the same service account name as one entry. So, for this issue I use pipe “eval” command to declare new variable of “AccountLow” with the values of the “Account” field with only lowercase names. When I sort my stats entries by lowercase service account names, I will get the statistics for the lower and uppercase names at the same entry.

So, basically, “search” does not require “lower” method for the command to work. “Search” will process lower and uppercase letter strings and you will get both lowercase and uppercase results.

“Where” command is case sensitive in my Splunk setup, so there is another meaning for the “lower” method on the “Account” field. While lowering all the characters in the “Account” field I can use “where” command with only lowercase search string without taking uppercase letters into consideration. Meaning, I do not need to use both lowercase “service-%” and uppercase “SERVICE-%”, I can use only the lowercase “service-%”.

A side note about the “fields” command. With “fields” command we pass only the fields that we want the “stats values” command to process. I include also the regular “Accounts” field with mixed case results in case I would want to see if there are any.

Syntax – Boolean Expression Operators

While using the “search” command you do not need to specify “AND” operator, so when we use a query:

Account="service*" Account="*sql*"

the results will include all the account names that begin with “service” and have “sql” in them, does not matter if it is in the middle or in the end.

When using “AND” operator with “where” command – you need to specify it in the query, unlike in “search” command.

Syntax – Wildcards

The wildcard character for “where” command is “%” (percent), unlike in “search” command, which is “*” (star).

Technical Specifications

Splunk Search Command Documentation. The official description (referencing only the “pipe search” and not the “main search”):

When the search command is used further down the pipeline, it is a distributable streaming command (a command that runs on the indexer and can be applied to subsets of index data in a parallel manner. A streaming command applies a transformation to each event returned by a search). The search terms that you can use depend on which fields are passed into the search command.
If the _raw field is passed into the search command, you can use the same types of search terms as you can when the search command is the first command in a search.

Splunk Where Command Documentation. The official description:

The where command uses eval-expressions to filter search results. These eval-expressions must be Boolean expressions, where the expression returns either true or false. The where command returns only the results for which the eval expression returns true. The syntax of the eval expression is checked before running the search, and an exception is thrown for an invalid expression.

Both commands technically work different. While “search” work with strings returning the filtered entries that match from the main search (referencing only the “pipe search” and not the “main search”) – “where” work with Boolean comparison and return entries that are true within the specified condition.

Checking Performance – Search with and without Second Search Pipe

The query that we used above:

index=main earliest=01/26/2021:01:00:00 latest=01/26/2021:03:00:00
logtype=3
| eval AccountLow=lower(Account)
| search Account="service*" Account="*sql*"
| fields domain LowAccount Account logtype
| stats values count(logtype) by AccountLow

Can also be used without the second “pipe search”. This was only to illustrate the “pipe search” process and differences against “pipe where” command. The best practice is to filter the data in the main search.
The query without the second search:

index=main earliest=01/26/2021:01:00:00 latest=01/26/2021:03:00:00
logtype=3 Account="service*" Account="*sql*"
| eval AccountLow=lower(Account)
| fields domain LowAccount Account logtype
| stats values count(logtype) by AccountLow

I executed each query 10 times and checked each execution performance in Job Manager. There were slight differences in performance statistics – one time the first query was faster and another time the second was faster. The Size was respectively as expected – faster executions took less space. Correlating the results of several search executions show that there is almost no difference.

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.