When Input Event Count Exceeds Maximum Inputs
The “Input Event Count Exceeds Maximum Inputs” message in Splunk comes from the Splunk Machine Learning Toolkit (MLTK) add-on. While using the “fit” SPL command to train your ML model, you might encounter this message:
Input event count exceeds max_inputs for KMeans (100000), model will be fit on a sample of events. To configure limits, use mlspl.conf or the "Settings" tab in the app navigation bar.
Example of a “fit” command usage:
| fit KMeans field1 field2 field3 k=3 random_state=0 into my_kmeans_model
Note: The error example above contains the “KMeans” machine learning algorithm, but you can apply this guide to any MLTK algorithm on which you encounter this error.
The “input event count exceeds maximum inputs” message appears when the number of input events (the data points you’re trying to fit into a KMeans model) exceeds the max_inputs limit configured in the Splunk Machine Learning Toolkit (MLTK). As the message itself states, the model is then fit on a sample of the events rather than on all of them.
The max_inputs parameter is a limit set to prevent excessive computational load. By default, Splunk’s MLTK restricts the number of inputs a specific algorithm can process simultaneously. This limit is 100,000 in your case, and the data you’re trying to process exceeds this number.
How to Solve the Input Event Count Exceeds Maximum Inputs Error
The easiest way to solve the “input event count exceeds maximum inputs” error is to narrow your search query so that it returns fewer data points. Alternatively, use the “sample” SPL command, which is also part of Splunk MLTK, to sample your data before feeding it into the KMeans algorithm. With this method, the model trains on only a subset of your data.
index=my_index | sample ratio=0.1 seed=1 | fit KMeans field1 field2 field3 k=3 random_state=0 into my_kmeans_model
In the above command, ratio=0.1 will randomly select 10% of your events. seed=1 is just a random seed to ensure reproducibility. Adjust the ratio based on your needs and computational resources. Remember to test your solution thoroughly before moving to a production environment.
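To pick a ratio rather than guess one, you can derive it from the number of events your search returns and the configured limit. The snippet below is a minimal sketch in plain Python, not part of Splunk or MLTK; the function name sample_ratio and the example event count of 1,250,000 are assumptions for illustration. Because ratio-based sampling is random, the result keeps the expected sample size at or below the limit, and the actual count can vary slightly from run to run.

```python
import math


def sample_ratio(event_count: int, max_inputs: int = 100_000) -> float:
    """Return a sampling ratio whose expected sample size stays within max_inputs.

    The ratio is rounded down to four decimal places so that rounding
    never pushes the expected sample size above the limit.
    """
    if event_count <= 0:
        raise ValueError("event_count must be positive")
    if max_inputs <= 0:
        raise ValueError("max_inputs must be positive")
    if event_count <= max_inputs:
        # Everything already fits, so no sampling is needed.
        return 1.0
    return math.floor(max_inputs / event_count * 10_000) / 10_000


# Hypothetical example: the search returns 1,250,000 events.
ratio = sample_ratio(1_250_000)
print(f"| sample ratio={ratio} seed=1")
```

You can get the event count for the input by running your base search with “| stats count” appended, then paste the printed “sample” clause in front of your “fit” command.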
Also, note that although the “sample” command can help you solve the “input event count exceeds maximum inputs” error, it works best when the data points in your index are evenly distributed. If they are not, “sample” might create a skewed dataset that doesn’t represent the original data set well. If you believe this to be the case, you might want to consider a different approach to downsampling your data.
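One such approach is stratified sampling: sample each group separately so that rare groups are not lost. The snippet below is a minimal sketch of the idea in plain Python, outside of Splunk; the function name stratified_sample, the “source” field, and the event counts are hypothetical and only illustrate the technique.

```python
import random
from collections import defaultdict


def stratified_sample(events, key, ratio, seed=1):
    """Sample roughly `ratio` of the events from every group defined by `key`.

    Each group keeps at least one event, so small groups stay represented.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for reproducibility

    # Bucket the events by the value of the grouping field.
    groups = defaultdict(list)
    for event in events:
        groups[event[key]].append(event)

    sample = []
    for members in groups.values():
        keep = max(1, round(len(members) * ratio))
        sample.extend(rng.sample(members, keep))
    return sample


# Hypothetical, heavily skewed data: 990 "web" events and 10 "db" events.
events = [{"source": "web", "value": i} for i in range(990)]
events += [{"source": "db", "value": i} for i in range(10)]

subset = stratified_sample(events, key="source", ratio=0.1)
```

With a plain 10% random sample, the ten “db” events could easily end up with zero or one representative by chance; sampling per group guarantees that each group contributes its share.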
Increase Algorithm’s Input Limit
Another way to solve the “input event count exceeds maximum inputs” error is to increase the configured input limit. However, be cautious with this approach, as it may cause a high computational load and affect your Splunk instance’s performance. Make sure your system has the necessary resources to handle a larger dataset.
Increasing through Splunk Web Interface
To solve the “input event count exceeds maximum inputs” error through the Splunk web interface:
Click on the Splunk logo in the top left corner.
On the left menu, click on “Splunk Machine Learning Toolkit.”
A second menu will appear under the Splunk logo on the left; click “Settings.”
Find the “KMeans” algorithm in the list and click on it.
Find the “max_inputs” option and raise its value to allow more inputs.
Execute your SPL query again.
Increasing through mlspl.conf File
To solve the “input event count exceeds maximum inputs” error, you can also increase this limit in the mlspl.conf configuration file (typically found in the $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local directory):
[default]
max_inputs = <your_new_limit>