Machine Learning Data Preparation for Splunk – vibration sensor use case
Machine learning data preparation is an essential prerequisite to training the model. Let’s consider a scenario where we employ machine learning to scrutinize data derived from an engine’s external vibration and internal speed sensors. The goal here is to predict potential malfunctions in the engine. You can customize this guide to fit any parameters or use cases you encounter while preparing data for machine learning. However, we will be using the vibration sensor as an example.
Steps to take for Splunk machine learning data preparation
The first step is data collection. We need readings from the external vibration sensor (x-, y-, and z-axis data) and from the internal engine speed sensor. You can collect this data using Splunk’s data ingestion capabilities.
After collecting the data, you will often need to preprocess it. This step might include cleaning the data and transforming it into a format suitable for machine learning. We will use the following field names:
external vibration sensor x-axis --> external_vibration_sensor_x
external vibration sensor y-axis --> external_vibration_sensor_y
external vibration sensor z-axis --> external_vibration_sensor_z
internal engine speed sensor --> internal_engine_sensor_speed
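As a minimal cleaning sketch, assuming the four fields are already extracted under the names above, the following keeps only events in which all four fields hold numeric values and trims the result to the columns used for training. Dropping incomplete events is only one option (filling the gaps is another). It matters because eval arithmetic in SPL returns no value when any operand is missing.

```spl
| where isnum(external_vibration_sensor_x) AND isnum(external_vibration_sensor_y) AND isnum(external_vibration_sensor_z) AND isnum(internal_engine_sensor_speed)
| table _time external_vibration_sensor_x external_vibration_sensor_y external_vibration_sensor_z internal_engine_sensor_speed
```

Append these lines to the search that retrieves your sensor events.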
Adding features
Each parameter you use to train the machine learning model is called a feature. So, now we have four features based on the fields we described earlier. You can create new features that can help improve the model. For example, you can easily calculate the average of the x, y, and z vibrations by using the eval command in SPL:
| eval avg_vibration=(external_vibration_sensor_x+external_vibration_sensor_y+external_vibration_sensor_z)/3
The “eval” command in Splunk SPL creates a new field, here named “avg_vibration.” Its value is the average of three existing fields in the data: “external_vibration_sensor_x,” “external_vibration_sensor_y,” and “external_vibration_sensor_z.”
In simpler terms, the command just adds the values of the three vibration sensors (x, y, and z) and then divides the result by 3 to get the average. This average value, “avg_vibration,” represents the overall vibration sensed by the external sensors in three directions or dimensions.
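To try the calculation without real sensor data, here is a small sketch that uses makeresults to generate one test event. The three readings (0.5, 1.0, and 1.5) are made-up values chosen only for illustration.

```spl
| makeresults
| eval external_vibration_sensor_x=0.5, external_vibration_sensor_y=1.0, external_vibration_sensor_z=1.5
| eval avg_vibration=(external_vibration_sensor_x+external_vibration_sensor_y+external_vibration_sensor_z)/3
```

The resulting avg_vibration is the mean of the three made-up readings, (0.5 + 1.0 + 1.5) / 3, which is 1.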
It’s important to note that adding new features should be done judiciously, as not all created features may improve the model’s performance. Fewer, more informative input features often make a model easier to train and less prone to overfitting, so it can pay to replace several related fields with one. In this example, the model may perform better if you train it on the average vibration (one field) instead of the three separate x-, y-, and z-axis fields, but compare both variants on your own data before settling on one.
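One caveat to check before relying on the average: if your sensor reports signed axis values (an assumption; some sensors report only non-negative amplitudes), readings with opposite signs can cancel, so a strongly vibrating engine could still show an average near zero. A common alternative, not part of the original walkthrough, is the magnitude of the vibration vector. The sketch below computes it into a field named vibration_magnitude, a hypothetical name chosen for this illustration.

```spl
| eval vibration_magnitude=sqrt(pow(external_vibration_sensor_x,2)+pow(external_vibration_sensor_y,2)+pow(external_vibration_sensor_z,2))
```

For readings of 3, -4, and 0, the average is about -0.33, while the magnitude is 5.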
Splunk Essentials for Predictive Maintenance add-on
If you want additional help, you can use the Splunk Essentials for Predictive Maintenance add-on. It has more detailed examples of data ingestion, gathering, and preprocessing that you can adapt to your particular use case. In addition, it includes a sample jet engine dataset on which you can practice your Splunk machine learning skills.