Forwarder:: Upload Input
Before we can start searching our machine data, we first need to get data into
our index.
Adding data is done by the admin user for the deployment.
While you might not be the administrator for your environment, it’s a good idea
to understand how data is ingested.
There are many ways to get data into Splunk Enterprise.
So many, in fact, that it can seem a little daunting.
But understanding what your options are will go a long way toward making you feel
comfortable with the process.
Since we are logged in as an admin user, we see a large “Add Data” icon in the
home app.
Clicking on it takes us to the Add Data menu, where we are given three
options for getting data into Splunk Enterprise.
The upload option allows us to upload local files to a Splunk Enterprise instance
to be indexed only once.
This is good for data that is created once and never gets updated.
The monitor option allows us to monitor files, directories, HTTP events, network
ports, or data-gathering scripts located on Splunk Enterprise instances.
If we were in a Windows environment, we would also see options to monitor
Windows-specific data.
This includes event logs, file system changes, performance metrics and network
information on both local and remote machines.
With the forward option, we can receive data from external forwarders.
As we talked about earlier, they are installed on remote machines to gather data
and forward it to an indexer over a receiving port.
In most production environments, forwarders will be used as your main source
of data input.
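Under the hood, a forwarder is told where to send its data through an outputs.conf file on the forwarder itself. As a rough sketch, assuming a single indexer listening on the default receiving port of 9997 (the host name here is a placeholder):

```ini
# outputs.conf on the forwarder -- host name is hypothetical
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997
```

The indexer must also be configured to listen on that receiving port before the forwarder can connect.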
Upload Input
While the upload data input option might not be very useful in production, it
comes in handy for testing or when you need to search a small dataset that never
gets updated.
Clicking on the upload button from the add data page we are given the option to
select a file from our local file system or to drag and drop the file we want to
index.
We have some customer survey data from a focus group.
We know the data will not be updated so the upload option will work well for it.
We upload the .csv file containing the data from our file system and click Next.
We are taken to a page where a sourcetype can be selected for the data.
Splunk uses sourcetypes to categorize the type of data being indexed.
The indexing process frequently references the sourcetype, and it is used in
many search management functions.
If Splunk recognizes the data, it will assign it a pre-trained source type.
In this case, it labeled our data correctly as a .csv file.
If this were not correct, we could select a different predefined sourcetype using
the drop-down menu, or create a new one.
We can make adjustments to how Splunk processes time stamps and event
breaks by using the corresponding drop down menus.
These will change depending on the sourcetype selected.
Since this sourcetype is predefined, Splunk knows where to break the event,
where the time stamp is located, and how to automatically create field value
pairs.
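For reference, the timestamp and event-breaking behavior this page exposes maps to sourcetype settings in props.conf. A minimal sketch of a custom CSV sourcetype, assuming a made-up stanza name and timestamp column:

```ini
# props.conf -- stanza name and TIMESTAMP_FIELDS value are hypothetical
[survey_csv]
INDEXED_EXTRACTIONS = csv
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = submitted_at
TIME_FORMAT = %Y-%m-%d %H:%M:%S
```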
Let’s look at what happens when a predefined sourcetype is not used.
We select the default settings from the drop down.
As you can see, Splunk does not know how to break the events.
Let’s go back to using the CSV sourcetype and check the events to make sure the
data is being extracted correctly.
We can save our sourcetype if we made any changes or if we want to give it a
different name.
We have options to change the name, add a description, select which category to
store it under in the predefined menu, and choose which app context to save it to.
The app context setting is something to be aware of in Splunk.
The selection you make tells Splunk which app to apply this sourcetype to.
You can select to use it system wide, or for a specific app.
We want the sourcetype available system wide.
So we leave system selected.
Our sample data looks good, so we click “Next.”
We then select a host name.
A host name should be the name of the machine from which these events
originate.
You can set the host name using a constant value, a regular expression based on
the file path, or a segment of the file path.
We enter a constant value of our instance’s name.
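Behind the scenes, these three choices correspond to host settings in the input's inputs.conf stanza. A sketch, with placeholder paths and values:

```ini
# inputs.conf -- path and values are hypothetical
[monitor:///var/log/app.log]
# constant value:
host = splunk-server-01
# or, instead, extract the host from the file path:
# host_regex = /var/log/(\w+)/app\.log
# or use the third path segment as the host:
# host_segment = 3
```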
We can now set an index to import the data into or create a new one.
Indexes are directories where the data will be stored.
Main Index
When many users first start using Splunk, the tendency is to store all events in
the main index, allowing them to use one index to search all their data.
There are some reasons you should reconsider doing this.
First, having separate indexes can make your searches more efficient.
Specifying an index as part of a search string (for example, index=web_data_index
fail*) limits the amount of data Splunk needs to search and returns only the
events from that index.
Multiple indexes also allow you to limit access by user role, letting an admin
user control who can see what data.
Also, in most deployments, there are times when you will want or need to retain
data for different time intervals.
Keeping data in separate indexes will allow you to set retention policies by
index.
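Retention is set per index in indexes.conf. A minimal sketch, assuming a 90-day retention period (the index name, paths, and value are examples, not defaults):

```ini
# indexes.conf -- index name and retention value are hypothetical
[surveydata]
homePath   = $SPLUNK_DB/surveydata/db
coldPath   = $SPLUNK_DB/surveydata/colddb
thawedPath = $SPLUNK_DB/surveydata/thaweddb
# age out (freeze) events after roughly 90 days
frozenTimePeriodInSecs = 7776000
```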
We’re going to save this data to a new index called “SurveyData.”
Clicking “Review,” we are taken to a review page, where we can see the settings
for our input.
Clicking “Submit” indexes the data, and we can start searching it.
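A quick sanity check after indexing is to search the new index directly, for example (the index name here is assumed; note that Splunk index names are lowercase):

```
index=surveydata | head 10
```

This returns the ten most recent events from the index, confirming the upload worked.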
Monitor Input
When the data you want to index comes from files or ports on an indexer, use the
monitor option.
Using the Monitor Option is similar to the upload option with a few differences.
Clicking on the monitor button, we are taken to a page where we select the source
to monitor.
We are given options to monitor files, directories, HTTP events, network ports,
or data sources read by a custom script you write.
We are going to monitor an apache log file on this server, so we click “Files and
Directories.”
We use the “browse” button to locate the log file, and click “Select.”
We have the option to continuously monitor the file, or index once.
Since we want to see events as they happen on the server, we choose to
continuously monitor.
If we were selecting a directory to monitor, we could choose to whitelist or
blacklist specified files in the directory.
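In inputs.conf terms, a directory monitor with whitelist and blacklist rules would look roughly like this (the path, patterns, index, and sourcetype are illustrative):

```ini
# inputs.conf -- all values here are hypothetical
[monitor:///var/log/apache]
# index only the access logs...
whitelist = access_log.*
# ...but skip compressed rotations
blacklist = \.gz$
index = web_data_index
sourcetype = access_combined
```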
We click “Next.”
Splunk has selected a predefined sourcetype for the data, and the sample events
look good, so we can click “Next.”
Like the upload option, we can define a host name and select which index to use
for the data.
But we can also select which app context to use for the input.
Clicking “Review” will display the settings for the input.
And clicking “Submit” will start indexing the data, making it available to
search.
Forwarder
Using the Universal Forwarder Option
Setting up forwarders is out of the scope of this course.
I have added a link to your notes about using forwarders to get data into your
index if you would like more information.
In the next lab, we will download some sample machine data and use the upload
option to ingest it into your lab environment.
Remember that this data should only be loaded into your lab environment.
Do not ingest it into your production environment, as it will count against your
license.