
Methodology

Gathering data is the most important step in solving any supervised machine learning problem. Your text classifier can only be as good as the dataset it is built from.
Here are some important things to remember when collecting data:
• Understand the limitations of an API before using it. For example, some APIs set a limit on the rate at which you can make queries.
• The more training examples (referred to as samples in the rest of this guide) you have, the better. This helps your model generalize.
• Make sure the number of samples for every class or topic is not overly imbalanced; that is, you should have a comparable number of samples in each class (a quick check is sketched after this list).
• Make sure that your samples adequately cover the space of possible inputs, not only the common cases.
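As a rough illustration of the class-balance point above, the following Python sketch counts samples per class; the `samples` and `labels` variables are placeholders standing in for your own dataset.

```python
# A minimal sketch of checking class balance in a labelled dataset.
# `samples` and `labels` are hypothetical placeholders for your own data.
from collections import Counter

samples = ["great product", "terrible service", "okay experience"]  # example texts
labels = ["positive", "negative", "neutral"]                        # one label per sample

counts = Counter(labels)
total = sum(counts.values())
for cls, n in counts.items():
    print(f"{cls}: {n} samples ({n / total:.1%})")

# A large gap between the most and least frequent classes suggests the
# dataset is imbalanced and may need re-sampling or more data collection.
```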
Methodology
CNN training:
The gathered data is then processed and trained with a CNN so that the training process finishes quickly. This involves three layers:
• Convolution layer: feature extraction takes place here; only the features that are useful to the machine are kept, and unwanted features are discarded, which shortens the training period.
• Pooling layer: the size of the data or image is reduced, giving a compressed representation that retains the important features the machine needs.
• Fully connected layer: the output of the previous layer is fed in as a vector; these compressed features are then trained with the CNN to produce the final output, as sketched below.
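To make the three layers concrete, here is a minimal sketch of a 1-D CNN text classifier using TensorFlow/Keras; the vocabulary size, number of classes, and layer sizes are assumptions for illustration, not values from the original slides.

```python
# A minimal sketch of a 1-D CNN text classifier in Keras, mirroring the three
# layers described above; all sizes and hyperparameters are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 3      # assumed number of target classes
vocab_size = 10000   # assumed vocabulary size

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 128),
    # Convolution layer: extracts useful local features from the token sequence
    layers.Conv1D(filters=64, kernel_size=5, activation="relu"),
    # Pooling layer: compresses the feature maps, keeping the strongest features
    layers.GlobalMaxPooling1D(),
    # Fully connected layers: take the pooled features as a vector and
    # produce the final class scores
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5)  # x_train: padded token ids, y_train: class ids
```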
Methodology
• Filter methods are generally used as a preprocessing step. The selection of features is independent of any machine learning algorithm; instead, features are selected on the basis of their scores in various statistical tests of their correlation with the outcome variable (see the sketch below).
• Data filtering is the process of choosing a smaller part of your data set and using that subset for viewing or analysis. Filtering is generally (but not always) temporary: the complete data set is kept, but only part of it is used for the calculation.
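A minimal sketch of filter-based feature selection, assuming scikit-learn is available; the documents and labels are placeholders. Features are scored with a chi-squared test against the class labels, and only the top-scoring ones are passed on to the classifier.

```python
# Filter-method feature selection: score features statistically, independent of
# any classifier, and keep only the top-k before training.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

texts = ["spam offer now", "meeting at noon", "cheap spam deal"]  # placeholder documents
labels = [1, 0, 1]                                                # placeholder class labels

X = CountVectorizer().fit_transform(texts)   # bag-of-words feature matrix
selector = SelectKBest(chi2, k=2)            # keep the 2 highest-scoring features
X_filtered = selector.fit_transform(X, labels)

print(X_filtered.shape)  # only the selected features remain for the classifier
```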
Methodology
Classification is a supervised learning problem: define a set of target classes and train a model to recognize them. Once trained, the model can classify new inputs into those classes.
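The end-to-end loop is sketched below; it uses a simple TF-IDF and logistic-regression baseline from scikit-learn (rather than the CNN from the earlier slide) purely to keep the example self-contained, and the classes and texts are illustrative only.

```python
# A minimal sketch of the full supervised-classification loop.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1. Define a set of target classes via labelled examples.
train_texts = ["the match ended 2-1", "parliament passed the bill",
               "the final score was a draw", "the senate debated the law"]
train_labels = ["sports", "politics", "sports", "politics"]

# 2. Train a model to recognize those classes.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# 3. Classify new, unseen inputs.
print(model.predict(["the committee voted on the proposal"]))
```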
