Ex. No 8: IMPLEMENT SENTIMENT ANALYSIS USING LSTM

AIM:
To write a Python program to implement sentiment analysis using LSTM.

ALGORITHM:

1. Importing the required libraries.
2. Loading the dataset and creating a new column 'sentiment' based on 'rating'.
3. Checking for null values in the dataset.
4. Cleaning the data: removing special characters, digits, unnecessary symbols, and
stop words, and converting each word to its root form for easier interpretation.
5. Visualizing the common words in the reviews; the size of each word represents its
frequency of occurrence in the data. (Steps 3-5 are sketched in code after this list.)
6. Encoding the target variable using 'LabelEncoder' from the 'sklearn' library.
7. Tokenizing the reviews and converting them into numerical vectors.
8. Building the LSTM model using the 'Keras' library: model initialization, adding the
required LSTM layers, and model compilation.
9. Splitting the data into training and testing data.
10. Training the model using the training data.
11. Evaluating the model.
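
Steps 3-5 are not shown in the PROGRAM below. A minimal sketch of them, assuming the
nltk, wordcloud, and matplotlib packages are installed and that `data` is the
DataFrame with a 'text' column loaded in the program:

import re
import nltk
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from wordcloud import WordCloud

nltk.download('stopwords')
nltk.download('wordnet')

# Step 3: check for null values in every column
print(data.isnull().sum())

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def clean(text):
    # Step 4: strip special characters and digits, drop stop words,
    # and reduce each remaining word to its root form
    text = re.sub(r'[^a-zA-Z\s]', '', text.lower())
    words = [lemmatizer.lemmatize(w) for w in text.split() if w not in stop_words]
    return ' '.join(words)

data['text'] = data['text'].apply(clean)

# Step 5: word cloud of the cleaned reviews; larger words occur more often
cloud = WordCloud(width=800, height=400).generate(' '.join(data['text']))
plt.imshow(cloud, interpolation='bilinear')
plt.axis('off')
plt.show()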

PROGRAM

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D
#from keras.utils.np_utils import to_categorical
from tensorflow.python.keras.utils.np_utils import to_categorical

# Load the dataset; adjust the path to wherever Sentiment.csv is stored locally.
data = pd.read_csv('C:\\Users\\Sentiment.csv')
# Keeping only the necessary columns
data = data[['text','sentiment']]

data = data[data.sentiment != "Neutral"]

data['text'] = data['text'].apply(lambda x: x.lower())
# Remove everything except letters, digits, and whitespace
data['text'] = data['text'].apply(lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', x))

# .size counts cells (rows x 2 columns), not rows
print(data[data['sentiment'] == 'Positive'].size)
print(data[data['sentiment'] == 'Negative'].size)

4472
16986

# Remove the 'rt' retweet marker from each review
data['text'] = data['text'].str.replace('rt', ' ')

max_features = 2000
tokenizer = Tokenizer(num_words=max_features, split=' ')
tokenizer.fit_on_texts(data['text'].values)
X = tokenizer.texts_to_sequences(data['text'].values)
X = pad_sequences(X)
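
To make the tokenizer concrete, here is a toy run on two made-up sentences (not from
the dataset); the exact indices depend on word frequencies in the fitted texts:

demo = Tokenizer(num_words=50)
demo.fit_on_texts(['good movie', 'bad bad movie'])
print(demo.texts_to_sequences(['good movie']))
# e.g. [[3, 1]] -- more frequent words get smaller indices
print(pad_sequences(demo.texts_to_sequences(['good movie']), maxlen=4))
# e.g. [[0 0 3 1]] -- shorter sequences are left-padded with zeros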


embed_dim = 128
lstm_out = 196

model = Sequential()
model.add(Embedding(max_features, embed_dim, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 28, 128)           256000

spatial_dropout1d            (None, 28, 128)           0
(SpatialDropout1D)

lstm (LSTM)                  (None, 196)               254800

dense (Dense)                (None, 2)                 394

=================================================================
Total params: 511194 (1.95 MB)
Trainable params: 511194 (1.95 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None
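
The parameter counts in the summary can be verified by hand:

# Embedding: vocabulary size x embedding dim
print(2000 * 128)                     # 256000
# LSTM: 4 gates x (units x (input dim + units) + bias)
print(4 * (196 * (128 + 196) + 196))  # 254800
# Dense: units x classes + class biases
print(196 * 2 + 2)                    # 394, for a total of 511194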

Y = pd.get_dummies(data['sentiment']).values
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)

(7188, 28) (7188, 2)
(3541, 28) (3541, 2)
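
The program one-hot encodes the labels with pd.get_dummies; step 6 of the ALGORITHM
names LabelEncoder instead, which for these two classes yields an equivalent
encoding. A sketch:

from sklearn.preprocessing import LabelEncoder

# Classes are assigned alphabetically: 'Negative' -> 0, 'Positive' -> 1
labels = LabelEncoder().fit_transform(data['sentiment'])
Y = to_categorical(labels)  # same shape as the get_dummies output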

batch_size = 32
model.fit(X_train, Y_train, epochs=7, batch_size=batch_size, verbose=2)

Epoch 1/7
225/225 - 49s - loss: 0.4306 - accuracy: 0.8172 - 49s/epoch - 217ms/step
Epoch 2/7
225/225 - 47s - loss: 0.3136 - accuracy: 0.8688 - 47s/epoch - 210ms/step
Epoch 3/7
225/225 - 51s - loss: 0.2783 - accuracy: 0.8854 - 51s/epoch - 226ms/step
Epoch 4/7
225/225 - 56s - loss: 0.2525 - accuracy: 0.8961 - 56s/epoch - 251ms/step
Epoch 5/7
225/225 - 54s - loss: 0.2262 - accuracy: 0.9065 - 54s/epoch - 241ms/step
Epoch 6/7
225/225 - 54s - loss: 0.2035 - accuracy: 0.9204 - 54s/epoch - 240ms/step
Epoch 7/7
225/225 - 55s - loss: 0.1842 - accuracy: 0.9265 - 55s/epoch - 242ms/step

<keras.src.callbacks.History at 0x159323d58d0>

validation_size = 1500

X_validate = X_test[-validation_size:]
Y_validate = Y_test[-validation_size:]
X_test = X_test[:-validation_size]
Y_test = Y_test[:-validation_size]
score, acc = model.evaluate(X_test, Y_test, verbose=2, batch_size=batch_size)
print("score: %.2f" % (score))
print("acc: %.2f" % (acc))

17/17 - 3s - loss: 0.4599 - accuracy: 0.8318 - 3s/epoch - 170ms/step

score: 0.46
acc: 0.83

pos_cnt, neg_cnt, pos_correct, neg_correct = 0, 0, 0, 0

for x in range(len(X_validate)):
    result = model.predict(X_validate[x].reshape(1, X_test.shape[1]),
                           batch_size=1, verbose=2)[0]

    if np.argmax(result) == np.argmax(Y_validate[x]):
        if np.argmax(Y_validate[x]) == 0:
            neg_correct += 1
        else:
            pos_correct += 1

    if np.argmax(Y_validate[x]) == 0:
        neg_cnt += 1
    else:
        pos_cnt += 1

print("pos_acc", pos_correct/pos_cnt*100, "%")


print("neg_acc", neg_correct/neg_cnt*100, "%")

1/1 - 0s - 84ms/epoch - 84ms/step
1/1 - 0s - 62ms/epoch - 62ms/step
...
1/1 - 0s - 78ms/epoch - 78ms/step
pos_acc 53.082191780821915 %
neg_acc 92.63245033112582 %
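
Predicting one review at a time is slow. The same per-class accuracies can be
computed with a single batched call; a sketch reusing the arrays defined above:

# Predict the whole validation set at once, then compare predicted
# and true class indices separately for each class.
preds = np.argmax(model.predict(X_validate, verbose=0), axis=1)
truth = np.argmax(Y_validate, axis=1)
for cls, name in [(1, "pos_acc"), (0, "neg_acc")]:
    mask = truth == cls
    print(name, (preds[mask] == truth[mask]).mean() * 100, "%")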

twt = ['Meetings: Because none of us is as dumb as all of us.']
#vectorizing the tweet with the pre-fitted tokenizer instance
twt = tokenizer.texts_to_sequences(twt)
#padding the tweet so it has exactly the shape of the embedding layer's input
twt = pad_sequences(twt, maxlen=28, dtype='int32', value=0)  # maxlen must match X.shape[1]
print(twt)
sentiment = model.predict(twt, batch_size=1, verbose=2)[0]
if np.argmax(sentiment) == 0:
    print("negative")
elif np.argmax(sentiment) == 1:
    print("positive")

[[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0  206  633    6  150    5   55 1055   55   46    6  150]]
1/1 - 0s - 289ms/epoch - 289ms/step
negative
