Untitled28.ipynb - Colaboratory
https://colab.research.google.com/drive/1sAgmbXSp...

import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import re
import string
import nltk
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier
from sklearn.metrics import precision_score, recall_score, confusion_matrix
from sklearn.metrics import f1_score, roc_auc_score, roc_curve

print("Setup Complete")
!pip install --upgrade tensorflow keras

# Load the data


data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/train_data.csv')
test = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/test_data.csv')
test_prediction = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/test_data_hidden.csv')   # '.csv' assumed; the line was truncated in the export

data.head()

   name                                                brand   categories                                             primaryCategories     reviews
0  All-New Fire HD 8 Tablet, 8" HD Display, Wi-Fi...   Amazon  Electronics,iPad & Tablets,All Tablets,Fire HD Ta...  Electronics           2016-12-26T00:00:00
1  Amazon - Echo Plus w/ Built-In Hub - Silver         Amazon  Amazon Echo,Smart Home,Networking,Home & Tools...     Electronics,Hardware  2018-01-17T00:00:00
2  Amazon Echo ...

Positive = data[data['sentiment']== "Positive"].iloc[:,[5,6,7]]

Neutral = data[data['sentiment']== "Neutral"].iloc[:,[5,6,7]]

Negative = data[data['sentiment']== "Negative"].iloc[:,[5,6,7]]

Positive['sentiment'].value_counts()
Positive 3749
Name: sentiment, dtype: int64

Neutral['sentiment'].value_counts()
Neutral 158
Name: sentiment, dtype: int64

Negative['sentiment'].value_counts()
Negative 93
Name: sentiment, dtype: int64
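# Aside (not in the original run): the same three counts come from a single
# value_counts() on the full column, as the notebook itself shows later.
print(data['sentiment'].value_counts())   # Positive 3749, Neutral 158, Negative 93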

# Keeping only those Features that we need for further exploring.


data1 = data[["sentiment","reviews.text"]].copy()   # .copy() avoids a SettingWithCopyWarning when columns are added later

data1.head()

sentiment reviews.text

0 Positive Purchased on Black FridayPros - Great Price (e...

1 Positive I purchased two Amazon in Echo Plus and two do...

2 Neutral Just an average Alexa option. Does show a few ...

3 Positive very good product. Exactly what I wanted, and ...

4 Positive This is the 3rd one I've purchased. I've bough...

# Resetting the Index.


data1.index = pd.Series(list(range(data1.shape[0])))

print('Shape : ',data1.shape)
data1.head()
Shape : (4000, 2)
sentiment reviews.text

0 Positive Purchased on Black FridayPros - Great Price (e...

1 Positive I purchased two Amazon in Echo Plus and two do...

2 Neutral Just an average Alexa option. Does show a few ...

3 Positive very good product. Exactly what I wanted, and ...

4 Positive This is the 3rd one I've purchased. I've bough...

from nltk.tokenize import RegexpTokenizer


from nltk.corpus import stopwords


import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
#Download Stopwords
nltk.download('stopwords')

wordnet_lemmatizer = WordNetLemmatizer()
tokenizer = RegexpTokenizer(r'[a-z]+')
stop_words = set(stopwords.words('english'))

def preprocess(document):
    document = document.lower()                        # convert to lowercase
    words = tokenizer.tokenize(document)               # tokenize
    words = [w for w in words if not w in stop_words]  # remove stopwords
    # Lemmatize each word as noun, verb, adjective and adverb in turn
    for pos in [wordnet.NOUN, wordnet.VERB, wordnet.ADJ, wordnet.ADV]:
        words = [wordnet_lemmatizer.lemmatize(x, pos) for x in words]
    return " ".join(words)

print("Setup Complete")
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
Setup Complete
[nltk_data] Unzipping corpora/stopwords.zip.
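# Quick sanity check of preprocess() on a made-up review (the sample sentence
# is illustrative, not from the dataset):
sample = "I bought two Fire tablets and the kids are loving them!"
print(preprocess(sample))   # e.g. "buy two fire tablet kid love"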

data1['Processed_Review'] = data1['reviews.text'].apply(preprocess)

data1.head()
   sentiment  reviews.text                                        Processed_Review
0  Positive   Purchased on Black FridayPros - Great Price (e...   purchase black fridaypros great price even sal...
1  Positive   I purchased two Amazon in Echo Plus and two do...   purchase two amazon echo plus two dot plus fou...
2  Neutral    Just an average Alexa option. Does show a few ...   average alexa option show thing screen still l...
3  Positive   very good product. Exactly what I wanted, and ...   good product exactly want good price
4  Positive   This is the 3rd one I've purchased. I've bough...   rd one purchase buy one niece case compare one...
data2 = data1[["sentiment","Processed_Review"]]
data2.head()

   sentiment  Processed_Review
0  Positive   purchase black fridaypros great price even sal...
1  Positive   purchase two amazon echo plus two dot plus fou...
2  Neutral    average alexa option show thing screen still l...
3  Positive   good product exactly want good price
4  Positive   rd one purchase buy one niece case compare one...

# for tf-idf
def textPreprocessing(text):
    # Remove punctuation
    import string
    removePunctuation = [char for char in text if char not in string.punctuation]
    # Join chars back into a sentence
    sentenceWithoutPunctuations = ''.join(removePunctuation)
    words = sentenceWithoutPunctuations.split()
    # Stopword removal
    from nltk.corpus import stopwords
    removeStopwords = [word for word in words if word.lower() not in stopwords.words('english')]

    return removeStopwords

data2.groupby('sentiment').describe()

          Processed_Review
               count  unique  top                                                 freq
sentiment
Negative          93      78  last model kindle hdx terrible purchase model ...     3
Neutral          158     145  average alexa option show thing screen still l...     2

#Text preprocessing
data2['Processed_Review'].head(2).apply(textPreprocessing)
0 [purchase, black, fridaypros, great, price, ev...
1 [purchase, two, amazon, echo, plus, two, dot, ...
Name: Processed_Review, dtype: object

from sklearn.feature_extraction.text import CountVectorizer


bow = CountVectorizer(analyzer=textPreprocessing).fit(data2['Processed_Review'])

len(bow.vocabulary_)
3407

reviews_bow = bow.transform(data2['Processed_Review'])

from sklearn.feature_extraction.text import TfidfTransformer


tfidfData = TfidfTransformer().fit(reviews_bow)
tfidfDataFinal = tfidfData.transform(reviews_bow)


tfidfDataFinal.shape
(4000, 3407)
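# Aside: the two-step CountVectorizer + TfidfTransformer used above is
# equivalent to scikit-learn's one-step TfidfVectorizer; a minimal sketch
# with the same analyzer:
from sklearn.feature_extraction.text import TfidfVectorizer
tv = TfidfVectorizer(analyzer=textPreprocessing)
print(tv.fit_transform(data2['Processed_Review']).shape)   # expected (4000, 3407), as above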

# using models to learn from data


from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB().fit(tfidfDataFinal,data2['sentiment'])

model

▾ MultinomialNB
MultinomialNB()

inputData = "very bad dont like it at all it sucks"


# NOTE: textPreprocessing() returns a list of tokens, and transform() treats
# each list element as a separate document, so l2/l3 hold one row per word and
# l3[0] scores only the first word -- which is why this clearly negative
# review comes back 'Positive'.
l1 = textPreprocessing(inputData)
l2 = bow.transform(l1)
l3 = tfidfData.transform(l2)
prediction = model.predict(l3[0])
prediction
array(['Positive'], dtype='<U8')
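# Corrected usage (a sketch; the model is unchanged): transform() expects an
# iterable of documents, so pass the raw review as one string in a list and
# let the fitted analyzer (textPreprocessing) handle the tokenization.
docs = [inputData]
vec = tfidfData.transform(bow.transform(docs))
print(model.predict(vec))   # one prediction for the whole review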

#Creating independent and Dependent Features


columns = data2.columns.tolist()
# Filtering the columns to remove data we do not want
columns = [c for c in columns if c not in ["sentiment"]]
# Store the variable we are predicting
target = "sentiment"
# Defining a random state
state = np.random.RandomState(42)
X = data2[columns]
Y = data2[target]
# Printing the shapes of X & Y
print(X.shape)
print(Y.shape)
(4000, 1)
(4000,)

columns
['Processed_Review']

print(data2.sentiment.value_counts())
Positive 3749
Neutral 158
Negative 93
Name: sentiment, dtype: int64

# Using Matplotlib to show distribution of reviews sentiment in the dataset


print(data1.sentiment.value_counts())
data1['sentiment'].value_counts().plot(kind='bar')
plt.title("Distribution of Reviews Sentiment", size=18)
Positive 3749
Neutral 158
Negative 93
Name: sentiment, dtype: int64
Text(0.5, 1.0, 'Distribution of Reviews Sentiment')

# RandomOverSampler to handle imbalanced data


from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=0)
X_res,Y_res=ros.fit_resample(X,Y)

from collections import Counter


print(sorted(Counter(Y_res).items()))
[('Negative', 3749), ('Neutral', 3749), ('Positive', 3749)]

X_res.shape,Y_res.shape
((11247, 1), (11247,))

#Creating X output to dataframe


X1=pd.DataFrame(X_res,columns=['Processed_Review'])

#Creating Y output to dataframe for merging


Y1=pd.DataFrame(Y_res,columns=['sentiment'])

#Merging the X & Y output to Final data


Final_data=pd.concat([X1,Y1],axis=1)
Final_data.head()

Processed_Review sentiment

0 purchase black fridaypros great price even sal... Positive

1 purchase two amazon echo plus two dot plus fou... Positive

2 average alexa option show thing screen still l... Neutral

3 good product exactly want good price Positive

4 rd one purchase buy one niece case compare one... Positive

Final_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11247 entries, 0 to 11246
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Processed_Review 11247 non-null object
1 sentiment 11247 non-null object
dtypes: object(2)
memory usage: 175.9+ KB

# Using Matplotlib to plot the final data & show distribution of reviews sentiment in the dataset
print(Final_data.sentiment.value_counts())
Final_data['sentiment'].value_counts().plot(kind='bar')
plt.title("Distribution of Reviews Sentiment", size=18)
Positive 3749
Neutral 3749
Negative 3749
Name: sentiment, dtype: int64
Text(0.5, 1.0, 'Distribution of Reviews Sentiment')


df = Final_data.sample(frac=0.1, random_state=0)

# Dropping missing values


df.dropna(inplace=True)

df.head()

Processed_Review sentiment

8805 buy think would great read book play game howe... Neutral

9736 good tablet kid lot appts download game Neutral

125 item work expect great product Positive

10143 great beginner like child limit use many apps ... Neutral

10937 buy kindle past time one come defective port b... Neutral

# Splitting data into training set and validation


X_train, X_test, y_train, y_test = train_test_split(df['Processed_Review'], df['sentiment'],
                                                     test_size=0.1, random_state=0)  # random_state assumed; value truncated in the export

print('Load %d training examples and %d validation examples. \n' %(X_train.shape[0], X_test.shape[0]))


print('Show a review in the training set : \n', X_train.iloc[10])
Load 1012 training examples and 113 validation examples.

Show a review in the training set :


daughter love easy navigate hard break

from bs4 import BeautifulSoup            # needed by cleanText(); import was missing
from nltk.stem import SnowballStemmer

def cleanText(raw_text, remove_stopwords=False, stemming=False, split_text=False):
    '''
    Convert a raw review to a cleaned review
    '''
    text = BeautifulSoup(raw_text, 'lxml').get_text()   # remove html
    letters_only = re.sub("[^a-zA-Z]", " ", text)       # remove non-characters
    words = letters_only.lower().split()                # convert to lower case

    if remove_stopwords:   # remove stopwords
        stops = set(stopwords.words("english"))
        words = [w for w in words if not w in stops]

    if stemming == True:   # stemming
        # stemmer = PorterStemmer()
        stemmer = SnowballStemmer('english')
        words = [stemmer.stem(w) for w in words]

    if split_text == True:   # return a token list instead of a string
        return (words)

    return( " ".join(words))

# Preprocess text data in training set and validation set


X_train_cleaned = []
X_test_cleaned = []

for d in X_train:
    X_train_cleaned.append(cleanText(d))

for d in X_test:
    X_test_cleaned.append(cleanText(d))

print('Show a cleaned review in the training set : \n', X_train_cleaned[10])
Show a cleaned review in the training set :
 daughter love easy navigate hard break

# Fit and transform the training data to a document-term matrix using CountVectorizer
countVect = CountVectorizer()
X_train_countVect = countVect.fit_transform(X_train_cleaned)

# Train MultinomialNB classifier


mnb = MultinomialNB()
mnb.fit(X_train_countVect, y_train)

▾ MultinomialNB
MultinomialNB()

from sklearn import metrics
from sklearn.metrics import accuracy_score   # imports were missing

def modelEvaluation(predictions):
    '''
    Print model evaluation to predicted result
    '''
    print("\nAccuracy on validation set: {:.4f}".format(accuracy_score(y_test, predictions)))
    #print("\nAUC score : {:.4f}".format(roc_auc_score(y_test, predictions)))
    print("\nClassification report : \n", metrics.classification_report(y_test, predictions))
    print("\nConfusion Matrix : \n", metrics.confusion_matrix(y_test, predictions))

# Evaluate the model on the validation set


predictions = mnb.predict(countVect.transform(X_test_cleaned))
modelEvaluation(predictions)

Accuracy on validation set: 0.8938

Classification report :
               precision    recall  f1-score   support

    Negative       0.93      0.95      0.94        39
     Neutral       0.85      0.90      0.88        39
    Positive       0.91      0.83      0.87        35

    accuracy                           0.89       113
   macro avg       0.89      0.89      0.89       113
weighted avg       0.89      0.89      0.89       113

Confusion Matrix :
 [[37  0  2]
 [ 3 35  1]
 [ 0  6 29]]

# Fitting and transforming the training data to a document-term matrix using TfidfVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer   # import was missing

tfidf = TfidfVectorizer(min_df=5)   # minimum document frequency of 5
X_train_tfidf = tfidf.fit_transform(X_train)   # note: fit on X_train, while prediction below uses X_test_cleaned

# Logistic Regression
lr = LogisticRegression()
lr.fit(X_train_tfidf, y_train)

▾ LogisticRegression
LogisticRegression()

# Evaluating on the validation set


predictions = lr.predict(tfidf.transform(X_test_cleaned))
modelEvaluation(predictions)

Accuracy on validation set: 0.9292

Classification report :
               precision    recall  f1-score   support

    Negative       0.93      1.00      0.96        39
     Neutral       0.88      0.92      0.90        39
    Positive       1.00      0.86      0.92        35

    accuracy                           0.93       113
   macro avg       0.94      0.93      0.93       113
weighted avg       0.93      0.93      0.93       113

Confusion Matrix :
 [[39  0  0]
 [ 3 36  0]
 [ 0  5 30]]
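# The next cell evaluates `clf`, which is never defined in the cells shown
# (presumably trained in a cell lost from the export). A plausible stand-in,
# assumed here and not the author's original model, is a linear SVM on the
# same tf-idf features:
from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(X_train_tfidf, y_train)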

# Evaluating on the validation set


predictions = clf.predict(tfidf.transform(X_test_cleaned))
modelEvaluation(predictions)

Accuracy on validation set: 0.9115
Classification report :
               precision    recall  f1-score   support

    Negative       0.93      1.00      0.96        39
     Neutral       0.84      0.95      0.89        39
    Positive       1.00      0.77      0.87        35

    accuracy                           0.91       113
   macro avg       0.92      0.91      0.91       113
weighted avg       0.92      0.91      0.91       113

Confusion Matrix :
 [[39  0  0]
 [ 2 37  0]
 [ 1  7 27]]

# Building a pipeline
from sklearn.pipeline import Pipeline               # imports were missing
from sklearn.model_selection import GridSearchCV

estimators = [("tfidf", TfidfVectorizer()), ("lr", LogisticRegression())]
model = Pipeline(estimators)

# Grid search
params = {"lr__C": [0.1, 1, 10],                  #regularization param of logistic regression
          "tfidf__min_df": [1, 3],                #min count of words
          "tfidf__max_features": [1000, None],    #max features
          "tfidf__ngram_range": [(1,1), (1,2)],   #1-grams or 2-grams
          "tfidf__stop_words": [None, "english"]} #use stopwords or don't

grid = GridSearchCV(estimator=model, param_grid=params, scoring="accuracy", n_jobs=-1)   # n_jobs assumed; value truncated in the export


grid.fit(X_train_cleaned, y_train)
print("The best parameter set is : \n", grid.best_params_)

# Evaluate on the validation set


predictions = grid.predict(X_test_cleaned)
modelEvaluation(predictions)
The best parameter set is :
 {'lr__C': 10, 'tfidf__max_features': None, 'tfidf__min_df': 1, 'tfidf__ngram_ran

Accuracy on validation set: 0.9381

Classification report :
               precision    recall  f1-score   support

    Negative       0.97      0.97      0.97        39
     Neutral       0.90      0.95      0.92        39
    Positive       0.94      0.89      0.91        35

    accuracy                           0.94       113
   macro avg       0.94      0.94      0.94       113
weighted avg       0.94      0.94      0.94       113

Confusion Matrix :
 [[38  0  1]
 [ 1 37  1]
 [ 0  4 31]]
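# Optional inspection (standard GridSearchCV attributes; values not shown in
# the original run): the cross-validated score and the refit pipeline.
print("Best CV accuracy : {:.4f}".format(grid.best_score_))
best_model = grid.best_estimator_   # full tfidf+lr pipeline, refit on all training data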

import nltk
nltk.download('punkt')
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
True

# Splitting review text into parsed sentences using NLTK's punkt tokenizer

tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

def parseSent(review, tokenizer, remove_stopwords=False):
    '''
    Parse text into sentences
    '''
    raw_sentences = tokenizer.tokenize(review.strip())
    sentences = []
    for raw_sentence in raw_sentences:
        if len(raw_sentence) > 0:
            sentences.append(cleanText(raw_sentence, remove_stopwords, split_text=True))
    return sentences

# Parsing each review in the training set into sentences


sentences = []
for review in X_train_cleaned:
    sentences += parseSent(review, tokenizer)

print('%d parsed sentences in the training set\n' %len(sentences))


print('Show a parsed sentence in the training set : \n', sentences[10])
1012 parsed sentences in the training set

Show a parsed sentence in the training set :
 ['daughter', 'love', 'easy', 'navigate', 'hard', 'break']

from gensim.models import Word2Vec   # the model itself is created and trained in the next cell

# Fitting parsed sentences to Word2Vec model

num_features = 300 #embedding dimension


min_word_count = 10
num_workers = 4
context = 10
downsampling = 1e-3

print("Training Word2Vec model ...\n")


w2v = Word2Vec(sentences, workers=num_workers,vector_size=num_features ,min_count = mi
window = context sample = downsampling)

12 of 16 15/12/23, 01:14
Untitled28.ipynb - Colaboratory https://colab.research.google.com/drive/1sAgmbXSp...

window = context, sample = downsampling)


w2v.init_sims(replace=True)
w2v.save("w2v_300features_10minwordcounts_10context") #save trained word2vec model

print("Number of words in the vocabulary list : %d \n" %len(w2v.wv.index_to_key


print("Show first 10 words in the vocabulary list vocabulary list: \n", w2v.wv.index_
Training Word2Vec model ...

<ipython-input-99-d84c32a0ada7>:12: DeprecationWarning: Call to deprecated `init_


w2v.init_sims(replace=True)
WARNING:gensim.models.keyedvectors:destructive init_sims(replace=True) deprecated
Number of words in the vocabulary list : 416

Show first 10 words in the vocabulary list vocabulary list:


['buy', 'tablet', 'use', 'good', 'great', 'work', 'get', 'one', 'amazon', 'kindl
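# Quick sanity check on the embeddings (output not shown in the original run;
# 'tablet' is in the vocabulary list printed above):
print(w2v.wv.most_similar('tablet', topn=5))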

# Transforming the training data into feature vectors

def makeFeatureVec(review, model, num_features):
    '''
    Transform a review to a feature vector by averaging feature vectors of words
    appeared in that review and in the vocabulary list created
    '''
    featureVec = np.zeros((num_features,), dtype="float32")
    nwords = 0.
    index2word_set = set(model.wv.index_to_key)   # vocabulary list (index2word was renamed index_to_key in gensim 4)
    isZeroVec = True
    for word in review:
        if word in index2word_set:
            nwords = nwords + 1.
            featureVec = np.add(featureVec, model.wv[word])   # model[word] was removed in gensim 4
            isZeroVec = False
    if isZeroVec == False:
        featureVec = np.divide(featureVec, nwords)
    return featureVec

def getAvgFeatureVecs(reviews, model, num_features):
    '''
    Transform all reviews to feature vectors using makeFeatureVec()
    '''
    counter = 0
    reviewFeatureVecs = np.zeros((len(reviews), num_features), dtype="float32")
    for review in reviews:
        reviewFeatureVecs[counter] = makeFeatureVec(review, model, num_features)
        counter = counter + 1
    return reviewFeatureVecs

# Getting feature vectors for training set


X_train_cleaned = []
for review in X_train:
    X_train_cleaned.append(cleanText(review, remove_stopwords=True, split_text=True))


# Getting feature vectors for validation set


X_test_cleaned = []
for review in X_test:
    X_test_cleaned.append(cleanText(review, remove_stopwords=True, split_text=True))
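# The comments above promise feature vectors, but the averaging call itself is
# missing from the export; a minimal sketch (trainVecs/testVecs are assumed names):
trainVecs = getAvgFeatureVecs(X_train_cleaned, w2v, num_features)
testVecs = getAvgFeatureVecs(X_test_cleaned, w2v, num_features)
print(trainVecs.shape, testVecs.shape)   # (len(X_train), 300), (len(X_test), 300)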

#Applying lstm
df = Final_data.sample(frac=0.1, random_state=0)

# Drop missing values


df.dropna(inplace=True)

# Convert the sentiments


df.sentiment.replace(('Positive','Negative','Neutral'),(1,0,2),inplace=True)

df.head()

Processed_Review sentiment

8805 buy think would great read book play game howe... 2

9736 good tablet kid lot appts download game 2

125 item work expect great product 1

10143 great beginner like child limit use many apps ... 2

10937 buy kindle past time one come defective port b... 2

# Splitting data into training set and validation


X_train, X_test, y_train, y_test = train_test_split(df['Processed_Review'], df['sentiment'],
                                                     test_size=0.1, random_state=0)  # random_state assumed; value truncated in the export

top_words = 20000
maxlen = 100
batch_size = 32
nb_classes = 3
nb_epoch = 3

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing import sequence   # imports for tf and sequence were missing


tokenizer = Tokenizer(num_words=top_words)   # nb_words was renamed num_words (the original run triggered the rename warning)
tokenizer.fit_on_texts(X_train)

sequences_train = tokenizer.texts_to_sequences(X_train)
sequences_test = tokenizer.texts_to_sequences(X_test)

X_train_seq = sequence.pad_sequences(sequences_train, maxlen=maxlen)


X_test_seq = sequence.pad_sequences(sequences_test, maxlen=maxlen)

# One-Hot Encoding of y_train and y_test


y_train_seq = tf.keras.utils.to_categorical(y_train, nb_classes)
y_test_seq = tf.keras.utils.to_categorical(y_test, nb_classes)

print('X_train shape:', X_train_seq.shape)
print('X_test shape:', X_test_seq.shape)
print('y_train shape:', y_train_seq.shape)
print('y_test shape:', y_test_seq.shape)
X_train shape: (1012, 100)
X_test shape: (113, 100)
y_train shape: (1012, 3)
y_test shape: (113, 3)

# Constructing a Simple LSTM

from tensorflow.keras.models import Sequential   # import was missing

model1 = Sequential()
model1.add(tf.keras.layers.Embedding(top_words, 128))
model1.add(tf.keras.layers.LSTM(128, dropout=0.2))
model1.add(tf.keras.layers.Dense(nb_classes))
model1.add(tf.keras.layers.Activation('softmax'))
model1.summary()

# Compiling LSTM
# NOTE: with 3-class one-hot targets and a softmax output, 'categorical_crossentropy'
# is the appropriate loss; 'binary_crossentropy' (as run here) averages per-label
# accuracy and inflates the reported score.
model1.compile(loss='binary_crossentropy',
               optimizer='adam',
               metrics=['accuracy'])

model1.fit(X_train_seq, y_train_seq, batch_size=batch_size, epochs=nb_epoch, verbose=1)

# Model Evaluation
score = model1.evaluate(X_test_seq, y_test_seq, batch_size=batch_size)
print('Test loss : {:.4f}'.format(score[0]))
print('Test accuracy : {:.4f}'.format(score[1]))
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_3 (Embedding) (None, None, 128) 2560000

lstm_1 (LSTM) (None, 128) 131584

dense_1 (Dense) (None, 3) 387

activation_1 (Activation) (None, 3) 0

=================================================================
Total params: 2691971 (10.27 MB)
Trainable params: 2691971 (10.27 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/3
32/32 [==============================] - 14s 293ms/step - loss: 0.6464 - accuracy
Epoch 2/3
32/32 [==============================] - 7s 204ms/step - loss: 0.5352 - accuracy:
Epoch 3/3
32/32 [==============================] - 8s 258ms/step - loss: 0.3841 - accuracy:

4/4 [==============================] - 1s 43ms/step - loss: 0.3329 - accuracy: 0.8053
Test loss : 0.3329
Test accuracy : 0.8053
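# Mapping the softmax outputs back to sentiment labels (a sketch; the integer
# coding 1/0/2 comes from the replace() call earlier in the notebook):
probs = model1.predict(X_test_seq)
pred_ids = np.argmax(probs, axis=1)
id2label = {0: 'Negative', 1: 'Positive', 2: 'Neutral'}
print([id2label[i] for i in pred_ids[:5]])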

# Getting weight matrix of the embedding layer


model1.layers[0].get_weights()[0]   # weight matrix of the embedding layer, word-by-dimension
print("Size of weight matrix in the embedding layer : ", \
      model1.layers[0].get_weights()[0].shape)   # (20000, 128)

# Getting weight matrix of the hidden layer


print("Size of weight matrix in the hidden layer : ", \
      model1.layers[1].get_weights()[0].shape)   # (128, 512): LSTM input weights, 4 gates x 128 units

# Getting weight matrix of the output layer


print("Size of weight matrix in the output layer : ", \
      model1.layers[2].get_weights()[0].shape)   # (128, 3): dense layer weights

Size of weight matrix in the embedding layer :  (20000, 128)


Size of weight matrix in the hidden layer :  (128, 512)
Size of weight matrix in the output layer :  (128, 3)

