C1_W1_Assignment
Welcome to week one of this specialization. You will learn about logistic regression. Concretely,
you will be implementing logistic regression for sentiment analysis on tweets. Given a tweet,
you will decide if it has a positive sentiment or a negative one. Specifically, you will:
• Learn how to extract features for logistic regression given some text
• Implement logistic regression from scratch
• Apply logistic regression on a natural language processing task
• Test using your logistic regression
• Perform error analysis
Before submitting your assignment, please make sure that:
1. You have not added any extra print statement(s) in the assignment.
2. You have not added any extra code cell(s) in the assignment.
3. You have not changed any of the function parameters.
4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from doing this and use local variables instead.
5. You are not changing the assignment code where it is not required, for example by creating extra variables.
If any of these checks fail, you will get a Grader not found (or similarly unexpected) error when submitting your assignment. Before asking for help or debugging errors in your assignment, check for these issues first. If this is the case and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these instructions.
We will be using a data set of tweets. Hopefully you will get more than 99% accuracy.
Run the cell below to load in the packages.
import nltk

nltk.download('twitter_samples')
nltk.download('stopwords')
[nltk_data] Downloading package twitter_samples to
[nltk_data] /home/jovyan/nltk_data...
[nltk_data] Unzipping corpora/twitter_samples.zip.
[nltk_data] Downloading package stopwords to /home/jovyan/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
True
Imported functions
Download the data needed for this assignment. Check out the documentation for the
twitter_samples dataset.
• twitter_samples: if you're running this notebook on your local computer, you will need to
download it using:
nltk.download('twitter_samples')
• stopwords: if you're running this notebook on your local computer, you will need to
download it using:
nltk.download('stopwords')
import numpy as np
import pandas as pd
from nltk.corpus import twitter_samples

# helper functions provided with the assignment (process_tweet, build_freqs)
from utils import process_tweet, build_freqs
– You will select just the five thousand positive tweets and five thousand negative
tweets.
# select the set of positive and negative tweets
all_positive_tweets = twitter_samples.strings('positive_tweets.json')
all_negative_tweets = twitter_samples.strings('negative_tweets.json')
• Train test split: 20% will be in the test set, and 80% in the training set.
# split the data into two pieces, one for training and one for testing (validation set)
test_pos = all_positive_tweets[4000:]
train_pos = all_positive_tweets[:4000]
test_neg = all_negative_tweets[4000:]
train_neg = all_negative_tweets[:4000]
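The combined training/test sets and the label vectors whose shapes are printed below come from stacking these splits. A minimal sketch of that step (the exact cell is not shown in this excerpt), assuming positive tweets are labeled 1 and negative tweets 0:

# combine positive and negative tweets into the training and test sets
train_x = train_pos + train_neg
test_x = test_pos + test_neg

# label vectors: 1 for each positive tweet, 0 for each negative tweet
train_y = np.append(np.ones((len(train_pos), 1)), np.zeros((len(train_neg), 1)), axis=0)
test_y = np.append(np.ones((len(test_pos), 1)), np.zeros((len(test_neg), 1)), axis=0)

print("train_y.shape = " + str(train_y.shape))
print("test_y.shape = " + str(test_y.shape))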
train_y.shape = (8000, 1)
test_y.shape = (2000, 1)
• Notice how the outer for loop goes through each tweet, and the inner for loop steps
through each word in a tweet.
• The 'freqs' dictionary is the frequency dictionary that's being built.
• The key is the tuple (word, label), such as ("happy",1) or ("happy",0). The value stored for
each key is the count of how many times the word "happy" was associated with a positive
label, or how many times "happy" was associated with a negative label.
# create frequency dictionary
freqs = build_freqs(train_x, train_y)
Expected output
type(freqs) = <class 'dict'>
len(freqs) = 11436
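For reference, a minimal sketch of a frequency-counting function consistent with the description above (the provided build_freqs may differ in its details; process_tweet is the helper described in the next section):

def build_freqs_sketch(tweets, ys):
    # map each (word, label) pair to the number of times it appears in the corpus
    freqs = {}
    # outer loop: one pass per tweet; inner loop: one pass per word in that tweet
    for y, tweet in zip(np.squeeze(ys).tolist(), tweets):
        for word in process_tweet(tweet):
            pair = (word, y)
            freqs[pair] = freqs.get(pair, 0) + 1
    return freqs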
Process tweet
The given function 'process_tweet' tokenizes the tweet into individual words, removes stop
words and applies stemming.
Expected output
This is an example of a positive tweet:
#FollowFriday @France_Inte @PKuchly57 @Milipol_Paris for being top
engaged members in my community this week :)
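A minimal sketch of what such a preprocessing function might look like, assuming NLTK's TweetTokenizer, the English stopwords list, and the Porter stemmer (the provided helper typically also does some regex cleanup of hyperlinks and hashtags, which is omitted here):

import string
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

def process_tweet_sketch(tweet):
    # tokenize, lowercase, strip handles, and shorten repeated characters
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
    tokens = tokenizer.tokenize(tweet)
    stopwords_english = stopwords.words('english')
    stemmer = PorterStemmer()
    # drop stop words and punctuation, then stem what remains
    return [stemmer.stem(tok) for tok in tokens
            if tok not in stopwords_english and tok not in string.punctuation]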
def sigmoid(z):
    # h is the sigmoid of z: 1 / (1 + exp(-z))
    h = 1 / (1 + np.exp(-z))
    return h

# Testing your function
if (sigmoid(4.92) == 0.9927537604041685):
    print('CORRECT!')
else:
    print('Oops again!')
SUCCESS!
CORRECT!
Regression:

$z = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_N x_N$
Note that the θ values are "weights". If you took the deep learning specialization, we referred to
the weights with the 'w' vector. In this course, we're using a different variable θ to refer to the
weights.
Logistic regression

$h(z) = \frac{1}{1 + \exp(-z)}$

$z = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_N x_N$
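The bullets below refer to the log loss (binary cross-entropy) for a single training example which, written consistently with the vectorized cost $J$ defined later in this section, is:

$Loss = -1 \times \left( y^{(i)} \log\left(h(z(\theta)^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h(z(\theta)^{(i)})\right) \right)$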
• All the h values are between 0 and 1, so the logs will be negative. That is the reason for
the factor of -1 applied to the sum of the two loss terms.
• Note that when the model predicts 1 ($h(z(\theta)) = 1$) and the label 'y' is also 1, the loss for that training example is 0.
• Similarly, when the model predicts 0 ($h(z(\theta)) = 0$) and the actual label is also 0, the loss for that training example is 0.
• However, when the model prediction is close to 1 ($h(z(\theta)) = 0.9999$) and the label is 0, the second term of the log loss becomes a large negative number, which is then multiplied by the overall factor of -1 to convert it to a positive loss value: $-1 \times (1 - 0) \times \log(1 - 0.9999) \approx 9.2$. The closer the model prediction gets to 1, the larger the loss.
# verify that when the model predicts close to 1, but the actual label is 0, the loss is a large positive value
-1 * (1 - 0) * np.log(1 - 0.9999)  # loss is about 9.2
9.210340371976294
• Likewise, if the model predicts close to 0 ($h(z) = 0.0001$) but the actual label is 1, the first term in the loss function becomes a large number: $-1 \times \log(0.0001) \approx 9.2$. The closer the prediction is to zero, the larger the loss.
# verify that when the model predicts close to 0, but the actual label is 1, the loss is a large positive value
-1 * np.log(0.0001)  # loss is about 9.2
9.210340371976182
• The learning rate α is a value that we choose to control how big a single update will
be.
$\theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{pmatrix}$
• $\theta$ has dimensions (n+1, 1), where 'n' is the number of features, and there is one more element for the bias term $\theta_0$ (note that the corresponding feature value $x_0$ is 1).
• The 'logits', 'z', are calculated by multiplying the feature matrix 'x' with the weight vector 'theta': $z = x\theta$
  – x has dimensions (m, n+1)
  – $\theta$ has dimensions (n+1, 1)
  – z has dimensions (m, 1)
• The prediction 'h' is calculated by applying the sigmoid to each element in 'z': $h(z) = \mathrm{sigmoid}(z)$, and has dimensions (m, 1).
• The cost function J is calculated by taking the dot product of the vectors 'y' and 'log(h)'.
Since both 'y' and 'h' are column vectors (m,1), transpose the vector to the left, so that
matrix multiplication of a row vector with column vector performs the dot product.
$J = \frac{-1}{m} \times \left( y^T \cdot \log(h) + (1 - y)^T \cdot \log(1 - h) \right)$
• The update of theta is also vectorized. Because the dimensions of x are (m, n+1), and
both h and y are (m, 1), we need to transpose the x and place it on the left in order to
perform matrix multiplication, which then yields the (n+1, 1) answer we need:
$\theta = \theta - \frac{\alpha}{m} \times \left( x^T \cdot (h - y) \right)$
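Putting these formulas together, here is a minimal numpy sketch of a single vectorized gradient descent step, assuming the sigmoid function defined earlier (the helper name is illustrative; the graded function below repeats such a step num_iters times):

def gradient_descent_step(x, y, theta, alpha):
    # x: (m, n+1) features, y: (m, 1) labels, theta: (n+1, 1) weights
    m = x.shape[0]
    z = np.dot(x, theta)          # logits, shape (m, 1)
    h = sigmoid(z)                # predictions, shape (m, 1)
    # vectorized cost: row vector times column vector gives a (1, 1) scalar
    J = (-1.0 / m) * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h)))
    # vectorized update: (n+1, m) times (m, 1) gives the (n+1, 1) gradient
    theta = theta - (alpha / m) * np.dot(x.T, h - y)
    return float(np.squeeze(J)), theta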
# UNQ_C2 GRADED FUNCTION: gradientDescent
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE ###
    # get 'm', the number of rows in matrix x
    m = x.shape[0]
import numpy as np

# Check the function
# Construct a synthetic test case using numpy PRNG functions
np.random.seed(1)
# X input is 10 x 3 with ones for the bias terms
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
# Y Labels are 10 x 1
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)
Expected output
The cost after training is 0.67094970.
The resulting vector of weights is [4.1e-07, 0.00035658, 7.309e-05]
Expected output
[[1.000e+00 3.133e+03 6.100e+01]]
# test 2:
# check for when the words are not in the freqs dictionary
tmp2 = extract_features('blorb bleeeeb bloooob', freqs)
print(tmp2)
[[1. 0. 0.]]
Expected output
[[1. 0. 0.]]
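For reference, a minimal sketch of a feature extractor consistent with the expected outputs above (a bias term of 1, then the total count of positive associations, then the total count of negative associations; the provided function and the exact key types in freqs may differ):

def extract_features_sketch(tweet, freqs):
    # 1 x 3 feature row: [bias, positive word counts, negative word counts]
    x = np.zeros((1, 3))
    x[0, 0] = 1  # bias term
    for word in process_tweet(tweet):
        x[0, 1] += freqs.get((word, 1.0), 0)  # times the word appeared with a positive label
        x[0, 2] += freqs.get((word, 0.0), 0)  # times the word appeared with a negative label
    return x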
This section is given to you. Please read it for understanding and run the cell.
# collect the features 'x' and stack them into a matrix 'X'
X = np.zeros((len(train_x), 3))
for i in range(len(train_x)):
    X[i, :] = extract_features(train_x[i], freqs)
$y_{pred} = \mathrm{sigmoid}(x \cdot \theta)$
return y_pred
Expected Output:
array([[0.83110307]])
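A sketch of the prediction step that the fragment above belongs to, assuming the feature extractor and trained weights from this section (function and variable names are illustrative):

def predict_tweet_sketch(tweet, freqs, theta):
    # build the (1, 3) feature row for this tweet
    x = extract_features(tweet, freqs)
    # y_pred = sigmoid(x . theta): a (1, 1) probability that the tweet is positive
    y_pred = sigmoid(np.dot(x, theta))
    return y_pred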
return accuracy
Expected Output:
0.9950
Pretty good!
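For completeness, a minimal sketch of how such a test-set accuracy could be computed, assuming the prediction helper sketched above (a tweet is labeled positive when its predicted probability exceeds 0.5):

def test_logistic_regression_sketch(test_x, test_y, freqs, theta):
    # predict each tweet, threshold at 0.5, and compare with the true labels
    y_hat = [1.0 if predict_tweet_sketch(tweet, freqs, theta) > 0.5 else 0.0
             for tweet in test_x]
    accuracy = np.mean(np.asarray(y_hat) == np.squeeze(test_y))
    return accuracy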
Later in this specialization, we will see how we can use deep learning to improve the prediction performance.