Representing text as tensors
10 minutes
Representing text
If we want to solve Natural Language Processing (NLP) tasks with neural networks, we need some way to represent text as tensors.
Computers already represent characters as numbers that map to letters on your screen using encodings such as ASCII or UTF-8.
Image showing diagram mapping a character to an ASCII and binary representation
We understand what each letter represents, and how all characters come together to form the words of a sentence. However,
computers don't have such an understanding, and neural networks have to learn the meaning of the sentence during training.
We can use different approaches when representing text:
Character-level representation, where we represent text by treating each character as a number. Given that we have C
different characters in our text corpus, the word Hello could be represented by a tensor with shape C × 5. Each letter would
correspond to a one-hot encoded vector (see the sketch after this list).
Word-level representation, in which we create a vocabulary of all words in our text, and then represent words using one-hot
encoding. This approach is better than character-level representation because each letter by itself does not have much meaning.
By using higher-level semantic concepts - words - we simplify the task for the neural network. However, given a large dictionary
size, we need to deal with high-dimensional sparse tensors.
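To make the character-level idea concrete, here is a minimal sketch. The tiny alphabet and the tf.one_hot call are illustrative only and are not part of the module's hidden code:

import tensorflow as tf

# Toy character-level example: map each character of 'Hello' to an index,
# then one-hot encode it.
alphabet = sorted(set('Hello'))              # ['H', 'e', 'l', 'o'], so C = 4
char_to_idx = {c: i for i, c in enumerate(alphabet)}
indices = [char_to_idx[c] for c in 'Hello']  # [0, 1, 2, 2, 3]
one_hot = tf.one_hot(indices, depth=len(alphabet))
print(one_hot.shape)  # (5, 4): one one-hot vector per letter; transpose for the C × 5 convention used above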
Let's start by installing some required Python packages we'll use in this module.
[1]
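The contents of this cell are not shown on the page. A minimal sketch of what it presumably installs, assuming the module only needs TensorFlow and TensorFlow Datasets (the exact package list and versions are an assumption):

import sys
# Install the libraries used in this module (package list is assumed)
!{sys.executable} -m pip install --quiet tensorflow tensorflow_datasets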
Text classification task
In this module, we will start with a simple text classification task based on the AG_NEWS dataset: we'll classify news headlines into
one of 4 categories: World, Sports, Business and Sci/Tech. To load the dataset, we will use the TensorFlow Datasets API.
In the sandbox environment, we need to pre-fetch the dataset from a known location before creating it with TensorFlow
datasets. If you're running in your local environment, you can skip the next cell, and the TensorFlow datasets library will
download the data automatically.
[2]
[3]
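The hidden cells above fetch and build the dataset. A minimal sketch of the loading step, assuming the standard TensorFlow Datasets name ag_news_subset:

import tensorflow_datasets as tfds

# Load AG_NEWS; it is downloaded automatically when not already cached locally
dataset = tfds.load('ag_news_subset')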
We can now access the training and test portions of the dataset by using dataset['train'] and dataset['test'] respectively:
[4]
120000
7600
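A sketch of what the cell above likely computes, reusing the dataset loaded in the previous sketch:

ds_train = dataset['train']
ds_test = dataset['test']
print(ds_train.cardinality().numpy())  # 120000
print(ds_test.cardinality().numpy())   # 7600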
Let's print out the first 10 news headlines from our dataset:
[5]
3 (Sci/Tech) -> b'AMD Debuts Dual-Core Opteron Processor' b'AMD #39;s new dual-core Opteron chip is designed mainly for corporate computing applications, including databases, Web services, and financial transactions.'
1 (Sports) -> b"Wood's Suspension Upheld (Reuters)" b'Reuters - Major League Baseball\\Monday announced a decision on the appeal filed by Chicago Cubs\\pitcher Kerry Wood regarding a suspension stemming from an\\incident earlier this season.'
2 (Business) -> b'Bush reform may have blue states seeing red' b'President Bush #39;s quot;revenue-neutral quot; tax reform needs losers to balance its winners, and people claiming the federal deduction for state and local taxes may be in administration planners #39; sights, news reports say.'
3 (Sci/Tech) -> b"'Halt science decline in schools'" b'Britain will run out of leading scientists unless science education is improved, says Professor Colin Pillinger.'
1 (Sports) -> b'Gerrard leaves practice' b'London, England (Sports Network) - England midfielder Steven Gerrard injured his groin late in Thursday #39;s training session, but is hopeful he will be ready for Saturday #39;s World Cup qualifier against Austria.'
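A sketch of how the headlines above could have been printed. The field names follow the ag_news_subset schema, and only the five entries visible above are shown here:

classes = ['World', 'Sports', 'Business', 'Sci/Tech']
for i, x in zip(range(5), ds_train):
    label = x['label'].numpy()
    # Print the numeric label, its class name, the title and the description
    print(f"{label} ({classes[label]}) -> {x['title'].numpy()} {x['description'].numpy()}")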
Text vectorization
Now we need to convert text into numbers that can be represented as tensors. If we want word-level representation, we need to do
two things:
Use a tokenizer to split text into tokens.
Build a vocabulary of those tokens.
Limiting vocabulary size
In the AG News dataset example, the vocabulary size is rather big, more than 100k words. Generally speaking, we don't need words
that are rarely present in the text — only a few sentences will have them, and the model will not learn from them. Thus, it makes
sense to limit the vocabulary size to a smaller number by passing an argument to the vectorizer constructor:
Both of those steps can be handled using the TextVectorization layer. Let's instantiate the vectorizer object, and then call the
adapt method to go through all text and build a vocabulary:
[16]
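A sketch of the vectorizer setup, assuming a 50,000-token limit (matching the vocabulary length printed below) and adaptation on a subset of the training set, as the note below explains. In older TensorFlow versions the layer lives under keras.layers.experimental.preprocessing:

from tensorflow.keras.layers import TextVectorization

vocab_size = 50000  # limit the vocabulary (assumed from the output below)
vectorizer = TextVectorization(max_tokens=vocab_size)
# Build the vocabulary from a subset of the training text (the subset size is an assumption)
vectorizer.adapt(ds_train.take(500).map(lambda x: x['title'] + ' ' + x['description']))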
Note that we are using only a subset of the whole dataset to build the vocabulary. We do this to speed up the execution time and not
keep you waiting. However, we take the risk that some of the words from the whole dataset will not be included in the
vocabulary, and will be ignored during training. Thus, using the whole vocabulary size and running through the whole dataset during
adapt should increase the final accuracy, but not significantly.
Now we can access the actual vocabulary:
[17]
['', '[UNK]', 'the', 'to', 'a', 'of', 'in', 'and', 'on', 'for']
Length of vocabulary: 50000
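A sketch of how the vocabulary above can be inspected:

vocab = vectorizer.get_vocabulary()
print(vocab[:10])
print(f"Length of vocabulary: {len(vocab)}")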
Using the tokenizer, we can easily encode any text into a set of numbers:
[18]
<tf.Tensor: shape=(7,), dtype=int64, numpy=array([ 372, 2297, 3, 312, 12, 1293, 2314])>
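The exact sentence encoded in the hidden cell is not shown; any seven-word string produces a tensor of shape (7,), for example:

# Hypothetical sample text; replace with any sentence you like
vectorizer('the weather is nice and sunny today')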
Bag-of-words text representation
Because words represent meaning, sometimes we can figure out the meaning of a piece of text by just looking at the individual
words, regardless of their order in the sentence. For example, when classifying news, words like weather and snow are likely to
indicate weather forecast, while words like stocks and dollar would count towards financial news.
Bag-of-words (BoW) is the simplest traditional vector representation to understand. Each word is linked to
a vector index, and a vector element contains the number of occurrences of that word in a given document.
Image showing how a bag of words vector representation is represented in memory.
Note: You can also think of BoW as a sum of all one-hot-encoded vectors for individual words in the text.
Below is an example of how to generate a bag-of-words representation using the Scikit Learn python library:
[19]
array([[1, 1, 0, 2, 0, 0, 0, 0, 0]])
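A sketch of the Scikit Learn cell; the corpus and the query sentence are inferred from the vocabulary and counts shown in this unit, so treat them as an assumption:

from sklearn.feature_extraction.text import CountVectorizer

corpus = ['I like hot dogs.', 'The dog ran fast.', 'Its hot outside.']
sc_vectorizer = CountVectorizer()
sc_vectorizer.fit_transform(corpus)
# Count how often each vocabulary word occurs in a new sentence
sc_vectorizer.transform(['My dog likes hot dogs on a hot day.']).toarray()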
We can also use the Keras vectorizer that we defined above, converting each word number into a one-hot encoding and adding all
those vectors up:
[20]
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)
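A sketch of the same idea with the Keras vectorizer defined earlier: encode the text, one-hot each token, and sum over the sequence dimension (the sample sentence is illustrative):

import tensorflow as tf

def to_bow(text):
    # (sequence_length, vocab_size) one-hot matrix summed into a (vocab_size,) BoW vector
    return tf.reduce_sum(tf.one_hot(vectorizer(text), vocab_size), axis=0)

to_bow('My dog likes hot dogs on a hot day.').numpy()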
Note: You may be surprised that the result differs from the previous example. The reason is that in the Keras example the length
of the vector corresponds to the vocabulary size, which was built from the whole AG News dataset, while in the Scikit Learn
example we built the vocabulary from the sample text on the fly.
Training the BoW classifier
Now that we have learned how to build the bag-of-words representation of our text, let's train a classifier that uses it. First, we need
to convert our dataset to a bag-of-words representation. This can be achieved by using the map function in the following way:
[21]
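A sketch of that map call; the batch size of 128 is inferred from the 938 steps per epoch shown in the training output below, and to_bow comes from the earlier sketch:

batch_size = 128

ds_train_bow = ds_train.map(
    lambda x: (to_bow(x['title'] + ' ' + x['description']), x['label'])).batch(batch_size)
ds_test_bow = ds_test.map(
    lambda x: (to_bow(x['title'] + ' ' + x['description']), x['label'])).batch(batch_size)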
Now let's define a simple classifier neural network that contains one linear layer. The input size is vocab_size , and the output size
corresponds to the number of classes (4). Because we're solving a classification task, the final activation function is softmax:
[22]
938/938 [==============================] - 88s 94ms/step - loss: 0.5466 - acc: 0.8759 - val_loss: 0.3682 - val_acc: 0.89
<tensorflow.python.keras.callbacks.History at 0x7fb00217a810>
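A sketch of such a classifier; the sparse categorical loss follows from the integer labels, while the optimizer choice is an assumption:

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Dense(4, activation='softmax', input_shape=(vocab_size,))
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.fit(ds_train_bow, validation_data=ds_test_bow)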
Since we have 4 classes, an accuracy of above 80% is a good result.
Training a classifier as one network
Because the vectorizer is also a Keras layer, we can define a network that includes it, and train it end-to-end. This way we don't need
to vectorize the dataset using map , we can just pass the original dataset to the input of the network.
Note: We would still have to apply maps to our dataset to convert fields from dictionaries (such as title , description and
label ) to tuples. However, when loading data from disk, we can build a dataset with the required structure in the first place.
[23]
"Model: "functional_1
_________________________________________________________________
# Layer (type) Output Shape Param
=================================================================
input_1 (InputLayer) [(None, 1)] 0
_________________________________________________________________
text_vectorization_6 (TextVe (None, None) 0
_________________________________________________________________
tf_op_layer_OneHot (TensorFl [(None, None, 50000)] 0
_________________________________________________________________
tf_op_layer_Sum (TensorFlowO [(None, 50000)] 0
_________________________________________________________________
dense_1 (Dense) (None, 4) 200004
=================================================================
Total params: 200,004
Trainable params: 200,004
Non-trainable params: 0
_________________________________________________________________
938/938 [==============================] - 79s 84ms/step - loss: 0.5221 - acc: 0.8804 - val_loss: 0.3447 - val_acc: 0.90
<tensorflow.python.keras.callbacks.History at 0x7fb003184d10>
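The code of this cell is hidden; below is a minimal sketch of an end-to-end model that is consistent with the summary above (the one-hot and sum steps show up as TensorFlow-op layers). The tuple-mapping helper and dataset names are illustrative, and batch_size, vectorizer and vocab_size come from the earlier sketches:

import tensorflow as tf
from tensorflow import keras

def to_tuple(x):
    # Convert the dictionary fields into a (text, label) tuple; text gets shape (1,)
    return tf.reshape(x['title'] + ' ' + x['description'], (1,)), x['label']

ds_train_tuples = ds_train.map(to_tuple).batch(batch_size)
ds_test_tuples = ds_test.map(to_tuple).batch(batch_size)

inp = keras.Input(shape=(1,), dtype=tf.string)
x = vectorizer(inp)                      # token ids, shape (batch, sequence)
x = tf.one_hot(x, vocab_size)            # shape (batch, sequence, vocab_size)
x = tf.reduce_sum(x, axis=1)             # bag-of-words, shape (batch, vocab_size)
out = keras.layers.Dense(4, activation='softmax')(x)
model = keras.models.Model(inp, out)
model.summary()
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.fit(ds_train_tuples, validation_data=ds_test_tuples)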
Bigrams, trigrams and n-grams
One limitation of the bag-of-words approach is that some words are part of multi-word expressions; for example, the phrase 'hot dog'
has a completely different meaning from the words 'hot' and 'dog' in other contexts. If we always represent the words 'hot' and 'dog'
using the same vectors, it can confuse our model.
To address this, n-gram representations are often used in methods of document classification, where the frequency of each word,
bi-word or tri-word is a useful feature for training classifiers. In bigram representations, for example, we will add all word pairs to the
vocabulary, in addition to original words.
Below is an example of how to generate a bigram bag-of-words representation using Scikit Learn:
[24]
Vocabulary:
{'i': 7, 'like': 11, 'hot': 4, 'dogs': 2, 'i like': 8, 'like hot': 12, 'hot dogs': 5, 'the': 16, 'dog': 0, 'ran': 14, 'fast': 3, 'the dog': 17, 'dog ran': 1, 'ran fast': 15, 'its': 9, 'outside': 13, 'its hot': 10, 'hot outside': 6}
array([[1, 0, 1, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
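A sketch of the bigram example. The token pattern keeps single-character words (which is why 'i' appears in the vocabulary above), and the corpus and query sentence are inferred from the printed vocabulary and counts:

from sklearn.feature_extraction.text import CountVectorizer

corpus = ['I like hot dogs.', 'The dog ran fast.', 'Its hot outside.']
bigram_vectorizer = CountVectorizer(ngram_range=(1, 2), token_pattern=r'\b\w+\b')
bigram_vectorizer.fit_transform(corpus)
print("Vocabulary:\n", bigram_vectorizer.vocabulary_)
bigram_vectorizer.transform(['My dog likes hot dogs on a hot day.']).toarray()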
The main drawback of the n-gram approach is that the vocabulary size starts to grow extremely fast. In practice, we need to combine
the n-gram representation with a dimensionality reduction technique, such as embeddings, which we will discuss in the next unit.
To use an n-gram representation in our AG News dataset, we need to pass the ngrams parameter to our TextVectorization
constructor. The length of a bigram vocabulary is significantly larger; in our case it is more than 1.3 million tokens! Thus it makes
sense to limit the number of bigram tokens as well to some reasonable value.
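A sketch of such a constructor call; the token limit here is an assumption:

from tensorflow.keras.layers import TextVectorization

# ngrams=2 adds bigrams on top of single words; max_tokens caps the vocabulary
keras_bigram_vectorizer = TextVectorization(max_tokens=50000, ngrams=2)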
We could use the same code as above to train the classifier; however, it would be very memory-inefficient. In the next unit, we will
train the bigram classifier using embeddings. In the meantime, you can experiment with bigram classifier training in this notebook
and see if you can get higher accuracy.
Automatically calculating BoW Vectors
In the example above we calculated BoW vectors by hand by summing the one-hot encodings of individual words. However, the
latest version of TensorFlow allows us to calculate BoW vectors automatically by passing the output_mode='count' parameter to
the vectorizer constructor. This makes defining and training our model significantly easier:
[25]
Training vectorizer
938/938 [==============================] - 10s 11ms/step - loss: 0.5207 - acc: 0.8826 - val_loss: 0.3430 - val_acc: 0.90
<tensorflow.python.keras.callbacks.History at 0x7fb002c0b290>
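A sketch of the simplified model, with the vectorizer configured with output_mode='count' and adapted before training (the 'Training vectorizer' line above is presumably printed just before the adapt step). It reuses vocab_size, ds_train and the tuple datasets from the earlier sketches:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import TextVectorization

count_vectorizer = TextVectorization(max_tokens=vocab_size, output_mode='count')
print('Training vectorizer')
count_vectorizer.adapt(ds_train.take(500).map(lambda x: x['title'] + ' ' + x['description']))

model = keras.models.Sequential([
    keras.Input(shape=(1,), dtype=tf.string),
    count_vectorizer,
    keras.layers.Dense(4, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.fit(ds_train_tuples, validation_data=ds_test_tuples)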
Term frequency - inverse document frequency (TF-IDF)
In the BoW representation, all word occurrences are weighted in the same way, regardless of the word itself. However, it's clear
that frequent words such as a and in are much less important for classification than specialized terms. In most NLP tasks some words
are more relevant than others.
TF-IDF stands for term frequency - inverse document frequency. It's a variation of bag-of-words, where instead of a binary 0/1
value indicating the appearance of a word in a document, a floating-point value is used, which is related to the frequency of the word
occurrence in the corpus.
More formally, the weight w_ij of a word i in the document j is defined as:

w_ij = tf_ij × log(N / df_i)

where

tf_ij is the number of occurrences of i in j, i.e. the BoW value we have seen before
N is the number of documents in the collection
df_i is the number of documents containing the word i in the whole collection

The TF-IDF value w_ij increases proportionally to the number of times a word appears in a document and is offset by the number of
documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently than
others. For example, if the word appears in every document in the collection, then df_i = N, w_ij = 0, and those terms would be
completely disregarded.

You can easily create a TF-IDF vectorization of text using Scikit Learn:

[20]

array([[0.43381609, 0.        , 0.43381609, 0.        , 0.65985664,
        0.43381609, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        ]])
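A sketch of the Scikit Learn TF-IDF cell, reusing the same corpus, token pattern and query sentence as the bigram example. This is an assumption consistent with the values in the output above:

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ['I like hot dogs.', 'The dog ran fast.', 'Its hot outside.']
tfidf_vectorizer = TfidfVectorizer(ngram_range=(1, 2), token_pattern=r'\b\w+\b')
tfidf_vectorizer.fit_transform(corpus)
# TF-IDF weights for the query sentence over the fitted vocabulary
tfidf_vectorizer.transform(['My dog likes hot dogs on a hot day.']).toarray()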
In Keras, the TextVectorization layer can automatically compute TF-IDF frequencies by passing the output_mode='tf-idf'
parameter. Let's repeat the code we used above to see if using TF-IDF increases accuracy:
[21]
Training vectorizer
938/938 [==============================] - 94s 101ms/step - loss: 0.3203 - acc: 0.9039 - val_loss: 0.2542 - val_acc: 0.918
<tensorflow.python.keras.callbacks.History at 0x7f78f402e5d0>
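A sketch of the TF-IDF variant; compared with the count-based model above, only the output_mode passed to the vectorizer changes, and the other names are reused from the earlier sketches:

tfidf_keras_vectorizer = TextVectorization(max_tokens=vocab_size, output_mode='tf-idf')
print('Training vectorizer')
tfidf_keras_vectorizer.adapt(ds_train.take(500).map(lambda x: x['title'] + ' ' + x['description']))

model = keras.models.Sequential([
    keras.Input(shape=(1,), dtype=tf.string),
    tfidf_keras_vectorizer,
    keras.layers.Dense(4, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.fit(ds_train_tuples, validation_data=ds_test_tuples)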
Conclusion
Even though TF-IDF representations provide frequency weights to different words, they are unable to represent meaning or order. As
the famous linguist J. R. Firth said in 1935, "The complete meaning of a word is always contextual, and no study of meaning apart
from context can be taken seriously." We will learn how to capture contextual information from text using language modeling in a
later unit.