Bao Cao Du An Ket Thuc Hoc Phan Pham Quynh Nga 31221024877

The document discusses a proposed research project analyzing TikTok's reliability in removing videos that violate national standards. The project aims to collect data on takedown requests from TikTok's website, analyze the data using tools like Excel and Orange, and assess countries' reliability levels based on factors like the number of requests and handled/unhandled content. The theoretical basis covers data mining techniques, machine learning algorithms, and data visualization. The results will draw conclusions on the reliability of different countries' handling of community standard violations on TikTok.

Uploaded by

Vy Tường


MINISTRY OF EDUCATION AND TRAINING

UNIVERSITY OF ECONOMICS HO CHI MINH CITY

PROJECT

SUBJECT: DATA SCIENCE

Topic: Analyzing TikTok's reliability in requesting removal of videos that violate national government standards

Instructor: Thai Kim Phung

Group of students:
Student name: Student ID
Nguyen Hoang Huy: 31221021995
Pham Quynh Nga (leader): 31221024877
Vu Tuong Vy: 31221021239
Vu Thi Hong Tuoi: 31221023878
Vo Tran Bao Tran: 31221024798

Class of subject: 23D1INF50905929

Ho Chi Minh City, March 20, 2023

Table of contents
1. Project introduction
1.1 Reasons for choosing the topic
1.2 Research goals
1.3 Methods of implementation
1.4 Research object
2. Theoretical basis
2.1 Data mining
2.2 Machine learning
2.3 Data visualization
3. Proposed research model
3.1 Description of data
3.2 Processing of data
3.3 Data visualization
4. Performance results
4.1 Analysis of results based on software
4.2 Prediction data results
4.3 Evaluation of results and models
5. Conclusions and General Comments
6. Thanks
7. References

1. Project introduction
1.1 Reasons for choosing the topic
Science and technology are growing strongly and influence most areas of life. The
many social networking sites that have appeared along the way have had a great impact
on the people who use them, especially young people. Each advancement in technology
sets the stage for a new form of communication. The benefits of social networks are
undeniable: a huge, diverse, and constantly updated supply of information, many
utilities for entertainment and learning, and above all a powerful new form of
communication that connects individuals, groups, and nations worldwide. At the same
time, social networks also bring many negative effects, such as the spread of false
information and of content that does not meet community standards.

Currently, TikTok is one of the most visited and most used social networking
platforms in the world in general and in Vietnam in particular. Although it only began
to expand beyond the Chinese market in 2017, TikTok grew quickly in the US soon
afterwards, when it was merged with the musical.ly app in August 2018. By the end of
2019, TikTok had reached more than 1.5 billion downloads on Google Play and the App
Store globally, according to Sensor Tower. This rapid growth has turned TikTok into a
platform that leads the latest trends and has a significant influence on users' lives
and perceptions. Alongside these benefits, young people are the most likely to be
drawn to uncensored videos, challenges, and trends that can harm their health and
psychology. It is therefore necessary to manage and moderate the content and
activities of users in the community strictly. For the same reason, it is useful to
analyze the reliability of the removal of videos that violate community standards on
TikTok at the request of the governments of different countries, and to apply
knowledge of technology and innovation for the benefit of the community. Observing,
thinking about, and keeping up with social networking trends in domestic and foreign
markets is essential for students majoring in Technology. The knowledge gained in the
Data Science module makes the application of technology and of data analysis and
processing methods easier to understand.

The reliability of countries' requests to remove videos that violate community
standards on TikTok is influenced by the following factors: the total number of
requests, content actioned for violating the Community Guidelines, content actioned
for violating local law, content not actioned, and the reports requested for removal.
These factors make it easier to access information and collect the data needed for a
reliability analysis.

1.2 Research goals

Solve the research problem through analysis of the theoretical basis.

Study methods of data processing and classification (classification methods are
used to make predictions and to assign objects to classes). The study introduces
several data classification methods and then selects the most suitable and dependable
one for data forecasting.

Analyze the credibility of national governments' handling of community standards
violations on the TikTok platform.

Use the results of the data analysis, together with the reliability level confirmed
by the indicators, to draw conclusions, state the limitations of the research paper,
and propose the best solution to the research problem.

1.3 Methods of implementation

Collect information and data from the government removal requests report published
on TikTok's official website.

Use Excel and the Orange data mining tool to process the data, visualize it, and
compare models.

1.4 Research object

Reports on the total content, takedown requests, and accounts reported for violating
the TikTok Community Guidelines in 43 countries.

2. Theoretical basis
2.1 Data mining
Data mining is the process of finding and analyzing patterns in data in order to
extract useful information. It is a part of data science and is commonly used in
fields such as business, health, and the social sciences.

Data mining involves using tools and techniques to analyze large, complex, and
unstructured data. These techniques may include cluster analysis, correlation analysis,
time series analysis, machine learning, and text mining. The results of this process can
be used to make predictions, discover relationships, find new knowledge, and support
decisions in business and other areas.

Data mining process:

Orange is a tool that integrates data mining and machine learning. It is written in
Python and gives users an interactive, visual interface. Orange provides tools for
running data analysis algorithms such as principal component analysis (PCA), cluster
analysis, independent component analysis (ICA), regression, support vector machines
(SVM), and many others.

2.2 Machine learning

Machine learning is a method in data science that allows computers to learn from
data automatically and to generate predictive or classification models.

Instead of programming explicit rules to solve a problem, machine learning allows
computers to learn from data and to build models based on the patterns and latent
information in that data.

It is an important tool in data science and is widely used in fields such as
economics, finance, healthcare, and marketing to predict trends, make decisions, and
analyze data.

Some algorithms in Machine Learning:

Linear Regression: A supervised learning method for predicting continuous values from independent variables.

Logistic Regression: A supervised learning method for binary classification problems.

Decision Tree: A supervised learning method for classification or regression that builds a tree of decisions.

Random Forest: A supervised learning method for classification or regression that builds many randomized decision trees.

Support Vector Machine (SVM): A supervised learning method for classification or regression that finds the best hyperplane to separate data points.

Clustering: An unsupervised learning method that finds groups of similar items in data.

Principal Component Analysis (PCA): An unsupervised learning method for dimensionality reduction that finds the principal components of the data.

Neural Networks: A supervised or unsupervised learning method for classification or regression that builds a network of multiple connected layers.
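
To make the idea of learning a rule from data concrete, here is a minimal sketch of a one-level decision tree (a decision stump) in plain Python. The feature values, labels, and function names are hypothetical and chosen only for illustration; the project itself used Orange's Decision Tree widget, not this code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(values, labels):
    """Pick the threshold on one numeric feature with the lowest weighted Gini impurity."""
    best = None
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (
                score,
                threshold,
                Counter(left).most_common(1)[0][0],   # majority class on the left
                Counter(right).most_common(1)[0][0],  # majority class on the right
            )
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    """Classify one value with a fitted stump."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


# Hypothetical removal rates (%) with hand-assigned labels, for illustration only.
rates = [5.0, 12.0, 20.0, 55.0, 70.0, 90.0]
labels = ["Unreliable", "Unreliable", "Unreliable", "Reliable", "Reliable", "Reliable"]
stump = fit_stump(rates, labels)
```

A full decision tree repeats this split search recursively on each side and over all features; the stump shows only the first split.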

Figure 1. Machine Learning classification image

2.3 Data visualization

Data visualization is the process of presenting data and information as graphics or
charts so that users can understand and analyze the data easily.

It shows the relationships between different data and attributes visually and helps
detect trends and patterns in the data.

The purpose of data visualization is to make it easier for users to understand and
extract information from data. It can help analyze data and find solutions to problems
in areas such as business, science, politics, and health.

Figure 2. Examples of some types of charts used to visualize data

3. Proposed research model

3.1 Description of data
Among the data columns in all spreadsheets, the column “Reliability” is the target
of the study. It indicates how reliably TikTok responds to the governments of
countries when handling infringing content in the host country. The group used 2
separate data tables, both taken in full from TikTok's official publications: the
table TikTok published on November 29, 2022 serves as the test data, and the table
published on May 17, 2022 serves as the training data.

Other variables include:

Variable: Description

Country: Names of countries that submitted reports on TikTok starting January 1, 2019.

Total requests received: Government requests to remove or restrict content or accounts, including requests with inaccurate URLs.

Total content received: Valid content URLs requested, excluding any acceptable form of an account (i.e. URLs, UIDs, and usernames). We review all content requests for inaccurate URLs, duplicate requests, requests routed to different channels, and requests submitted with insufficient information to determine validity.

Content actioned due to Community Guidelines violations: Valid content URLs reviewed and actioned upon for violating our Community Guidelines.

Content actioned due to (local) law violations: All valid content URLs reviewed and actioned upon due to a violation of local law.

Content not actioned: Valid content URLs reviewed and deemed not to violate TikTok’s Community Guidelines, Terms of Service, and/or local law.

Total accounts received: Valid account URLs and any other acceptable form of an account (i.e. UIDs, usernames, etc.) requested. We review all content requests for inaccurate URLs, duplicate requests, requests routed to different channels, and requests submitted with insufficient information to determine validity.

Accounts actioned due to Community Guidelines violations: Valid account URLs reviewed and actioned due to Community Guidelines violation.

Accounts actioned due to (local) law violations: Valid account URLs reviewed and actioned due to a violation of local law.

Accounts not actioned: Valid account URLs reviewed and deemed not to violate TikTok’s Community Guidelines and/or local law.

Removal rate: Rate at which TikTok removed or restricted content or accounts in response to government demands.

Date: TikTok’s periodical reporting period.

Total Government Requests: All requests from governments over time, aggregated by the TikTok platform.

Date range: The period from January 1, 2022 to June 30, 2022 for test data (July 1, 2021 to December 31, 2021 for training data).

Total copyright removal requests: Requests sent to TikTok about copyright issues, by country.

Successful copyright removal requests: Copyright removal requests approved by TikTok.

Percentage of successful copyright requests: Number of successful requests / total number of requests.

Total trademark removal requests: Requests sent to TikTok about trademark issues, by country.

Successful trademark removal requests: Number of approved trademark removal requests.

Percentage of successful trademark requests: Number of trademark claims approved / total claims.

Reliability: Indicates the level of confidence in the requests of the governments of the countries (Reliable or Unreliable).
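
The ratio variables described above can be sketched in a few lines of Python. The percentage follows the formula given in the table (successful requests divided by total requests). The removal-rate function is an assumption made for illustration: it takes actioned items as a share of all reviewed items, which may differ from how TikTok computes the published figure. Function names and the sample numbers are hypothetical.

```python
def success_percentage(successful, total):
    """Percentage of successful requests; None when no requests were made."""
    if total == 0:
        return None
    return 100.0 * successful / total


def removal_rate(actioned_guidelines, actioned_law, not_actioned):
    """Assumed definition: share of reviewed items that were actioned, in percent."""
    reviewed = actioned_guidelines + actioned_law + not_actioned
    if reviewed == 0:
        return None
    return 100.0 * (actioned_guidelines + actioned_law) / reviewed


# Hypothetical counts for one country, for illustration only.
copyright_pct = success_percentage(successful=30, total=40)
rate = removal_rate(actioned_guidelines=60, actioned_law=30, not_actioned=10)
```

Returning None for a zero denominator keeps countries with no requests from being reported as having a 0% success rate.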

3.2 Processing of data

Using the statistical analysis tool, the team cleaned the data as follows:

The Date range is constant within each table (for example, every row of the test
table covers January 1, 2022 to June 30, 2022), so it carries no information for
classification. The group therefore removed the Date range variable from the study
and from the evaluation.
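
The check that motivates dropping Date range can be sketched as follows: a column that holds a single value in every row cannot help separate the classes. This is an illustrative sketch in plain Python, not the team's actual cleaning step; the function name and the sample rows are hypothetical.

```python
def drop_constant_columns(rows):
    """Return (cleaned_rows, dropped_names), removing columns with one single value."""
    if not rows:
        return [], []
    dropped = [
        name for name in rows[0]
        if len({row[name] for row in rows}) == 1
    ]
    cleaned = [
        {key: value for key, value in row.items() if key not in dropped}
        for row in rows
    ]
    return cleaned, dropped


# Hypothetical rows shaped like the test table, for illustration only.
rows = [
    {"Country": "A", "Total requests received": 12, "Date range": "Jan 1, 2022 - Jun 30, 2022"},
    {"Country": "B", "Total requests received": 3, "Date range": "Jan 1, 2022 - Jun 30, 2022"},
]
cleaned, dropped = drop_constant_columns(rows)
```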

3.3 Data visualization

The team used Excel and Python with its libraries to visualize the data and
obtained the following results:

Number of requests over the years:

Over the years, TikTok has received a large number of requests from national
governments, which reflects the enormous amount of information this popular platform
has to handle. Because the number of requests has risen sharply year after year and
shows no sign of slowing, assessing the credibility of these government requests is a
top priority.

Removal requests for copyright reasons:

The failure rate of government requests is lower than the success rate,
demonstrating high reliability.

Removal requests for trademark reasons:

The number of successful requests is higher than the number of failed requests, so
TikTok's reliability is high.

Share of requests sent to TikTok by country:

The number of requests differs greatly between governments, so each country
accounts for a different share. The graph shows large disparities across countries,
but request volume does not determine reliability. The differences reflect each
country's own need to deal with harmful content, not TikTok's reliability in meeting
that need.

Amount of content TikTok received:

The percentage of content removed for violating community standards is higher
than the percentage of content removed for violating the laws of the host country.
The percentage of content that is not removed is low, at approximately 10%, which
shows the level of trust TikTok places in the content requests of countries.

Number of accounts received:

The percentage of accounts actioned under local law is lower than the percentage
actioned for violating the Community Guidelines. The share of accounts that were not
actioned is still quite low, so reliability is relatively high.

Reliability based on deletion rate:

The deletion rate of each country reflects how credible TikTok finds that
government's requests. The higher a country's deletion rate, the more valid and
credible that government's requests are.

Heatmap obtained after concluding:

4. Performance results
4.1 Analysis of results based on software
In the first step of the training process, the students loaded the report data on
requests to remove videos that violate the standards of national governments, collected
from TikTok for July to December 2021, into Orange and declared the role of each
variable.

Figure 3: Declare attributes for variables in the training dataset

The dependent variable “Reliability” has 2 labels, “Unreliable” and “Reliable”,
and is declared with the target role. The variable “Nation” is declared with the meta
role, so it does not affect the classification. The remaining variables are
independent variables declared with the feature role.

The students selected 4 algorithms for the training process: Decision Tree, Neural
Network, SVM, and Logistic Regression. The 4 models were tested and compared on the
evaluation criteria to choose the most suitable model for the study, following the
steps below:

Figure 4: Overview of the training process on the forecast

Here, the study evaluates the classification models with K-fold cross-validation
with k = 5. The data is split into 5 parts; each part is used once as the test set
while the model is trained on the other 4 parts. No sample is used for both training
and testing in the same round, which gives a more dependable estimate of the model's
accuracy than a single split.
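
The splitting step of 5-fold cross-validation can be sketched in plain Python. This is an illustration of the general technique, not a description of how Orange implements it; the function name, the shuffling seed, and the use of 43 samples (the size of the training set reported in this study) are choices made for the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a step of k spreads samples so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(43, k=5))
```

Each model is trained on `train_idx` and scored on `test_idx` in every round, and the scores are then averaged across the 5 rounds.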

After training the models, the students obtained the following results:

Figure 5: Results after training data

Based on the CA, F1, Precision, Recall, and AUC scores, the Decision Tree model
performs best among the four models. The F1 score, which is commonly used to evaluate
classification models, is highest for the Decision Tree at 0.977 (97.7%). Although its
AUC is not the highest (96.7%, slightly below the Neural Network's 96.9%), the
difference is small and does not change the overall assessment of the model.

In particular, the appropriateness of the Decision Tree algorithm for this study is
also proven through the confusion matrix evaluation method:

Figure 6: Confusion matrix of the Decision Tree model

The above confusion matrix shows that, among the 43 samples of the training dataset:

15 samples belong to the “Unreliable” class, of which 14 are classified correctly
and 1 is misclassified.

28 samples belong to the “Reliable” class, and all of them are classified
correctly.
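
The evaluation scores can be recomputed from the confusion matrix counts given in the text. The sketch below is a worked check in plain Python, not Orange's implementation; it assumes that the one misclassified “Unreliable” sample was predicted as “Reliable” (the only other class) and that the averaged F1 is weighted by class size.

```python
# Rows are actual classes, columns are predicted classes (counts from the text).
confusion = {
    "Unreliable": {"Unreliable": 14, "Reliable": 1},
    "Reliable": {"Unreliable": 0, "Reliable": 28},
}


def class_metrics(matrix, cls):
    """Return (precision, recall, f1) for one class of a confusion matrix."""
    tp = matrix[cls][cls]
    fn = sum(count for pred, count in matrix[cls].items() if pred != cls)
    fp = sum(matrix[actual][cls] for actual in matrix if actual != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


total = sum(sum(row.values()) for row in confusion.values())

# Classification accuracy (CA): correctly classified samples over all samples.
accuracy = sum(confusion[c][c] for c in confusion) / total

# F1 averaged over classes, weighted by the number of actual samples per class.
weighted_f1 = sum(
    sum(confusion[c].values()) * class_metrics(confusion, c)[2] for c in confusion
) / total
```

Under these assumptions, accuracy is 42/43 and the weighted F1 is about 0.9765; both round to 0.977.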

In conclusion, the Decision Tree model fits the dataset of this study well and is
suitable for predicting TikTok's reliability in requesting removal of videos that
violate government standards. The countries in the forecast dataset are presented in
the following section.

4.2 Prediction data results

After choosing the Decision Tree algorithm, the students loaded the forecast
dataset into Orange and used the model learned from the training dataset to predict
the reliability of requests to remove videos that violate national government
standards from January to June 2022.

Figure 7: Properties of the forecast dataset

As in the training dataset, the dependent variable “Reliability” is declared as the
target. The “Continents” variable is not needed, so it is set to skip. The variables
"Nation" and "Country code" are declared with the meta role, so they do not affect the
classification. The remaining variables in the forecast dataset are independent
variables with the feature role.

We then feed the forecast data into the Predictions widget to predict TikTok's
reliability in requesting removal of videos that violate national government standards
in 2022, using the Decision Tree model:

Figure 8: Results of forecasting using Decision Tree

The forecast of TikTok's reliability in requesting the removal of videos that
violate the standards of the governments of 54 countries from January to June 2022
shows that:

34 samples are classified as “Reliable” and the remaining 20 samples are
predicted to be “Unreliable” in requesting removal of videos that violate national
government standards.

4.3 Evaluation of results and models

Of the 4 models that were run, the Decision Tree model gives better results than
the other 3, so it is the one applied to the dataset to be predicted. The students
believe that this model should also be used for reliability assessments in the
future.

5. Conclusions and General Comments

In general, the reliability of TikTok in requesting the removal of videos that
violate standards in different countries is neither very high nor absolutely accurate.
What counts as a violation of community standards depends on the law of each country
and region and on how strictly each government exercises control. For example, in some
developed countries, relatively high removal rates for standards-violating videos
implicitly confirm that network security in those countries is well managed, and
vice versa.

Limitations of the topic:

* As students, we do not have enough knowledge and experience to delve deeply
into the cybersecurity laws of each country, so we cannot draw clear and absolute
conclusions about the security and accuracy of TikTok's system for removing videos
that violate standards in each country and region around the world.

* The scale of the survey is not very wide, covering only 43 countries and
territories, so it cannot capture all the factors that determine TikTok's reliability.

6. Thanks
The team would like to express our sincere thanks to Dr. Thai Kim Phung, lecturer
in the Data Science Department, for his enthusiastic support, which allowed us to
complete this project successfully. If the team has made any mistakes in carrying out
the project, we would be grateful for your feedback so that we can improve in our
next projects.

7. References
Data source:

https://www.tiktok.com/transparency/en-us/government-removal-requests-2022-1/

Related knowledge such as logistic regression, decision tree, and neural network, from
websites:

https://goeco.link/VNZtF

SME Credit Scoring Using Social Media Data
No ratings yet
SME Credit Scoring Using Social Media Data
82 pages
Ba Thesis
No ratings yet
Ba Thesis
66 pages
Effective Speculation That Provides Security in Social Network
No ratings yet
Effective Speculation That Provides Security in Social Network
6 pages
Case Stud1
No ratings yet
Case Stud1
5 pages
Sma Process
No ratings yet
Sma Process
14 pages
Final Report Data Mining
No ratings yet
Final Report Data Mining
17 pages
Reworked Solution
No ratings yet
Reworked Solution
9 pages
Unit 3 Social Computing
No ratings yet
Unit 3 Social Computing
19 pages
Cad - Phase 5
No ratings yet
Cad - Phase 5
24 pages
Neuralnetwork Baseddetectionoffraudulentprofilesinsocialmediaplatforms 250221160825 5f3d6744
No ratings yet
Neuralnetwork Baseddetectionoffraudulentprofilesinsocialmediaplatforms 250221160825 5f3d6744
70 pages
Hypothesis
No ratings yet
Hypothesis
9 pages
Data Mining For Managing and Using Onlin
No ratings yet
Data Mining For Managing and Using Onlin
8 pages
Data Mining in Social Media RBL LATEST
No ratings yet
Data Mining in Social Media RBL LATEST
11 pages
Introduction Data Science
No ratings yet
Introduction Data Science
29 pages
Clarkson - Joshua White - PHD Thesis Proposal - JSW - d4
No ratings yet
Clarkson - Joshua White - PHD Thesis Proposal - JSW - d4
45 pages
DATA 240 - 23 - Lec2 - FA 2024 - Dist
No ratings yet
DATA 240 - 23 - Lec2 - FA 2024 - Dist
45 pages
Tiktok Project: Exploratory Data Analysis: Background On The Tiktok Scenario
No ratings yet
Tiktok Project: Exploratory Data Analysis: Background On The Tiktok Scenario
22 pages
UGC List of Approved Journals
No ratings yet
UGC List of Approved Journals
9 pages
Mca II Sem Data Ware Hoise and Mining
No ratings yet
Mca II Sem Data Ware Hoise and Mining
53 pages
Vaibhav DSBDA Project
No ratings yet
Vaibhav DSBDA Project
16 pages
Chenchenyang Accessible
No ratings yet
Chenchenyang Accessible
46 pages
Module 1 ML Chapter2
No ratings yet
Module 1 ML Chapter2
56 pages
Researching TikTok Themes Methods and Future Directions
No ratings yet
Researching TikTok Themes Methods and Future Directions
14 pages
AIML Sem 8
No ratings yet
AIML Sem 8
82 pages
Fake Account Detection Using Machine Learning and Data Science
No ratings yet
Fake Account Detection Using Machine Learning and Data Science
58 pages
Social Computing
No ratings yet
Social Computing
35 pages
Mining in Social Media (Part 1) : Unit 3
No ratings yet
Mining in Social Media (Part 1) : Unit 3
15 pages
ETI11
100% (1)
ETI11
17 pages
Sma Exp 3
No ratings yet
Sma Exp 3
7 pages
Exploring Machine Learning Techniques Fo
No ratings yet
Exploring Machine Learning Techniques Fo
10 pages
SC Techneo
No ratings yet
SC Techneo
67 pages
Social Media Mining With R Sample Chapter
100% (1)
Social Media Mining With R Sample Chapter
18 pages
Big Data
No ratings yet
Big Data
77 pages
Social Computing (2019 Pattern, Semester VIII) - Exam Questions and Answers
No ratings yet
Social Computing (2019 Pattern, Semester VIII) - Exam Questions and Answers
25 pages
Social Media Analytics Process
No ratings yet
Social Media Analytics Process
14 pages
ZAI MSC 2015 20 Luo
No ratings yet
ZAI MSC 2015 20 Luo
73 pages
Machine Learning-Based Secure Data Acquisition For
No ratings yet
Machine Learning-Based Secure Data Acquisition For
10 pages
Data Mining in Social Network
No ratings yet
Data Mining in Social Network
12 pages
Spam Review Detection Using Linguistic Methods For Specified User in Twitter
No ratings yet
Spam Review Detection Using Linguistic Methods For Specified User in Twitter
11 pages
Twitter BDA Presentation
No ratings yet
Twitter BDA Presentation
15 pages
Machine Learning and Bigdata
No ratings yet
Machine Learning and Bigdata
27 pages
NCSPCN 12 CRP
No ratings yet
NCSPCN 12 CRP
3 pages
DWDM Unit Ii
No ratings yet
DWDM Unit Ii
24 pages
Miltsov Researching Tik Tokpreprint
No ratings yet
Miltsov Researching Tik Tokpreprint
17 pages
Cambridge University Press Social Media Mining An Introduction 2014
No ratings yet
Cambridge University Press Social Media Mining An Introduction 2014
338 pages
A Novel Visual Analytics Approach For Clustering Large-Scale Social Data
No ratings yet
A Novel Visual Analytics Approach For Clustering Large-Scale Social Data
8 pages
We Are Intechopen, The World'S Leading Publisher of Open Access Books Built by Scientists, For Scientists
No ratings yet
We Are Intechopen, The World'S Leading Publisher of Open Access Books Built by Scientists, For Scientists
43 pages
Fake Social Media Profile Detection
No ratings yet
Fake Social Media Profile Detection
10 pages
Book SN (Autosaved1) 2013
No ratings yet
Book SN (Autosaved1) 2013
313 pages
Chapter Two Data Science: by Abdulaziz Oumer
No ratings yet
Chapter Two Data Science: by Abdulaziz Oumer
29 pages
Social and Information Networks J-Component Report
No ratings yet
Social and Information Networks J-Component Report
28 pages
Fake Account Detection
No ratings yet
Fake Account Detection
33 pages
Chapter 10 - Data at Scale
No ratings yet
Chapter 10 - Data at Scale
29 pages
Sat - 25.Pdf - Discernment of Autonomous Profiles On Social Networking Services (SNS)
No ratings yet
Sat - 25.Pdf - Discernment of Autonomous Profiles On Social Networking Services (SNS)
11 pages
Final Examination: Ministry of Education and Training Vietnam National University International School
No ratings yet
Final Examination: Ministry of Education and Training Vietnam National University International School
13 pages
Thesis Revised
No ratings yet
Thesis Revised
64 pages
Kamlesh Mooc File
No ratings yet
Kamlesh Mooc File
15 pages
MAJOR PROJECT REPORT On Machine Learning Model To Determine Fake News
No ratings yet
MAJOR PROJECT REPORT On Machine Learning Model To Determine Fake News
52 pages
Speaking 1
No ratings yet
Speaking 1
2 pages
Fundamenta CT2
No ratings yet
Fundamenta CT2
2 pages
Compressor Power Formula - Step by Step Explanations
100% (1)
Compressor Power Formula - Step by Step Explanations
3 pages
CSP EXAM Equations Simply Explained and With Examples: December 2019
No ratings yet
CSP EXAM Equations Simply Explained and With Examples: December 2019
34 pages
Two-Stage Channel Design Procedures
No ratings yet
Two-Stage Channel Design Procedures
10 pages
Apostila de Formulas, Funções e Macros PDF
No ratings yet
Apostila de Formulas, Funções e Macros PDF
227 pages
MODULE 1. Introduction To Dynamics
100% (1)
analysis and processing methods becomes easier to understand thanks to the

1.2 Research goals

Methods of data processing and classification research (classification methods

Analyze the credibility of national governments' handling of community

1.3 Methods of implementation

1.4 Research object

Data mining process:

2.2 Machine learning

Instead of programming explicit rules to solve a problem, machine learning

Some algorithms in Machine Learning:

Linear Regression: A supervised learning method for continuous value prediction

Logistic Regression: A supervised learning method for the binary classification

Decision Tree: A supervised learning method for a classification or regression

Random Forest: A supervised learning method for a classification or regression

Support Vector Machine (SVM): A supervised learning method for a classification

Clustering: An unsupervised learning method for a clustering problem based on

Principal Component Analysis (PCA): An unsupervised learning method for data

Neural Networks: A supervised or unsupervised learning method
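To make the idea of a supervised classification method more concrete, here is a minimal sketch of a one-split decision tree (a "stump") in plain Python. It is an illustration only, not the model built in this project: the function names `fit_stump` and `predict`, the single numeric feature, and the toy values and labels are all hypothetical.

```python
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(xs, ys):
    """Fit a one-split decision tree on a single numeric feature.

    Every midpoint between consecutive distinct feature values is tried as a
    threshold; the one with the fewest training errors is kept.
    Returns (threshold, label_if_at_or_below, label_if_above).
    """
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict(stump, x):
    """Classify one value with a fitted stump."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


# Hypothetical toy data: one numeric feature and a binary class label.
feature_values = [0.10, 0.20, 0.35, 0.60, 0.75, 0.90]
class_labels = ["low", "low", "low", "high", "high", "high"]

stump = fit_stump(feature_values, class_labels)
print(predict(stump, 0.8))
```

A full decision tree repeats this split search recursively on each side of the threshold and across several features; the stump shows only the first step.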

Figure 1. Machine Learning classification image

2.3 Data visualization

Figure 2. Examples of some types of charts used to visualize data

3. Proposed research model

Other variables include:

Country: Names of countries that submitted reports

Total requests received: Government requests to remove or restrict

Total content received: Valid content URLs requested, excluding

Content actioned due to Community: Valid content URLs reviewed and

Content not actioned: Valid content URLs reviewed and deemed

Total accounts received: Valid account URLs and any other

Accounts actioned due to Community: Valid account URLs reviewed and

Accounts not actioned: Valid account URLs reviewed and

Removal rate: Rate at which TikTok removed or

Date: TikTok’s periodic reporting period

Total Government Requests: The TikTok Foundation aggregates all requests from governments over time.

Date range: The period from January 1, 2022 to June

Successful copyright removal requests: Requests approved by TikTok

Percentage of successful copyright: Number of successful requests/total

Total trademark removal requests: Requests to TikTok about trademark

Successful trademark removal: Number of approved trademark removal

Percentage of successful trademark: Number of Trademark Claims Approved /

Reliability: Indicates the level of confidence from
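As a minimal sketch of how the percentage variables in this list can be derived, the snippet below divides the number of successful requests by the total number of requests. The function name `success_percentage` and the sample counts are hypothetical and are not taken from the TikTok dataset; returning 0.0 when there are no requests is also an assumption.

```python
def success_percentage(successful: int, total: int) -> float:
    """Return successful / total as a percentage (0.0 when total is 0)."""
    if successful < 0 or total < 0:
        raise ValueError("counts must be non-negative")
    if successful > total:
        raise ValueError("successful cannot exceed total")
    if total == 0:
        # Assumption: treat "no requests" as a 0% success rate.
        return 0.0
    return 100.0 * successful / total


# Hypothetical counts for illustration only (not real TikTok figures).
copyright_rate = success_percentage(successful=45, total=60)
print(f"{copyright_rate:.1f}%")  # prints 75.0%
```

The same function applies to the trademark percentage, since it is defined as the same kind of ratio.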

3.2 Processing of data

3.3 Data visualization

Number of requests over the years:

Removal requests for copyright reasons:

Removal requests for trademark reasons:

Share of requests sent to TikTok by country:

Amount of content TikTok received:

The percentage of content removed for violating community standards is higher

Number of accounts received:

Reliability based on deletion rate:

The country-to-country deletion rate is a reflection of TikTok's credibility with

Heatmap obtained after concluding:

Figure 3: Declare attributes for variables in the training dataset

In which, the dependent variable “Reliability” is labeled into 2 types:

remaining variables are independent variables whose properties are declared as

Figure 4: Overview of the training process on the forecast

Figure 5: Results after training data

Figure 6: Results after training data

4.2 Prediction data results

Figure 7: Properties of the forecast dataset

Figure 8: Results of forecasting using Decision Tree

4.3 Evaluation of results and models

5. Conclusions and General Comments

Limitations of the topic:

* As students, we do not have enough knowledge and experience to delve deeply
