
BIG DATA ANALYTICS

Table of Contents
Introduction
Aims and objectives
Critical understanding of big data management
Big data manipulation
Modeling methods
Tools and techniques
Section 1: Big Data Analytics (Python)
    Task 1: Problem Domain, Data Description, and Research Question
        1.1 Problem domain
        1.2 Data Description
        1.3 Research Question
        1.4 Statistical methods
    Task 2: Solution Exploration
        2.1 Approaches and technologies for developing big data applications
        2.2 Discussion of chosen methodological approach with justification
        2.3 Solutions and techniques for related problems
    Task 3: Solution Development
        3.1 Data pre-processing
        3.2 Descriptive Statistics
        3.3 Data Visualization
        3.4 Statistical significance
        3.5 Solution of Research Questions
    Task 4: Evaluation and Future Development
        4.1 Conclusion
        4.2 Evaluation
        4.3 Limitation
        4.4 Future Work
Section 2 – Business Intelligence (Tableau)
    Introduction
    Dataset Description
    Task 1
    Task 2
    Task 3
    Task 4
    Task 5
    Conclusion
Reference List

Introduction
“Big data analysis” is the process of extracting valuable insights and information from large
and complex datasets. Massive amounts of structured and unstructured data are referred to as
"big data," and they are produced by a variety of sources, including social media, IoT
devices, sensors, and more. As data generation across numerous industries has increased
exponentially, big data analysis has attracted a lot of attention in recent years. Analyzing big
data can provide organizations with valuable insights that can help improve decision-making,
identify new opportunities, and optimize business processes. Big data analysis involves
several steps, including data collection, cleaning, processing, and analysis. Advanced
technologies such as “machine learning”, “artificial intelligence”, and “natural language
processing” are often used to make sense of massive amounts of data. Some popular
applications of big data analysis include fraud detection, risk management, customer behavior
analysis, predictive maintenance, supply chain optimization, and more. Numerous industries,
including healthcare, finance, retail, and manufacturing, stand to benefit from big data
analysis. Big data analysis is a quickly expanding field that offers businesses useful
information and insights that can aid in decision-making and give them a competitive edge.
The researcher uses two datasets, “Billionaires” and “Electronic Sales”, to perform big
data analysis. The research study includes two different sections covering big data analysis,
data preprocessing, and data visualization with the help of the Python and Tableau platforms.
Aims and objectives
The aim of this project is to use the Tableau Platform and Python programming languages to
perform big data analytics on a precise dataset.
The objectives are:
● To perform big data visualization for answering all the research questions in Python
● To use different parameter functions to display a chart based on the corresponding
table on the Tableau platform
Critical understanding of big data management
Due to its simplicity of use and abundance of libraries and tools, Python is a well-liked
programming language for managing large amounts of data. Python's large data management
demands a thorough understanding of a number of crucial components. One of the most
critical considerations is data handling. As per the view of Kharel et al. (2020), libraries such
as Pandas, NumPy, and Dask are used to read, manipulate, and transform data in various
formats. Scalability is another important issue, and solutions for distributed computing can be
utilized to process big datasets over many nodes. Data cleansing and preparation are equally
key phases in big data analysis. Scikit-learn is one of the libraries offered by Python for data
preprocessing tasks like resolving missing values, eliminating duplicates, and feature scaling.
In order to find patterns and make predictions, big data analysis usually uses machine
learning algorithms. Python includes numerous machine learning frameworks such as
TensorFlow and Scikit-learn that allow developers to design and deploy machine learning
models easily. Ultimately, big data analysis depends on visualization. Python has a number of
libraries, such as Matplotlib and Seaborn that can be used to produce high-quality
visualizations that can assist users in drawing conclusions from sizable datasets. In
conclusion, big data management in Python is a challenging process that calls for a variety of
tools, strategies, and expertise to efficiently handle and analyze enormous datasets.
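The data-handling workflow described above can be sketched with Pandas as follows. The in-memory CSV contents and column names below are invented stand-ins for a real big-data source, since no specific file is given here.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a large data source
raw = io.StringIO(
    "name,rank,wealth\n"
    "A,1,120.0\n"
    "B,2,95.5\n"
    "C,3,80.1\n"
)

df = pd.read_csv(raw)                    # read data in CSV format
df["wealth_usd"] = df["wealth"] * 1e9    # transform: derive a new column
top = df.sort_values("rank").head(2)     # manipulate: sort and slice
print(top["name"].tolist())
```

For genuinely large files, the same operations scale out by swapping `pandas` for `dask.dataframe`, whose API mirrors the calls shown here.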
Big data manipulation
In order to work with and analyze massive datasets, “big data manipulation” in Python uses
a range of tools and packages. As per the view of Bhatia et al. (2020), data management is
one of the most crucial components of huge data manipulation in Python. Pandas, NumPy,
and Dask are just a few of the libraries that Python provides that let users read, analyze, and
modify data in a number of formats. These libraries offer strong resources for handling large
datasets and can be used to carry out a number of tasks, including merging datasets, grouping
data, and pivoting tables. Data must first be cleansed and made ready for analysis before
analysis can start; data cleansing and preparation are equally key phases in big data analysis.
For data preprocessing, including handling missing values,
eliminating duplicates, and feature scaling, Python provides a number of packages, such as
Pandas. Data filtering is a crucial component of large data manipulation because it enables
users to extract subsets of data in accordance with predetermined standards. Pandas and
NumPy are just two of the packages available in Python for data filtering. Large data
manipulation comprises the conversion of data between different formats. Python has a
number of libraries for data manipulation, including Pandas and NumPy, as well as
visualization libraries such as Matplotlib and Seaborn that produce high-quality visuals to
help users derive insights from huge datasets. In summary, processing huge data in Python
requires a combination of tools and handling strategies.
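The merging, grouping, and filtering operations mentioned above can be illustrated with a minimal Pandas sketch; the toy tables and column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables standing in for two large datasets
sales = pd.DataFrame({"city": ["NY", "NY", "LA"], "amount": [10, 20, 5]})
regions = pd.DataFrame({"city": ["NY", "LA"], "region": ["East", "West"]})

merged = sales.merge(regions, on="city")           # merge datasets on a key
totals = merged.groupby("region")["amount"].sum()  # group and aggregate
big = merged[merged["amount"] > 8]                 # filter rows by a criterion
print(totals.to_dict())
```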
Modeling methods
Many modeling strategies and procedures are used while performing large data analysis in
Python with Tableau. As per the view of Peng et al. (2021), ML methods are frequently used
in Python to find patterns and forecast data from massive datasets. TensorFlow and Scikit-
learn are only two of the packages available in Python for creating and deploying machine
learning models. These libraries offer a range of methods, such as deep learning, clustering,
regression, and classification. Data preprocessing is one of the most crucial components of large data
analysis in Python. Data must first be cleansed and made ready for analysis before analysis
can start. For data preprocessing, including handling missing values, eliminating duplicates,
and feature scaling, Python provides a number of packages, such as Pandas. Data can be used
to train machine learning models once it has undergone preprocessing. Data modeling in
Tableau is frequently done through the drag-and-drop interface. Users may quickly develop
models and visualizations because of this without having to write any code. The modeling
methods available in Tableau include grouping, regression, and forecasting. Large datasets
can be analyzed using these methods to spot patterns and trends. Also crucial to large data
research in Python is data visualization. Executing large data analyses in Python and
Tableau requires combining modeling approaches and techniques, such as machine learning
algorithms, data preprocessing, and data visualization. Using these methods and tools, one
can examine enormous datasets and glean valuable insights from them.
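A minimal sketch of the regression modeling described above, using scikit-learn on synthetic data (the values and the model choice are purely illustrative, not the project's actual model):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data following y = 2x + 1, invented for demonstration
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1

model = LinearRegression().fit(X, y)   # train the regression model
pred = model.predict([[12.0]])         # predict for an unseen value
print(round(pred[0], 2))
```

The same `fit`/`predict` interface carries over to scikit-learn's clustering and classification estimators mentioned in this section.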
Tools and techniques
It takes a combination of tools and strategies to create big data analytics with Python and
Tableau. As per the view of Joe et al. (2021), data manipulation, statistical analysis, ML, data
visualization, data mining, and business intelligence tools and methodologies are all
combined when creating big data analytics in Python and Tableau. Users can acquire
important insights from huge datasets and make data-driven decisions by utilizing these tools
and strategies. Some of the most popular tools and methods for creating big data analytics in
Python and Tableau are listed below:
Python libraries: Pandas, NumPy, and Scikit-learn are just a few of the large data analytics
libraries available in Python. These libraries offer strong capabilities for machine learning,
statistical analysis, and data manipulation.
ML Algorithms: To find patterns and generate predictions from enormous datasets, machine
learning algorithms are frequently employed in big data analytics. TensorFlow and PyTorch
are only two of the libraries available in Python for creating and deploying machine learning
models.
Data preprocessing: Data needs to be cleaned up and made ready for analysis before analysis
can start. For data preprocessing, including handling missing values, eliminating duplicates,
and feature scaling, Python provides a number of packages, such as Pandas.
Data Visualization: Users may explore and comprehend data in a number of ways thanks to
data visualization, which is essential in big data analytics.
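As a sketch of the visualization tooling listed above, the following draws a simple bar chart with Matplotlib from hypothetical category counts and saves it to a file instead of displaying it (the Agg backend renders without a screen, e.g. on a server).

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Hypothetical category counts
categories = ["Tech", "Retail", "Finance"]
counts = [12, 7, 9]

fig, ax = plt.subplots()
ax.bar(categories, counts)            # simple bar chart
ax.set_ylabel("Number of companies")
ax.set_title("Companies per sector (illustrative data)")
fig.savefig("sectors.png")            # write the chart to a PNG file
```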

Section 1: Big Data Analytics (Python)
Task 1: Problem Domain, Data Description, and Research Question
1.1 Problem domain
This section is responsible for outlining the many types of Python-based big data analytics
difficulties. A challenging area for big data analytics with Python is working with enormous
datasets that cannot be processed and analyzed using normal data analysis methods. As the
digital age has progressed, massive amounts of data are being generated every day from a
variety of sources, including social media, IoT devices, sensors, and more. Companies can
use the trends and insights shown by this data to enhance their processes and gain a
competitive edge.
1.2 Data Description

Figure 1.2.1: Dataset


(Source: Kaggle)
The image shows the “billionaires” dataset, which has 2615 rows and 21 columns. The columns
include “name”, “rank”, “year”, “company.founded”, “company.name”,
“company.relationship”, “company.type”, “location.region” and so on. The dataset holds all
the information about each billionaire and their company. Once the dataset has been imported
into the Jupyter platform, the researcher completes all the required tasks of this research study.
1.3 Research Question

There are four research questions that arise in this particular section. The researcher has to
answer all the questions with the help of the Python language in the Jupyter platform. The
research questions are discussed below:
● What are the top 10 countries with the highest number of billionaires?
● What industries/sectors are most successful?
● What are the main industries with the highest number of women billionaires?
● What age range represents the highest and lowest number of billionaires?
1.4 Statistical methods
A large variety of statistical techniques are available in Python for data analysis. These
techniques can be used to model relationships between variables, test hypotheses, and
evaluate and summarize data. Among the often-employed statistical techniques in Python are:
Hypothesis testing: Python includes tools for performing hypothesis tests, including the t-test and
ANOVA. As per the view of Wang (2022), these tests can assist in determining the
statistical significance of a difference between groups.
Regression analysis: Modeling the relationship between variables using regression analysis
is a powerful statistical technique. As per the view of Al Ghivary et al. (2023), regression
analysis is supported by a number of Python modules, including statsmodels and scikit-learn.
Time series analysis: It is a statistical technique used to examine time series data. Pandas and
Statsmodels are only two of the time series analysis libraries that Python offers.
Clustering: It is a statistical technique for grouping related data points. As per the view
of Guerrero-Prado et al. (2020), many clustering methods, including K-means and hierarchical
clustering, are available in Python.
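For example, the two-sample t-test mentioned under hypothesis testing can be run with SciPy; the two sample groups below are invented for illustration.

```python
from scipy import stats

# Two hypothetical groups of measurements
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3]

# Independent two-sample t-test: is the difference in means significant?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(p_value < 0.05)  # a small p-value suggests a statistically significant difference
```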

Task 2: Solution Exploration
2.1 Approaches and technologies for developing big data applications
Python provides a wide range of tools and techniques for building big data analytics
applications. As per the view of Musazade (2022), by integrating these technologies and
methodologies, developers may quickly process, analyze, and extract insights from massive
amounts of data, resulting in more informed business decisions. A few of the
powerful Python modules that can be used to efficiently handle and manage massive amounts
of structured and unstructured data include Pandas, NumPy, Dask, and Apache Spark. Scikit-
learn, TensorFlow, and Keras are just a handful of the numerous ML libraries that are
available in Python that may be used to build prediction models and derive conclusions from
enormous volumes of data. Prediction charts can grow far more complex in models that are
more complicated, such as those employed in time-series analysis or machine learning. A
prediction plot for a neural network, for instance, can show the actual observed values
alongside the expected values of the model while also including extra details like error bars
or training/validation set performance data. Data visualization is a crucial tool for data
analysis since it makes it possible to convey complex information in an understandable and
direct way. The researcher can make a variety of visualizations using Python packages for
data visualization to help them comprehend the data and share insights with others.
Python includes a number of visualization tools that can be used to create interesting and
instructive huge data visualizations, such as Matplotlib, Seaborn, Plotly, and Bokeh.
2.2 Discussion of chosen methodological approach with justification
Using Python programming languages, the researcher performed the preprocessing and
visualization for this work. Visualization is the process of representing data graphically or
visually in order to clearly convey concepts or information. Visualization is an essential
component of big data analytics because it makes complex data patterns, relationships, and
trends easier to understand for users. In big data analytics, visualization techniques like
scatter plots, line charts, histograms, box plots, heat maps, and geographical maps are
frequently employed. Anomalies and outliers can be found via visualization, as can trends,
different datasets can be compared, and results can be presented. Preprocessing is the
cleaning and translation of raw data into a format that can be easily studied using machine
learning algorithms or other statistical techniques, and it is a foundational step in big data
analytics.
Data Cleaning

This entails addressing outliers, completing any data gaps, and correcting any inaccuracies.
Data cleaning is a process that looks for and corrects errors or discrepancies in the data to
improve its quality and correctness.
Data Transformation
Data transformation is the process of converting data from one format to another in order to
get it ready for analysis or modeling. A dataset must be transformed into one that is suitable
for the desired modeling or analytic task.
2.3 Solutions and techniques for related problems
The issues of big data analysis include those related to data processing, analysis, and
visualization. Fortunately, there are a variety of approaches and methods available to get over
these obstacles and boost the effectiveness and efficiency of big data analysis.
Data storage is one of the main difficulties in big data analysis; big data sets may therefore
necessitate specialized storage solutions. These systems offer scalable and distributed storage
options that make it possible to process big data sets effectively.

Task 3: Solution Development
3.1 Data pre-processing
As per the view of Musazade (2022), a well-liked environment for handling and analyzing
data in Python is Jupyter. Jupyter is an interactive environment that enables
users to create and share documents that contain live code, graphics, and narrative text.
Python is a flexible programming language that offers a wide variety of tools and frameworks
for data analysis. Data analysts can carry out a variety of data processing and analysis tasks
using Python Jupyter notebooks. Data input and export, data cleaning, data transformation,
and data visualization are supported by notebooks. Jupyter notebooks streamline the data
analysis process by combining the code, visuals, and narrative text into a single document.

3.1.1 Load Data


(Source: Created in Jupyter)
Here, the researcher loads the dataset into a dataframe named “data”.
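Since the notebook cell is only available as an image, the loading step can be sketched as below; the column names and rows are hypothetical stand-ins, and a real run would call `pd.read_csv` on the actual CSV file instead of the in-memory text.

```python
import io
import pandas as pd

# Stand-in for reading the billionaires CSV file; rows are invented
csv_text = (
    "name,rank,year,demographics.age,location.citizenship\n"
    "Person A,1,2014,58,United States\n"
    "Person B,2,2014,74,Mexico\n"
    "Person C,3,2014,77,Spain\n"
)
data = pd.read_csv(io.StringIO(csv_text))  # load into a dataframe named "data"
print(data.head())
```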

3.1.2 Data preprocessing
(Source: Created in Jupyter)
The researcher performs data preprocessing by calculating the shape of the dataset.
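The shape check shown in the figure amounts to the following; a small stand-in dataframe is used here, while the real dataset reports 2615 rows and 21 columns.

```python
import pandas as pd

# Small stand-in dataframe; the real dataset has 2615 rows and 21 columns
data = pd.DataFrame({"name": ["A", "B"], "rank": [1, 2]})
print(data.shape)  # (rows, columns) tuple
```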

12
3.1.3 Data preprocessing
(Source: Created in Jupyter)
The researcher performs data preprocessing by removing and dropping null values from the
imported dataset.
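The null-handling step can be sketched as follows, using an invented stand-in dataframe with one missing value.

```python
import numpy as np
import pandas as pd

# Stand-in dataframe with a missing value
data = pd.DataFrame({"name": ["A", "B", "C"], "age": [50, np.nan, 70]})
print(data.isnull().sum().to_dict())  # count null values per column
cleaned = data.dropna()               # drop rows containing null values
```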

13
3.1.4 Data preprocessing
(Source: Created in Jupyter)
The researcher performs data preprocessing by finding correlations between all variables and
dropping unnecessary values from the imported dataset.
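The correlation and column-dropping step can be sketched like this; the numeric values and the choice of which column to drop are hypothetical.

```python
import pandas as pd

# Stand-in numeric dataframe
data = pd.DataFrame({
    "rank": [1, 2, 3, 4],
    "wealth": [80, 60, 40, 20],
    "year": [2014, 2014, 2014, 2014],
})
corr = data.corr()                  # pairwise correlations between numeric columns
data = data.drop(columns=["year"])  # drop a column judged unnecessary
print(corr.loc["rank", "wealth"])
```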

14
3.1.5 Data preprocessing
(Source: Created in Jupyter)
The researcher performs data preprocessing by statistical analysis.
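The statistical-analysis step presumably uses Pandas' `describe`, which can be sketched on a stand-in column:

```python
import pandas as pd

# Stand-in dataframe with invented ages
data = pd.DataFrame({"age": [50, 60, 70, 80]})
summary = data.describe()  # count, mean, std, min, quartiles, max per column
print(summary.loc["mean", "age"])
```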

15
3.1.6 Data preprocessing
(Source: Created in Jupyter)
The researcher performs data preprocessing by finding unique values present in the imported
dataset.
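Finding unique values can be sketched as below; the column name and values are invented stand-ins.

```python
import pandas as pd

# Stand-in column with repeated values
data = pd.DataFrame({"location.citizenship": ["US", "US", "Mexico", "Spain"]})
uniques = data["location.citizenship"].unique()  # distinct values in a column
print(sorted(uniques))
```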
3.2 Descriptive Statistics
Descriptive statistics are techniques used to summarize and describe a data set's fundamental
characteristics. NumPy, Pandas, and SciPy are just a few of the modules that Python offers
for descriptive statistics. One of the most commonly used
descriptive statistics in Python is the mean, which represents the average value of a data set.
The mean can be calculated using NumPy's "mean" function or Pandas' "mean" method.
Another frequently used statistic is the standard deviation, which measures the spread of the
data around the mean. As per the view of Guerrero-Prado et al. (2020), the standard deviation can
be calculated using NumPy's "std" function or Pandas' "std" method. Here, the researcher
performs a descriptive analysis process in an efficient manner.
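The mean and standard deviation computations described above can be sketched on an invented data column. One caveat worth noting: Pandas' `std` method defaults to the sample standard deviation (ddof=1), while NumPy's `std` function defaults to the population standard deviation (ddof=0), so the two can disagree on the same data.

```python
import numpy as np
import pandas as pd

values = pd.Series([10.0, 20.0, 30.0, 40.0])  # hypothetical data column

mean_pd = values.mean()               # Pandas' "mean" method
mean_np = np.mean(values.to_numpy())  # NumPy's "mean" function
std_pd = values.std()                 # Pandas: sample std (ddof=1) by default
std_np = np.std(values.to_numpy())    # NumPy: population std (ddof=0) by default
print(mean_pd, round(std_pd, 3), round(std_np, 3))
```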
3.3 Data Visualization
Data visualization is the process of finding correlations as well as trends between different
types of data that are present in the imported dataset. Various types of library functions are
used for visualizing data through different visual elements such as graphs, charts,
and maps. As per the view of Kharel et al. (2020), it is mainly used as a graphical
representation of data and information. The ability to better understand and share insights
from the data is made possible by data visualization, which is a crucial step in the data
analysis process. The researcher can spot patterns, trends, and relationships in the data by
producing visualizations that might not be visible from the data's raw statistics or tables.

3.3.1 Data Visualization


(Source: Created in Jupyter)
The researcher performs data visualization by plotting the box plot to represent the column
“demographics.age”.
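Since the notebook cell is an image, the box plot can be roughly reproduced as follows; the ages are invented stand-ins for the "demographics.age" column, and the Agg backend is used so the figure renders without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages standing in for the "demographics.age" column
ages = pd.Series([45, 52, 60, 63, 67, 71, 75, 80, 88])

fig, ax = plt.subplots()
ax.boxplot(ages)               # box plot: median, quartiles, whiskers, outliers
ax.set_ylabel("demographics.age")
fig.savefig("age_boxplot.png")
```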

3.3.2 Data Visualization


(Source: Created in Jupyter)
The researcher performs data visualization by plotting the line plot to represent the column
“location.gdp”.

3.3.3 Data Visualization
(Source: Created in Jupyter)
The researcher performs data visualization by plotting the pair plot to represent all the
columns of the dataset.

3.3.4 Data Visualization


(Source: Created in Jupyter)
The researcher performs data visualization by plotting the pie chart to represent the column
“year”.

3.3.5 Data Visualization


(Source: Created in Jupyter)
The researcher performs data visualization by plotting the bar plot to represent the column
“location.gdp”.

3.3.6 Data Visualization


(Source: Created in Jupyter)
The researcher performs data visualization by plotting the histogram plot to represent the
column “demographics.age”.

3.3.7 Data Visualization


(Source: Created in Jupyter)
The researcher performs data visualization by plotting the line plot to represent the column
“demographics.age”.

3.3.8 Data Visualization
(Source: Created in Jupyter)
The researcher performs data visualization by plotting the box plot to represent the column
“wealth.worth in billions”.

3.4 Statistical significance


3.5 Solution of Research Questions

3.5.1 Question 1
(Source: Created in Jupyter)
Here, the researcher performs data visualization by plotting the bar plot to calculate the top
10 countries with the highest number of billionaires.
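A plausible sketch of the question-1 computation is shown below; the column name "location.citizenship" and the mini-dataset are assumptions, since the original notebook code appears only as an image.

```python
import pandas as pd

# Invented values standing in for the full "location.citizenship" column
citizenship = ["United States"] * 4 + ["China"] * 3 + ["Germany"] * 2 + ["India"]
data = pd.DataFrame({"location.citizenship": citizenship})

# Count billionaires per country and keep the ten largest counts
top10 = data["location.citizenship"].value_counts().head(10)
print(top10.to_dict())
# top10.plot(kind="bar") would draw a bar chart like the one in the figure
```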

3.5.2 Question 2
(Source: Created in Jupyter)
Here, the researcher performs data visualization by plotting the bar plot to find out the most
successful sectors or industries.

3.5.3 Question 3
(Source: Created in Jupyter)
Here, the researcher performs data visualization by plotting the pie chart to find out the main
industries with the highest number of women billionaires.

3.5.4 Question 4
(Source: Created in Jupyter)
Here, the researcher performs data visualization by plotting a histogram to find the age
range with the highest and lowest numbers of billionaires.
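The question-4 computation can be sketched by bucketing ages into decade ranges with `pd.cut`; the ages below are invented and the column name is assumed.

```python
import pandas as pd

# Hypothetical ages standing in for "demographics.age"
ages = pd.Series([35, 42, 48, 55, 58, 61, 66, 72, 79, 85])

# Bucket ages into decades and count billionaires per range
bins = [30, 40, 50, 60, 70, 80, 90]
counts = pd.cut(ages, bins=bins).value_counts().sort_index()
print(counts.to_dict())
# counts.idxmax() / counts.idxmin() give the most and least populated age ranges
```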

Task 4: Evaluation and Future Development
4.1 Conclusion
The researcher was able to perform preprocessing and a data type check after reviewing the
entirety of section one's work. In this project, the researcher performs big data analytics using
Python, while the other part is based on prediction and calculation using various
parameter functions in the Tableau platform. Python programming, along with related tools,
is used in big data analytics to examine massive, intricate data sets. Large amounts of data
must be processed, stored, and analyzed in order to uncover insightful trends and patterns that
can guide company strategy and decisions. The ability to create perceptive visualizations to
better understand the data is a critical part of big data analysis and Python provides a range of
visualization libraries like Matplotlib and Seaborn. These libraries offer a range of visuals,
from simple line charts and scatter plots to more complex heat maps and 3D visualizations. In
addition to providing modules for data manipulation and visualization, Python also provides
modules for distributed computing, which is crucial for processing large volumes of data.
The researcher also obtained numerous visualization plots after thoroughly examining the
data. Big data analytics
built on Python offers a powerful as well as flexible toolkit for managing vast and complex
datasets. It is a popular choice for big data analytics due to its popularity as a programming
language and the availability of strong data analysis and machine learning modules such as
NumPy, Pandas, as well as Scikit-learn. Big data analytics built on the Python language
offers a variety of chances for data-driven insights and decision-making in a variety of fields
and applications.
4.2 Evaluation
This section is in charge of describing how to evaluate big data analytics using Python. As
part of the evaluation of big data analytics, the efficacy and usefulness of data-driven insights
and decision-making obtained by analyzing enormous datasets using a variety of analytical
approaches and tools are evaluated. The process of turning data into insightful knowledge
that can be applied to business choices is known as business intelligence. With the help of
Tableau, users can build interactive dashboards and visualizations to analyze data. The
quality of the analytical models and algorithms used in big data analytics determines how
accurate predictions and judgments may be made. The evaluation should rate the model's
predicted outcomes in addition to identifying trends and patterns.
4.3 Limitation
Big data analysis calls for expertise in distributed computing and parallel processing in
addition to programming skills in Python. Although Python includes distributed computing
frameworks like Apache Spark, using them successfully necessitates a certain amount of
experience and comprehension of the underlying concepts. Overall, Python is a flexible
language for big data analysis, but it's vital to take into account its constraints and difficulties
when working with extraordinarily massive datasets. Due to the enormous volumes of data
generated by many sources, including social media, sensors, and other digital platforms, big
data analysis has become more and more popular in recent years. Yet, there are several
restrictions related to big data analysis that should be taken into account. The issue of data
quality is one of the main restrictions. Big data analysis makes the premise that the data it
uses are correct and trustworthy, but in practice, the data may be lacking, inconsistent, or
inaccurate.
4.4 Future Work
Big data analytics depends on the freshness of data for decision-making and real-time
analysis. Big data analytics has been developing quickly in recent years, and its prospects are
bright. Big data analytics has the potential to revolutionize a wide range of industries and
sectors due to the ongoing development of enormous and diversified data sets. The constantly
evolving and improving Python libraries like NumPy, Pandas, and Scikit-learn make it easier
for analysts and data scientists to work with massive data. There are already several new
libraries emerging that offer distinct features for big data analytics. Big data analytics are in
high demand as more and more businesses see the benefits of data-driven decision-making.
Python is a great option for big data analytics projects because of how well-liked it is as a
programming language and how well it can handle massive data.

Section 2 – Business Intelligence (Tableau)
Introduction
Tableau is an effective tool for data visualization and big data analysis. In order to analyze
massive amounts of data and get insightful knowledge, one can connect Tableau to a number
of data sources. Here, the researcher performs the role of a data analyst for answering all the
business questions with the help of data visualization in an efficient manner. The researcher
used a dataset for data visualization on the Tableau software platform.
Dataset Description

Figure 1: Chosen dataset


(Source: Kaggle)
The graphic displays the "Electronic Sales" dataset, which has 12037 rows and 6 columns.
Product name, order id, price, order date, quantity ordered, and purchase address are all listed
in the columns. The dataset holds all the sales information through the columns. Once the
dataset has been imported into the Tableau platform, the researcher completes all the required
tasks of this research study.
Task 1

Figure 2: Top 10 highest-selling products
(Source: Tableau)
Here the researcher visualized the top 10 highest-selling products of Walmart.

Figure 3: Top 10 lowest-selling products


(Source: Tableau)
Here the researcher visualized the top 10 lowest-selling products of Walmart.

Task 2

Figure 4: Sum and Average sales per city
(Source: Tableau)
Here, the researcher performs the visualization to display the sum and average sales per city.
Task 3

Figure 5: Weekly sales


(Source: Tableau)
Here, the researcher performs the visualization to display the weekly sales per city.
Task 4

Figure 6: Month of Order Date
(Source: Tableau)
Here, the researcher calculates the month of order date to represent a 6-month warranty from
the date of purchase.

Figure 7: 6-month date function
(Source: Tableau)
Here, the researcher performs the visualization to display the 6-month warranty from the date
of purchase.
Task 5

Figure 8: Dashboard
(Source: Tableau)

Here, the researcher creates a dashboard that represents all the task sheets.
Conclusion
Tableau is used for data analysis, data preprocessing, collaboration, and sharing big data
insights. The researcher used data functions and parameters to create various plots,
charts, tables, and maps. Here the researcher answered all the questions through data
visualization and created an interactive dashboard to display all the task sheets.

Reference List
Al Ghivary, R., Mawar, M., Wulandari, N. and Srikandi, N., 2023. PERAN VISUALISASI
DATA UNTUK MENUNJANG ANALISA DATA KEPENDUDUKAN DI INDONESIA.
PENTAHELIX, 1(1), pp.57-62.
Bhatia, K., Chhabra, B. and Kumar, M., 2020, November. Data analysis of various terrorism
activities using big data approaches on global terrorism database. In 2020 Sixth International
Conference on Parallel, Distributed and Grid Computing (PDGC) (pp. 137-140). IEEE.
Guerrero-Prado, J.S., Alfonso-Morales, W., Caicedo-Bravo, E., Zayas-Pérez, B. and
Espinosa-Reza, A., 2020. The power of big data and data analytics for AMI data: A case
study. Sensors, 20(11), p.3289.
Ide, N., Serout, A., Rankel, T. and Dengler, T., 2020. Leveraging big data analysis to enhance
the validation of EGT-Systems. In 20. Internationales Stuttgarter Symposium: Automobil-
und Motorentechnik (pp. 297-313). Springer Fachmedien Wiesbaden.
Joe, V., Raj, J.S. and Smys, S., 2021. Towards Efficient Big Data Storage With MapReduce
Deduplication System. International Journal of Information Technology and Web
Engineering (IJITWE), 16(2), pp.45-57.
Kharel, T.P., Ashworth, A.J., Owens, P.R. and Buser, M., 2020. Spatially and temporally
disparate data in systems agriculture: Issues and prospective solutions. Agronomy Journal,
112(5), pp.4498-4510.
Musazade, N., 2022. Understanding the relevant skills for data analytics-related positions: An
empirical study of job advertisements.
Peng, J., Wu, W., Lockhart, B., Bian, S., Yan, J.N., Xu, L., Chi, Z., Rzeszotarski, J.M. and
Wang, J., 2021, June. Dataprep. eda: task-centric exploratory data analysis for statistical
modeling in python. In Proceedings of the 2021 International Conference on Management of
Data (pp. 2271-2280).
Sousa, B.C., Valente, R., Krueger, A., Schmid, E., Cote, D.L. and Neamtu, R., 2022,
February. Investigating the Suitability of Tableau Dashboards and Decision Trees for
Particulate Materials Science and Engineering Data Analysis. In TMS 2022 151st Annual
Meeting & Exhibition Supplemental Proceedings (pp. 691-701). Cham: Springer
International Publishing.
Wang, Y., 2022, October. Construction and Application of Precision Marketing System of E-
commerce Platform under the Background of Big Data. In Proceedings of the International
Conference on Information Economy, Data Modeling and Cloud Computing, ICIDC 2022,
17-19 June 2022, Qingdao, China.
