2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (WCFTR’16)

Data Analysis and Visualization of Sales Data

Kiran Singh
Department of Computer Technology, YCCE, Nagpur, India
[email protected]

Rakhi Wajgi
Department of Computer Technology, YCCE, Nagpur, India
[email protected]

Abstract: Data is being generated very rapidly as the amount of information in everyday life increases. Organizations accumulate huge amounts of data that are difficult to analyze and exploit. Examples include data created by a growing number of environmental sensors such as traffic cameras and satellites, internet activity on social networking sites, healthcare databases, government databases and sales data. Processing, analyzing and communicating this data is a challenge. Online shopping websites are flooded with a voluminous amount of sales data every day, and analyzing and visualizing this data for information retrieval is a difficult task. A system is therefore required that can analyze and visualize data effectively. This paper focuses on a system that visualizes sales data to help users apply business intelligence, generate revenue, make decisions, manage business operations and track the progress of tasks.

Index Terms: Sales data, Analysis, Visualization, Report generation.

I. INTRODUCTION

Data visualization is a process that aims to communicate data effectively and clearly to the user through graphical representation. Effective and efficient data visualization is a key part of the discovery process. It mediates between human intuition and the quantitative context of the data, and is thus an essential component of the scientific path from data to knowledge and understanding. It is a powerful technology with great potential to help researchers as well as companies make revenue decisions [1].
Extracting relevant information and useful knowledge from large mixed-mode data spaces is complicated by various challenges, such as the limitations of data storage formats, a lack of expert prior knowledge about real-world databases, and the difficulty of visualizing the data with inefficient data mining tools. Data mining is a series of steps in the knowledge discovery process that applies particular algorithms to generate patterns, as required by the real-world problem.
A huge amount of data becomes important not for its quantity but for the quality of the information extracted from it. For a relatively complex real problem with a large data space, knowledge-generating and data mining tools become inefficient and sometimes unhelpful. For a larger, complex database with more unanticipated variations than usual, even a domain expert would find it difficult to reach useful results. The data must therefore be analyzed before its results can be visualized well.
One of the key steps in the Business Intelligence process is extracting and correlating data from various data sources. In today's globalized market most organizations have multiple information repositories. Human Resources, Sales, Customer Management and Marketing all have information systems for their needs. Often each of these departments has multiple databases and applications, and with the recent adoption of SaaS, more and more data is kept in different cloud offerings alongside some on-premise databases.
Furthermore, in the real world, decision makers must face three other important issues:
1) Flexibility and versatility of the visualization procedure;
2) Transparency to get at supporting evidence; and
3) The processing cost and computation speed.
The rest of this paper is organized as follows. Section II briefly describes visualization toolkits along with the techniques and methods used, Section III discusses related work by different authors, Section IV describes the proposed methodology, Section V presents the results, and Section VI draws conclusions.

II. VISUALIZATION TOOLKITS

This section discusses the visualization techniques, methods and tools in use, along with their drawbacks.

A. Visualization Techniques
• Pixel-Oriented Visualization Techniques: A simple way to visualize the value of a dimension is to use a pixel whose color reflects that value. Given a data set of n dimensions, pixel-oriented techniques create n windows on the screen, one for each dimension. The n dimension values of a record are mapped to n pixels at the corresponding positions in the windows, and the colors of the pixels reflect the corresponding values. Inside a window, the data values are arranged in a global order that is shared by all windows [1].
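The pixel-oriented mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of [1]: the function name `pixel_windows`, the row-major ordering and the grey-level (0 to 255) color scale are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def pixel_windows(records: Sequence[Sequence[float]], width: int) -> List[List[List[int]]]:
    """Build one grey-level window (2-D grid) per dimension.

    Record i occupies the same (row, col) cell in every window, so all
    windows share one global ordering. The cell intensity (0-255) encodes
    the record's value in that window's dimension, scaled min-max per
    dimension. Unused trailing cells in the last row stay 0.
    """
    if not records:
        return []
    n_dims = len(records[0])
    height = -(-len(records) // width)  # ceiling division
    windows = []
    for d in range(n_dims):
        column = [rec[d] for rec in records]
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        grid = [[0] * width for _ in range(height)]
        for i, value in enumerate(column):
            row, col = divmod(i, width)  # shared global order: row-major by record index
            grid[row][col] = round(255 * (value - lo) / span)
        windows.append(grid)
    return windows
```

For four two-dimensional records and a window width of 2, the call returns two 2x2 grids, one per dimension. Sorting the records by one attribute (for example, sales) before calling the function makes correlations with the other dimensions visible as matching gradients across windows.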

978-1-4673-9214-3/16/$31.00 © 2016 IEEE


Authorized licensed use limited to: Thapar Institute of Engineering & Technology. Downloaded on April 12,2024 at 15:27:31 UTC from IEEE Xplore. Restrictions apply.
• Geometric Projection Visualization Techniques: Pixel-oriented techniques do not help us understand the distribution of data in a multidimensional space. For example, they cannot show a dense region in a multidimensional space. Geometric projection techniques help users find interesting projections of multidimensional data sets. The main challenge they try to address is how to visualize a high-dimensional space on a 2-D display. A scatter plot displays 2-D data points using Cartesian coordinates, and a third dimension can be added by using different colors or shapes to represent different data points [1].
• Icon-Based Visualization Techniques: Icon-based techniques use small icons to represent multidimensional data values. Two popular icon-based techniques are Stick Figures and Chernoff faces. Chernoff faces were introduced by the statistician Herman Chernoff in 1973. They display multidimensional data of up to 18 variables (or dimensions) as a cartoon human face and help reveal trends in the data. The values of the dimensions are represented by the shape, size, placement and orientation of the components of the face: the eyes, ears, mouth and nose. For example, dimensions can be mapped to the following facial characteristics: head eccentricity, nose length, eye eccentricity, eye size, mouth curvature, eye spacing, nose width, mouth width, mouth openness, pupil size and eyebrow slant.
• Hierarchical Visualization Techniques: The techniques discussed so far focus on visualizing multiple dimensions simultaneously. For a large data set of high dimensionality, however, it would be difficult to visualize all dimensions at the same time. Hierarchical visualization techniques partition all dimensions into subsets (i.e., subspaces), and the subspaces are visualized in a hierarchical manner.
• Visualizing Complex Data and Relations: In the early days, visualization techniques were used mainly for numeric data. Recently, huge amounts of non-numeric data, such as social networks, have come into existence, and analyzing and visualizing such data attracts a lot of interest. Many modern visualization techniques are devoted to these types of data. For example, many people on a social network tag various items such as product reviews, blog entries and pictures. A tag cloud visualizes the statistics of user-generated tags. Tags in a tag cloud are often listed alphabetically or arranged in a user-preferred order, and the importance of a tag is indicated by its color or font size [1].

B. Visualization Methods
Data presentation can be beautiful, elegant and descriptive. A variety of conventional ways of visualizing data, such as bar graphs, joint graphs, histograms, tables and pie charts, are used every day, in every project and on every possible occasion.
• TreeMap: This method gives a space-filling visualization of hierarchical data. It strictly requires that the data objects be hierarchically linked. A Treemap is a root rectangle divided into regions, drawn as smaller rectangles, each of which corresponds to a data object from a set [2]. One example of this method is visualizing the free space on a hard drive. The method can be applied to large volumes of data by repeatedly representing data layers for each level of the hierarchy.
• Circle Packing: Circle packing is a direct alternative to Treemap. It uses circles as its primitive shape, and circles can be nested inside circles from a higher hierarchy level. Because circle packing is based on the Treemap method, it has the same properties as Treemap. We can therefore say that this method meets only the large-volume-data condition.
• Sunburst: Sunburst is another alternative to Treemap. It uses the Treemap visualization converted to a polar coordinate system. The main difference between the two methods is that the variable parameters are radius and arc length instead of width and height. Because of this difference, there is no need to repaint the whole diagram when the data changes; only the portion containing new data is redrawn by modifying its radius. This property allows the method to show data dynamics using animation.
• Circular Network Diagram: Data objects are placed around a circle and linked by curves according to how strongly they are related. Line width or color saturation is used as a measure of that relatedness. The method also provides interactions that hide unneeded links and highlight selected ones. It therefore emphasizes the direct connections between multiple items and shows how strongly they are related [3].

C. Visualization Tools
Data visualization tools allow you to organize and present information intuitively. Prefuse and Flare were the first visualization frameworks to be used extensively. Google Chart Tools is an online tool for simple data visualization that can draw many formats, such as bubble charts, line plots and treemaps. Matplotlib is used for plotting and graphing and comes with many packages for statistics, clustering and plotting. Processing is widely used for creating graphics. Many Eyes is a

visualization tool created by IBM. It allows users to upload a data set and explore it with pre-built techniques. Spotfire is rated as having the best web client and analytical functionality, QlikTech may be the best visualization product, with interactive drill-down capabilities, and Tableau interacts very well with OLAP cubes [18].

III. LITERATURE REVIEW
Visualization is an evolving area of study to which many researchers have contributed over the last few decades. Various authors have proposed different techniques and technologies to support data visualization. This section describes how research published in reputed journals and conferences has progressed.
In [4] the authors propose Sensor.Network, an approach for storing, sharing, visualizing and analyzing data from multiple devices, which interact with each other and with the end user through an open REST-based API. The authors visualize the geographical location of each data stream; clicking a location pops up a tabbed window containing the associated information.
In [5] the authors propose a virtual reality platform for scientific data visualization, a tool for multi-dimensional data visualization in which scientists can interact with the data and with their colleagues in the same space. Data parameters are mapped to data points, shapes, sizes, colors, the XYZ axes and more. The authors use iViz, a visualization tool that can run as a standalone application or in a web browser.
The authors of [6] discuss a framework for financial time series delivery and visualization that can be used to view the historical price movement of a stock. A specialized binary tree (SB-tree) represents the financial time series. The time series data server, the SB-tree server and the web service container are the three major components, and they are distributed across different machines. The system can reduce the data volume while capturing the critical points.
In [7] the authors propose a dashboard for displaying data that is used to communicate and find trends in laboratory operation. The system is based on .NET scripts and an SQL repository. Data is collected from multiple sources, such as the admin, internet and user portals, and stored in a database using an XML layer, Adobe Flash, ActionScript, etc. The visualized data is used for laboratory and staff management.
In [8] the author uses the concept of visual web mining to analyze web data. A tool named WET (website exploration tool) is used for visualization; it provides a set of visual metaphors that represent the structure of websites. The tool is used to explore websites and to give feedback to the website owner for the betterment of the website.
In [9] the authors analyze data to examine trends and evaluate the eco-environmental impact of the Three Gorges project. VC.NET and ArcIMS form the development platform for the information system, and ArcSDE and Oracle 10g are used to manage and use spatial data. The authors introduce methods for processing and storing data generated across regions and departments. Visualization helps enhance data analysis and data mining.
In [10] the authors discuss a problem in compliance management that becomes an obstacle to decision making for effective and efficient monitoring. The person responsible should be provided with compliance software that gives high-level information about the overall compliance status and low-level information about possible problems. The authors design a dashboard for monitoring compliance that removes the obstacle so that decisions can be made effectively.
In [11] the authors introduce a tool named SECONDA for analyzing both the individual and the grouped evolution of projects and developers belonging to a software ecosystem. Visualization is implemented in Java using the JFreeChart libraries. The authors study the GNOME ecosystem with SECONDA. The tool offers a dashboard for fast visual analysis of local and global metrics that can be extracted from information stored in the repositories.
In [12] the authors propose a system for monitoring a user's exercise progress and presenting exercise parameters in relation to prescribed targets. The system can be used to monitor exercise against the intensity levels recommended by the patient's care provider. It uses a miniature wireless 3-axis accelerometer worn on the patient's wrist that transmits acceleration data. The dashboard allows graphical visualization of exercise progress in real time.
The authors of [13] introduce a system in which the huge amount of data generated by a collaborative software development tool during the lifecycle of a project can be used to analyze the performance of an individual member, a team or a manager. The data can be analyzed from different perspectives across different dimensions and visualized in different ways.
In [14] the authors propose a dashboard that serves as an integration, validation and visualization tool for natural language processing. The system helps the system integration team to integrate and validate the system, developers to profile each module, and researchers to evaluate each module and compare it with earlier versions. It also supports the execution of modules on heterogeneous platforms, with an easy-to-use graphical interface developed using Eclipse RCP.

IV. PROPOSED METHODOLOGY
Data visualization mainly focuses on analyzing data and presenting it to the end user. Its main goal is to convey information clearly and effectively through graphical means.
We propose a system that analyzes and visualizes sales data. The data will be graphed on different parameters to give different perspectives, and a data mining process will be applied to discover patterns for future predictions. The data set of a store in the USA is taken for analysis and visualization. It contains attributes such as Order ID, Order Date, Order Priority, Sales, Customer Name, Region, Product Name, Product Category and so on. The transition diagram for the system is shown in Fig. 1, which introduces the transitions among end users, the system and the database, and briefly depicts the process from user login through visualization to user logout.
Before the data is imported into the database, the data set is processed by the following functions, as depicted in Figure 2.

A. Data Parser
The data set has multiple entries, which may or may not be relevant to the user. Parsing is therefore done in Java, using the java.util.Iterator class, to check the attributes present in the data set.

Fig 1. Transition diagram for the system

B. Data Cleaner
The data set may contain information that is not useful to the user. Such data is deleted from the data set so that only the relevant data is processed further, which decreases the space and time complexity.

C. Data Transfer
An HSSFWorkbook is created from the FileInputStream provided by the user so that the attribute names present in the data set can be transformed. The names of the attributes may not be in a proper format. For example, Order ID may be written as OrId, which may create confusion.

D. Database
After the above processing, the data is imported into the database. The database then contains the data relevant to the user in the proper format.

E. Cache
Frequently accessed data is extracted from the database and stored in the cache. When the same data is requested again, it is read from the cache instead of the database, which decreases the time required.

F. Visualization
The data is visualized for the time duration provided by the end user. Top customers, sales per region, top products and the number of customers who visited can be visualized. Using this visualization, the end user can make decisions, such as launching new products, and decisions for revenue generation.

Figure 2: Flow diagram for data processing carried out by the system

V. RESULTS
The data set given to the system contains multiple attributes, a few of which may not be relevant to the end user, as depicted in Figure 3. We therefore need to clean the data set and extract only the relevant attributes before storing it in the database. The data is processed by functions such as parsing, cleaning and transformation. Figure 4 depicts the data after this processing, which is then stored in the database. After processing, the data is visualized to help the end user make decisions.
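The parsing, cleaning and attribute-name transformation steps can be sketched as follows. The paper's system is written in Java (java.util.Iterator and HSSFWorkbook); this Python sketch only illustrates the logic and assumes a CSV export instead of an Excel workbook. The names `RELEVANT`, `RENAMES`, `parse`, `clean`, `transform` and `process`, as well as the sample columns, are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, List

# Hypothetical set of raw column names considered relevant to the end user.
RELEVANT = {"OrId", "Order Date", "Region", "Sales"}
# Hypothetical mapping from abbreviated attribute names to readable ones.
RENAMES = {"OrId": "Order ID"}


def parse(text: str) -> Iterator[Dict[str, str]]:
    """Data Parser: iterate over the records, one dict per row."""
    yield from csv.DictReader(io.StringIO(text))


def clean(row: Dict[str, str]) -> Dict[str, str]:
    """Data Cleaner: keep only relevant attributes and trim whitespace."""
    return {k: (v or "").strip() for k, v in row.items() if k in RELEVANT}


def transform(row: Dict[str, str]) -> Dict[str, str]:
    """Attribute-name transformation: e.g. OrId becomes Order ID."""
    return {RENAMES.get(k, k): v for k, v in row.items()}


def process(text: str) -> List[Dict[str, str]]:
    """Run parse -> clean -> transform and drop incomplete records."""
    records = []
    for raw in parse(text):
        row = clean(raw)
        if row.get("OrId") and row.get("Sales"):  # skip rows missing key fields
            records.append(transform(row))
    return records
```

The list returned by `process` holds only complete records with readable attribute names, which is the form in which the data would be imported into the database.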

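Before a chart such as sales per region or top customers can be drawn, the stored records have to be aggregated over the time duration chosen by the end user, and repeated requests can be served from a cache. The sketch below is a minimal illustration under stated assumptions: the in-memory `ORDERS` list stands in for the database, its rows are made-up sample values, and `functools.lru_cache` stands in for the cache component; none of these names come from the paper.

```python
from collections import defaultdict
from datetime import date
from functools import lru_cache
from typing import List, Tuple

# Hypothetical stand-in for the database table: (order date, region, customer, sales).
ORDERS: List[Tuple[date, str, str, float]] = [
    (date(2015, 1, 3), "West", "C1", 250.0),
    (date(2015, 1, 9), "East", "C2", 100.0),
    (date(2015, 2, 1), "West", "C2", 50.0),
    (date(2015, 3, 5), "East", "C3", 400.0),
]


@lru_cache(maxsize=32)
def sales_per_region(start: date, end: date) -> Tuple[Tuple[str, float], ...]:
    """Total sales per region within [start, end], largest first.

    Results are cached, so a repeated request for the same period is
    answered without scanning ORDERS again.
    """
    totals = defaultdict(float)
    for day, region, _customer, amount in ORDERS:
        if start <= day <= end:
            totals[region] += amount
    return tuple(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def top_customers(start: date, end: date, k: int = 3) -> List[Tuple[str, float]]:
    """The k customers with the highest total sales within [start, end]."""
    totals = defaultdict(float)
    for day, _region, customer, amount in ORDERS:
        if start <= day <= end:
            totals[customer] += amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

The aggregated pairs can be passed directly to any charting library as bar labels and heights. A cache like this has to be cleared (`sales_per_region.cache_clear()`) whenever new orders are imported, otherwise stale totals would be shown.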
Figure 3: Dataset containing multiple attributes which are not relevant to the user.

Figure 4: Data stored in database after data processing.

Figure 5 represents the sales of products by product category in a particular region. Using this graph we can easily identify the maximum sales in the region.

Figure 5: Sales of product in particular region.

Figure 6 represents the profit on sales for particular product sub-categories within each product category. From the figure we can say that the Technology product category is stable, whereas in the Office Supplies product category the sales of Binders and Binder Accessories are the highest.

Figure 6: Profit on particular product.

VI. CONCLUSION
In this paper we have reviewed different techniques, methods and tools, each of which has shortcomings of its own. We have discussed many papers, from which we obtained a broad idea of the system required today for analyzing and visualizing sales data, with which the investors and owners of an organization can make proper decisions and generate revenue.
We have proposed a system in which data is imported and stored in a database after processing. This data is visualized with different parameters and dimensions, so that the end user can make decisions, predict future sales, calculate regional sales and adjust production according to demand.
In future work we will use more advanced visualization techniques and place all the graphs and charts on a single dashboard, which would help the user make decisions and generate revenue at a glance.

ACKNOWLEDGMENT
The author is grateful to her respected guide Prof. R. D. Wajgi for admirable guidance and support in completing this paper. The author is also thankful to the Project Development Lab, Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur (India) for providing the essential facilities to complete this manuscript in its present form. The author would like to thank Prof. R. D. Wajgi (Lecturer, Dept. of CT) and family members for financial and moral support throughout her technical education.

REFERENCES
[1] Jiawei Han, Micheline Kamber and Jian Pei, "Data Mining Concepts and Techniques", Third edition, MK Publications, 2009.
[2] M. Tennekes and E. de Jonge, "Top-down Data Analysis with Treemaps," in Proceedings of the International Conference on Information Visualization Theory and Applications (IVAPP'11), pp. 236-241, March 2011.
[3] P. Hoek, "Parallel Arc Diagrams: Visualizing Temporal Interactions," Journal of Social Structure, vol. 12, 2011.
[4] Vipul Gupta, Arshan Porsohi and Poornaprajna Udupi, "Sensor Network: An Open Data Exchange for the Web of Things," in Proceedings of 8th IEEE Conference on Pervasive Computing and Communication Workshop (PERCOM), 2010.
[5] Ciro Donalek, S.G. Djorgovski, Alex Cioc, Anwell Wang, Jerry Zhang, Elizabeth Lawler, Stacy Yeh, Ashish Mahbal, Matthew Graham, Andrew Drake, Scott Davidoff and Jeffrey S. Norris, "Immersive and Collaboration Data visualization using Virtual Reality Platforms," in Proceedings of IEEE International Conference on Big Data, 2014.
[6] Tak-chung Fu, Fu-lai Chung and Chun-fai Lam, "Adaptive Data Delivery Framework for Financial Time Series Visualization," in Proceedings of IEEE International Conference on Mobile Business (ICMB), pp. 267-273, 2005.
[7] Eric Martin and Vincenzo Di Bernardo, "Enterprise Dashboard Tools for Management of Share-use University Laboratory," in Proceedings of University Conveinment Industry Micro (UCIM), 2008.
[8] Victor Pascual-Cid, "An Information Visualization System for the Understanding of Web Data," in Proceedings of IEEE Symposium on Information Visualization (INFOVIS), 2008.
[9] Liang Zhu, Bing-Fang, Yue-min Zhou, Xin-hui Ma and Lei-dong Yang, "Researches on Eco-environment Data Visualization for Three Gorges Project," in Proceedings of IEEE International Symposium on Information Engineering Electronic Commerce (IEEC), 2009.
[10] Thorben Sander, Malthias Kehlenbeeck and Michael H. Breithner, "Visualization of Automated Compliance Monitoring and Repairing," in Proceedings of IEEE Database and Expert System Applications (DEXA), 2012.
[11] Javier Perez, Romuald Deshayes, Mathieu Goeminne and Tom Mens, "SECONDA: Software Ecosystem Analysis Dashboard," in International Conference on Software Maintenance and Reengineering (CSMR), 2010.
[12] Cheol Jeong and Joseph Finkelstein, "Computer-Assisted Upper Extremity Training Using Interactive Biking Exercise (iBikE) Platform," in Proceedings of IEEE Conference on Engineering in Medicine and Biological Society (EMBC), 2012.
[13] Eleni Stroulia, Isaac Matichuk, Fabio Rocha, Ken Baver, "Interactive Exploration of collaboration software development data," in IEEE International Conference on Software Maintenance (ICSM), 2013.
[14] Pavan Kumar, Rashid Ahmad, B.D. Chaudary and Mukul K. Sinha, "Enriched Dashboard: An Integration and Visualization Tool for Distributed NLP System on Heterogeneous Platform," in International Conference on Computer Science and its Applications (ICCSA), 2013.
[15] Ma Xin-hui, Wu Bing-fang, Zhu Liang, etc. "Study on Methods of Hydro-environment Data Visualization for Three Gorges Project" [J], Remote sensing informatics, 2007, pp. 28-31.
[16] M. Lungu, M. Lanza, T. Gîrba, and R. Robbes, "The small project observatory: Visualizing software ecosystems," Science of Computer Programming, vol. 75, no. 4, pp. 264-275, 2010.
[17] R. Chang, M. Ghoniem, R. Kosara, W. Ribarsky, J. Yang, E. Suma, C. Ziemkiewicz, D. Kern, and A. Sudjianto, "Wire-vis: Visualization of categorical, time-varying data from financial transactions," in IEEE Symposium on Visual Analytics Science and Technology, pp. 155-162, 2007.
[18] "The 37 Best Tools for Visualization," http://www.creativebloq.com/design-tools/data-visualization-712402. Harjeet Kaur, Varsha Sahni, and Dr. Manju Bala, "A survey of reactive, proactive, and hybrid routing protocols in MANET: a review", International Journal of Computer Science and Information Technologies, vol. 4, no. 3, pp. 498-500, 2013.
