Data Analysis and Visualization of Sales Data
Abstract— Data is being generated very rapidly due to the increase in information in everyday life. Huge amounts of data accumulate in various organizations and are difficult to analyze and exploit. Data created by an expanding number of sensors in the environment, such as traffic cameras and satellites, internet activity on social networking sites, healthcare databases, government databases and sales data are examples of huge data. Processing, analyzing and communicating this data is a challenge. Online shopping websites are flooded with voluminous amounts of sales data every day, and analyzing and visualizing this data for information retrieval is a difficult task. Therefore, a system is required that can effectively analyze and visualize data. This paper focuses on a system that visualizes sales data to help users apply intelligence in business, revenue generation, decision making, managing business operations and tracking the progress of tasks.

Index Terms— Sales data, Analysis, Visualization, Report generation.

complex database with more unanticipated variations than normal ones, even a domain expert would find it difficult to reach useful results. To express results through better visualization, analysis of the data is needed.

One of the key steps in the Business Intelligence process is the one in which data is extracted and correlated from various data sources. In today's globalized market, most organizations have multiple information repositories. Human Resources, Sales, Customer Management and Marketing all have information systems for their needs. Often each of these departments has multiple databases and applications, and with the recent adoption of SaaS, more and more data is kept in different cloud offerings alongside some on-premise databases.

Furthermore, in the real world, decision makers must face three other important issues:
1) Flexibility and versatility of the visualization procedure;
2) Transparency to get at supporting evidence; and
3) The processing cost and computation speed.
Authorized licensed use limited to: Thapar Institute of Engineering & Technology. Downloaded on April 12,2024 at 15:27:31 UTC from IEEE Xplore. Restrictions apply.
2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (WCFTR’16)
visualization tool created by IBM. It allows uploading a data set and exploring it with pre-built techniques. Spotfire offers the best web client and analytical functionality. QlikTech may have the best visualization product, with interactive drill-down capabilities. Tableau excels at interacting with OLAP cubes [18].

III. LITERATURE REVIEW
Visualization is an evolving area of study to which many researchers have contributed over the last few decades. Various authors have proposed different techniques and technologies to support data visualization. This section describes how research in this area has progressed, drawing on work by authors and researchers published in reputed journals and conferences.

In [4] the authors have proposed a Sensor:Network-based approach for storing, sharing, visualizing and analyzing data from multiple devices, which interact with each other and with the end user through an open REST-based API. The geographical location of each data stream is visualized; clicking on it pops up a tabbed window containing the associated information.

In [5] the authors have proposed a virtual reality platform for scientific data visualization: a tool for multi-dimensional data visualization with which scientists can interact with the data and with their colleagues in the same space. Data parameters are mapped to different data points, shapes, sizes, colors, X-Y-Z axes and more. The authors used iViz, a visualization tool that can run as a standalone application or in a web browser.

In [6] the authors have discussed a framework for financial time series delivery and visualization that can be used to view the historical price movement of a stock. A specialized binary tree (SB-tree) is used to represent the financial time series. The system has three major components, a time series data server, an SB-tree server and a web service, which are distributed on different machines. The system can reduce data volume while still capturing the critical points.

In [7] the authors have proposed a dashboard for displaying data used for communicating and finding trends in laboratory operation. The system is based on .NET scripts and an SQL repository. Data is collected from multiple sources such as the admin, the internet and the user portal, and is stored in a database using an XML layer, Adobe Flash, ActionScript, etc. The visualized data is used for laboratory and staff management.

In [8] the author has used the concept of visual web mining for analyzing web data. A tool named WET (Website Exploration Tool) is used for visualization; it provides a set of visual metaphors that represent the structure of websites. The tool is used for exploring websites and for giving feedback to the website owner for improving the website.

In [9] the authors have analyzed data to examine trends and evaluate the eco-environmental impact of the Three Gorges Project. VC.NET and ArcIMS form the development platform for the information system, while ArcSDE and Oracle 10g are used for managing and using spatial data. The authors introduce methods for processing and storing data generated across regions and departments. Visualization helps in enhancing data analysis and data mining.

In [10] the authors have discussed problems in compliance management that become an obstacle to decision making and to effective and efficient monitoring. The person responsible should be provided with compliance software that gives both high-level information about the overall compliance status and low-level information about possible problems. The authors designed a dashboard for monitoring compliance that removes this obstacle, so that decisions can be made effectively.

In [11] the authors have introduced a tool named SECONDA, which is used for analyzing both the individual and the grouped evolution of projects and developers belonging to a software ecosystem. Visualization is implemented in Java using the JFreeChart library. The authors used the GNOME ecosystem as the case study for SECONDA. It offers a dashboard for fast visual analysis of local and global metrics that can be extracted from the information stored in the repositories.

In [12] the authors have proposed a system for monitoring the user's exercise progress and presenting exercise parameters in relation to prescribed targets. The system can be used to monitor exercise intensity against the levels recommended by the patient's care provider. It uses a miniature wireless 3-axis accelerometer tied to the patient's wrist that transmits acceleration data. The dashboard allows graphical visualization of exercise progress in real time.

In [13] the authors introduce a system in which the large amount of data generated by a collaborative software development tool during a project's lifecycle can be used to analyze the performance of an individual member, a team or a manager. The data can be analyzed from different perspectives across different dimensions and visualized in different ways.

In [14] the authors have proposed a dashboard that serves as an integration, validation and visualization tool for natural language processing. The system helps the system integration team to integrate and validate the system, developers to profile each module, and researchers to evaluate and compare a module with its earlier versions. It also supports executing modules on heterogeneous platforms through an easy-to-use graphical interface developed using Eclipse RCP.
IV. PROPOSED METHODOLOGY
Data visualization mainly focuses on analyzing data and presenting it to the end user. The main goal of visualization is to convey information clearly and effectively through graphical means.

We propose a system that analyzes and visualizes sales data. The data will be graphed on different parameters to give different perspectives, and a data mining process will be applied to discover patterns for future predictions. A data set from a store in the USA is taken for analysis and visualization. The data set contains various attributes such as Order ID, Order Date, Order Priority, Sales, Customer Name, Region, Product Name, Product Category and so on. The transition diagram for the system is depicted in Figure 1, which introduces the transitions carried out among the end users, the system and the database. The process from user login, through visualization, to user logout is briefly depicted.

Before the data is imported into the database, the data set is processed by the following functions, as depicted in Figure 2.

A. Data Parser
The data set has multiple entries, which may or may not be relevant to the user. Therefore, parsing is done in Java using the java.util.Iterator class to check the attributes present in the data set.

Fig 1. Transition diagram for the system

B. Data Cleaner
The data set may contain information that is not useful to the user. Such data is deleted from the data set so that only the relevant data is processed further, which decreases the space and time complexity.

C. Data Transfer
An HSSFWorkbook is used to read the FileInputStream provided by the user so that the attribute names present in the data set can be transformed. The attribute names may not be in a proper format; for example, Order Id may be written as OrId, which may create confusion.

D. Database
After the above processing, the data is imported into the database. The database will contain the data relevant to the user in the proper format.

E. Cache
Frequently accessed data is extracted from the database and stored in the cache. When a request for the same data is placed, the data is retrieved from the cache instead of the database, which decreases the time required.

F. Visualization
The data is visualized for the time duration provided by the end user. Top customers, sales per region, top products and the number of customers visited can be visualized. Using these visualizations, the end user can make decisions such as launching new products, as well as decisions for revenue generation.

Figure 2: Flow diagram for data processing carried out by the system

V. RESULTS
The data set given to the system contains multiple attributes, some of which may not be relevant to the end user, as depicted in Figure 3. Therefore, we need to clean the data set and extract only the relevant attributes before storing it in the database. The data is processed by functions such as parsing, cleaning and transformation. Figure 4 depicts the data after processing by these functions, which is then stored in the database. After processing, the data is visualized to help the end user make decisions.
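The parsing and cleaning steps (subsections A and B) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the data set has already been read into memory as a header row plus string rows, and that the set of attributes the end user needs is known in advance (the `relevant` set and the sample values are hypothetical). A `java.util.Iterator` walks the header, as the Data Parser description suggests, and the cleaner then keeps only the matching columns in every row.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class SalesDataCleaner {

    // Parser: walks the header with java.util.Iterator and returns the
    // indices of the attributes that belong to the relevant set.
    public static List<Integer> relevantColumns(List<String> header, Set<String> relevant) {
        List<Integer> keep = new ArrayList<>();
        Iterator<String> it = header.iterator();
        int index = 0;
        while (it.hasNext()) {
            String attribute = it.next().trim();
            if (relevant.contains(attribute)) {
                keep.add(index);
            }
            index++;
        }
        return keep;
    }

    // Cleaner: projects every row onto the kept columns, dropping irrelevant
    // attributes. Assumption (not from the paper): rows shorter than the
    // header are treated as incomplete and discarded.
    public static List<List<String>> clean(List<List<String>> rows, List<Integer> keep, int width) {
        List<List<String>> cleaned = new ArrayList<>();
        for (List<String> row : rows) {
            if (row.size() < width) {
                continue; // skip incomplete record
            }
            List<String> projected = new ArrayList<>();
            for (int i : keep) {
                projected.add(row.get(i));
            }
            cleaned.add(projected);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Hypothetical sample; attribute names follow those listed in the paper.
        List<String> header = List.of("Row ID", "Order ID", "Order Date", "Sales", "Region");
        Set<String> relevant = Set.of("Order ID", "Order Date", "Sales", "Region");
        List<List<String>> rows = List.of(
                List.of("1", "3", "2010-10-13", "261.54", "Central"),
                List.of("2", "293")); // incomplete row
        List<Integer> keep = relevantColumns(header, relevant);
        System.out.println(keep);                             // [1, 2, 3, 4]
        System.out.println(clean(rows, keep, header.size())); // [[3, 2010-10-13, 261.54, Central]]
    }
}
```

Computing the column indices once from the header, rather than re-checking attribute names for every row, keeps the per-row cleaning cost proportional to the number of kept columns.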
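The attribute-name transformation in subsection C can be illustrated with a small alias table. The paper reads the workbook through Apache POI's HSSFWorkbook; to keep this sketch self-contained, it assumes the header has already been read into a `List<String>` and skips file handling entirely. Only the OrId to Order ID mapping comes from the text; the other entries in `ALIASES` are hypothetical examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class AttributeNameTransformer {

    // Hypothetical alias table: keys are normalized (trimmed, single-spaced,
    // lower-case) variants, values are canonical attribute names.
    // Only "OrId" -> "Order ID" is taken from the paper's example.
    private static final Map<String, String> ALIASES = Map.of(
            "orid", "Order ID",
            "order id", "Order ID",
            "ordid", "Order ID",
            "custname", "Customer Name",
            "prodcat", "Product Category");

    // Maps one raw header cell to its canonical name; unknown names are
    // returned trimmed but otherwise unchanged, so no attribute is lost.
    public static String canonical(String raw) {
        String key = raw.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(key, raw.trim());
    }

    // Transforms a whole header row.
    public static List<String> transformHeader(List<String> header) {
        List<String> out = new ArrayList<>();
        for (String name : header) {
            out.add(canonical(name));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(transformHeader(List.of("OrId", " Order  Id ", "Sales")));
        // [Order ID, Order ID, Sales]
    }
}
```

Normalizing case and whitespace before the lookup means one alias entry covers variants such as "Order Id" and " order  ID".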
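Subsections E and F describe two cooperating ideas: views are computed for a time range chosen by the end user, and repeated requests for the same data are served from a cache instead of the database. The sketch below combines them for one view, total sales per region. It is an illustration under stated assumptions, not the authors' implementation: an in-memory `List` stands in for the database, the cache is a least-recently-used map built on `LinkedHashMap` with a caller-chosen capacity, and the class and method names are hypothetical. The class is not thread-safe.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SalesSummaryService {

    // One cleaned sales record (hypothetical subset of the data set's attributes).
    public static final class Sale {
        final LocalDate orderDate;
        final String region;
        final double sales;

        public Sale(LocalDate orderDate, String region, double sales) {
            this.orderDate = orderDate;
            this.region = region;
            this.sales = sales;
        }
    }

    // LRU cache: a LinkedHashMap in access order evicts the least recently used entry.
    private static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = iterate in access order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    private final List<Sale> store; // stands in for the database
    private final LruCache<String, Map<String, Double>> cache;
    private int storeScans = 0;     // counts how often the "database" is read

    public SalesSummaryService(List<Sale> store, int cacheCapacity) {
        this.store = store;
        this.cache = new LruCache<>(cacheCapacity);
    }

    // Total sales per region for orders dated within [from, to], inclusive.
    // Repeated requests for the same range are answered from the cache.
    public Map<String, Double> salesPerRegion(LocalDate from, LocalDate to) {
        String key = from + ".." + to;
        Map<String, Double> cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        storeScans++;
        Map<String, Double> totals = new TreeMap<>(); // sorted by region name
        for (Sale s : store) {
            if (!s.orderDate.isBefore(from) && !s.orderDate.isAfter(to)) {
                totals.merge(s.region, s.sales, Double::sum);
            }
        }
        Map<String, Double> result = Collections.unmodifiableMap(totals);
        cache.put(key, result);
        return result;
    }

    public int storeScans() {
        return storeScans;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale(LocalDate.of(2012, 1, 5), "East", 100.0),
                new Sale(LocalDate.of(2012, 1, 20), "West", 50.0),
                new Sale(LocalDate.of(2012, 2, 3), "East", 25.0));
        SalesSummaryService service = new SalesSummaryService(sales, 8);
        LocalDate from = LocalDate.of(2012, 1, 1);
        LocalDate to = LocalDate.of(2012, 1, 31);
        System.out.println(service.salesPerRegion(from, to)); // {East=100.0, West=50.0}
        service.salesPerRegion(from, to);                      // served from the cache
        System.out.println(service.storeScans());              // 1
    }
}
```

Cached results are wrapped as unmodifiable so that one caller cannot alter the totals another caller later receives. A deployment with concurrent users would need synchronization around the cache, for example via `Collections.synchronizedMap`.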
Figure 3: Dataset containing multiple attributes which are not relevant to the user.

Figure 4: Data stored in database after data processing.

Figure 5 represents the sales of products by product category in a particular region. Using this graph, we can easily identify the maximum sales in the region.

Figure 5: Sales of product in particular region.

Figure 6 represents the profit on sales for particular product sub-categories within each product category. From the figure we can say that the Technology product category is stable, whereas in the Office Supplies product category, sales of Binders and Binder Accessories are the highest.

VI. CONCLUSION
In this paper we have reviewed different techniques, methods and tools, each of which has some shortcomings of its own. The papers discussed gave us a broad idea of the system required in today's world for analyzing and visualizing sales data, with which the investors and owners of an organization can make proper decisions and generate revenue.

We have proposed a system in which data is imported and stored in a database after processing. This data is visualized with different parameters and dimensions, with which the end user can make decisions, predict future sales, calculate regional sales and increase production depending on demand.

In future work we will use more advanced visualization techniques and place all the graphs and charts on a single dashboard, which would help the user make decisions and generate revenue at a glance.

ACKNOWLEDGMENT
The author is highly grateful to her respected guide Prof. R. D. Wajgi for admirable guidance and support in completing this paper. The author is also thankful to the Project Development Lab, Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur (India) for providing the facilities essential to completing this manuscript in its present form. The author would like to thank Prof. R. D. Wajgi (Lecturer, Dept. of CT) and family members for financial and moral support throughout her technical education.
REFERENCES

[1] Jiawei Han, Micheline Kamber and Jian Pei, "Data Mining: Concepts and Techniques," Third edition, MK Publications, 2009.
[2] M. Tennekes and E. de Jonge, "Top-down Data Analysis with Treemaps," in Proceedings of the International Conference on Information Visualization Theory and Applications (IVAPP'11), pp. 236–241, March 2011.
[3] P. Hoek, "Parallel Arc Diagrams: Visualizing Temporal Interactions," Journal of Social Structure, vol. 12, 2011.
[4] Vipul Gupta, Arshan Porsohi and Poornaprajna Udupi, "Sensor Network: An Open Data Exchange for the Web of Things," in Proceedings of 8th IEEE Conference on Pervasive Computing and Communication Workshop (PERCOM), 2010.
[5] Ciro Donalek, S. G. Djorgovski, Alex Cioc, Anwell Wang, Jerry Zhang, Elizabeth Lawler, Stacy Yeh, Ashish Mahabal, Matthew Graham, Andrew Drake, Scott Davidoff and Jeffrey S. Norris, "Immersive and Collaborative Data Visualization Using Virtual Reality Platforms," in Proceedings of IEEE International Conference on Big Data, 2014.
[6] Tak-chung Fu, Fu-lai Chung and Chun-fai Lam, "Adaptive Data Delivery Framework for Financial Time Series Visualization," in Proceedings of IEEE International Conference on Mobile Business (ICMB), pp. 267–273, 2005.
[7] Eric Martin and Vincenzo Di Bernardo, "Enterprise Dashboard Tools for Management of Share-use University Laboratory," in Proceedings of University/Government/Industry Micro/Nano Symposium (UGIM), 2008.
[8] Victor Pascual-Cid, "An Information Visualization System for the Understanding of Web Data," in Proceedings of IEEE Symposium on Information Visualization (INFOVIS), 2008.
[9] Liang Zhu, Bing-Fang Wu, Yue-min Zhou, Xin-hui Ma and Lei-dong Yang, "Researches on Eco-environment Data Visualization for Three Gorges Project," in Proceedings of IEEE International Symposium on Information Engineering and Electronic Commerce (IEEC), 2009.
[10] Thorben Sander, Matthias Kehlenbeck and Michael H. Breitner, "Visualization of Automated Compliance Monitoring and Repairing," in Proceedings of IEEE Database and Expert Systems Applications (DEXA), 2012.
[11] Javier Perez, Romuald Deshayes, Mathieu Goeminne and Tom Mens, "SECONDA: Software Ecosystem Analysis Dashboard," in International Conference on Software Maintenance and Reengineering (CSMR), 2010.
[12] Cheol Jeong and Joseph Finkelstein, "Computer-Assisted Upper Extremity Training Using Interactive Biking Exercise (iBikE) Platform," in Proceedings of IEEE Conference on Engineering in Medicine and Biology Society (EMBC), 2012.
[13] Eleni Stroulia, Isaac Matichuk, Fabio Rocha, Ken Baver, "Interactive Exploration of Collaboration Software Development Data," in IEEE International Conference on Software Maintenance (ICSM), 2013.
[14] Pavan Kumar, Rashid Ahmad, B. D. Chaudary and Mukul K. Sinha, "Enriched Dashboard: An Integration and Visualization Tool for Distributed NLP System on Heterogeneous Platform," in International Conference on Computer Science and its Applications (ICCSA), 2013.
[15] Ma Xin-hui, Wu Bing-fang, Zhu Liang, et al., "Study on Methods of Hydro-environment Data Visualization for Three Gorges Project" [J], Remote Sensing Informatics, 2007, pp. 28–31.
[16] M. Lungu, M. Lanza, T. Gîrba, and R. Robbes, "The Small Project Observatory: Visualizing Software Ecosystems," Science of Computer Programming, vol. 75, no. 4, pp. 264–275, 2010.
[17] R. Chang, M. Ghoniem, R. Kosara, W. Ribarsky, J. Yang, E. Suma, C. Ziemkiewicz, D. Kern, and A. Sudjianto, "WireVis: Visualization of Categorical, Time-Varying Data from Financial Transactions," in IEEE Symposium on Visual Analytics Science and Technology, pp. 155–162, 2007.
[18] "The 37 Best Tools for Visualization," http://www.creativebloq.com/design-tools/data-visualization-712402.
Harjeet Kaur, Varsha Sahni, and Dr. Manju Bala, "A survey of reactive, proactive, and hybrid routing protocols in MANET: a review," International Journal of Computer Science and Information Technologies, vol. 4, no. 3, pp. 498–500, 2013.