Synopsis "Time Series Geospatial Big Data Analysis Using Array Database"
Synopsis "Time Series Geospatial Big Data Analysis Using Array Database"
FOR
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
SUBMITTED BY
Jayati Gandhi
Literature survey:-
1) “Dynamic Object-Oriented Model and its Applications for Digital Earth” by Yangming Jiang and Siwen Bi, Digital Earth Summit on Geoinformatics, Germany, Nov. 12-14, 2008.
This paper proposes a dynamic object-oriented model that is used to track changes in the Digital Earth. The model treats a spatiotemporal class as the base class of four derived classes: ZeroTObject (ZTO), OneTObject (OTO), TwoTObject (TTO), and ThreeTObject (THTO), where ZTO is a temporal node, OTO is a temporal arc, TTO is a temporal polygon, and THTO is a temporal cube.
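A minimal sketch of what such a class hierarchy could look like in code, with hypothetical attribute names; the paper describes the model conceptually and does not prescribe this implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class SpatioTemporalObject:
    """Hypothetical base class: every object carries a lifespan."""
    object_id: str
    valid_from: datetime
    valid_to: datetime


@dataclass
class ZeroTObject(SpatioTemporalObject):
    """ZTO: a temporal node (0-D point with a lifespan)."""
    point: Tuple[float, float] = (0.0, 0.0)


@dataclass
class OneTObject(SpatioTemporalObject):
    """OTO: a temporal arc (sequence of vertices with a lifespan)."""
    vertices: List[Tuple[float, float]] = field(default_factory=list)


@dataclass
class TwoTObject(SpatioTemporalObject):
    """TTO: a temporal polygon (closed ring with a lifespan)."""
    ring: List[Tuple[float, float]] = field(default_factory=list)


@dataclass
class ThreeTObject(SpatioTemporalObject):
    """THTO: a temporal cube (2-D extent plus height, with a lifespan)."""
    ring: List[Tuple[float, float]] = field(default_factory=list)
    height: float = 0.0
```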
2) “An Approach for Assessing Array DBMSs for Geospatial Raster Data” by Janne Kovanen, Ville Makinen, and Tapani Sarjakoski, GEOProcessing 2018: The Tenth International Conference on Advanced Geographic Information Systems, Applications, and Services.
This paper presents an approach for assessing the capabilities of Array Database Management Systems (DBMSs) for managing and processing raster data. It describes a framework for comparing the functionality of Array DBMSs and benchmarking them. The main feature of the framework is that functionality is assessed using both targeted test cases and benchmarking; the experience gained is then used to assess non-functional characteristics against existing quality models. The framework can be extended with further DBMSs, benchmarks, and additional hardware resources. The assessment was first carried out for the community editions of SciDB and Rasdaman, and the study reports some key initial observations about these particular Array DBMSs.
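As an illustration of the framework's "targeted test cases plus benchmarking" idea, a minimal timing harness could look like the sketch below; the test-case names, the rasql-like query strings, and the run_query callable are placeholders added for illustration, not the paper's actual framework.

```python
import statistics
import time
from typing import Callable, Dict, List

# Hypothetical test cases: name -> query string. The rasql-like syntax is
# illustrative only; a real framework would define its own cases per DBMS.
TEST_CASES: Dict[str, str] = {
    "subset_2d": "select c[0:999, 0:999] from water_coll as c",
    "band_mean": "select avg_cells(c) from water_coll as c",
}


def benchmark(run_query: Callable[[str], None], repeats: int = 5) -> Dict[str, float]:
    """Run each test case `repeats` times and report the median wall time."""
    results: Dict[str, float] = {}
    for name, query in TEST_CASES.items():
        timings: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)  # e.g. submit the query to SciDB or Rasdaman
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results
```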
4) “Evaluating the Open Source Data Containers for Handling Big Geospatial Raster
Data” by Fei Hu and Mengchao Xu.
This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are offered: (1) no data container, except ClimateSpark, has good support for the HDF data format used in this paper, so time- and resource-consuming data preprocessing is required to load data; (2) SciDB, Rasdaman, and MongoDB handle small/medium volumes of data query well, whereas Spark and ClimateSpark can handle large volumes of data with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operation and analytical functions, while the others lack these functions for users; and (4) SciDB, Spark, and Hive have better support for user-defined functions (UDFs) to extend the system capability.
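To illustrate conclusion (4), the sketch below registers a simple user-defined function through Spark's Python API; the per-pixel NDWI computation, the column names, and the toy data frame are assumptions added for illustration, not material from the paper.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Hypothetical table of per-pixel band values; in practice these would come
# from ingested raster tiles rather than a hand-written list.
df = spark.createDataFrame([(0.20, 0.05), (0.10, 0.30)], ["green", "nir"])


def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index for a single pixel."""
    return (green - nir) / (green + nir) if (green + nir) else 0.0


# Wrap the plain Python function as a Spark UDF and apply it column-wise.
ndwi_udf = udf(ndwi, DoubleType())
df.withColumn("ndwi", ndwi_udf(df["green"], df["nir"])).show()
```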
5) “The Australian Geoscience Data Cube — Foundations and lessons learned” by Adam Lewis and Simon Oliver.
The Australian Geoscience Data Cube (AGDC) aims to realize the full potential of Earth
observation data holdings by addressing the Big Data challenges of volume, velocity, and
variety that otherwise limit the usefulness of Earth observation data. There have been
several iterations and AGDC version 2 is a major advance on previous work. The
foundations and core components of the AGDC are: (1) data preparation, including
geometric and radiometric corrections to Earth observation data to produce standardized
surface reflectance measurements that support time-series analysis, and
collection management systems which track the provenance of each Data Cube product
and formalize re-processing decisions; (2) the software environment used to manage and
interact with the data; and (3) the supporting high performance computing environment
provided by the Australian National Computational Infrastructure (NCI).
A growing number of examples demonstrate that the data cube approach allows analysts to extract rich new information from Earth observation time series, including through new methods that draw on the full spatial and temporal coverage of the Earth observation archives. To enable easy uptake of the AGDC, and to facilitate future cooperative development, the code is developed under the open-source Apache License, Version 2.0. This open-source approach is enabling other organizations, including the Committee on Earth Observation Satellites (CEOS), to explore the use of similar data cubes in developing countries.
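The open-sourced AGDC code base continues today as the Open Data Cube Python API. The following is a brief, hedged sketch of how it is commonly used to load an analysis-ready time series; the product name, spatial extent, and measurement names are placeholders that depend on the local deployment and its ingested collections.

```python
import datacube

# Connect to a locally configured Data Cube index.
dc = datacube.Datacube(app="water-change-demo")

# Load a decade of green and NIR surface reflectance over a small area.
# Product name, extent, and measurements are placeholders.
dataset = dc.load(
    product="ls8_nbar_albers",          # placeholder Landsat 8 product name
    x=(149.0, 149.2),                   # longitude range (degrees)
    y=(-35.3, -35.1),                   # latitude range (degrees)
    time=("2013-01-01", "2023-01-01"),
    measurements=["green", "nir"],
)

# `dataset` is an xarray.Dataset with (time, y, x) dimensions, which is what
# makes per-pixel time-series analysis straightforward.
print(dataset)
```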
Problem identification:-
Traditional storage for EO data uses various kinds of files, such as Network Common Data Form (NetCDF) for atmospheric and hydrological sciences, GeoTIFF, and Hierarchical Data Format (HDF) for remote sensing images. These specially designed data formats work quite well when the amount of data is not very large. However, issues start to arise as data volumes increase: the most obvious problem is that it is not easy to retrieve and query the information needed. To solve this problem, an array database is designed and implemented as a common database service offering flexible and scalable storage and retrieval of large volumes of multidimensional array data, such as sensor, image, simulation, or statistics data. It has attracted extensive attention from academic and industrial data scientists.
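For concreteness, retrieving a spatial subset that would otherwise mean opening and cropping many individual files can be expressed as a single declarative array query. The sketch below shells out to Rasdaman's rasql command-line client from Python; the collection name, subset indices, and output options are assumptions about a local setup and should be checked against the installed rasql version.

```python
import subprocess

# Hypothetical collection name, created when the rasters were ingested.
COLLECTION = "water_extent"

# Select a 500x500 spatial window and encode it as GeoTIFF. The format string
# and the rasql flags (-q, --out, --outfile) follow common rasql usage but may
# differ between Rasdaman versions.
query = f'select encode(c[0:499, 0:499], "GTiff") from {COLLECTION} as c'

subprocess.run(
    ["rasql", "-q", query, "--out", "file", "--outfile", "subset"],
    check=True,
)
```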
Methodology:-
The main aim of this project is to identify how the area of a water body has changed during the last 10 years. Using an open-source tool such as Rasdaman, a platform for geospatial data management and analysis is configured. After configuration, the time-series satellite data for water detection is downloaded and ingested into the database, and rasql queries are executed against it. Proper metadata is prepared for the images and stored in the database. Images are taken from Landsat 8, the American Earth observation satellite, which provides 11 spectral bands. Each band has different applications, such as coastal and aerosol studies, peak vegetation detection, detection of cloud contamination, and water detection. The main purpose of the bands is to monitor the Earth and keep track of changes on the planet’s surface. After this, different algorithms are implemented for extracting information from the time-series data, and finally a web application is developed to query and visualize the results.
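A minimal sketch of the water-detection step, assuming the time series has already been ingested and loaded as (time, y, x) arrays of green and near-infrared reflectance (Landsat 8 bands 3 and 5). The NDWI threshold of 0.0 and the nominal 30 m pixel size are stated assumptions, not project results.

```python
import numpy as np

PIXEL_AREA_M2 = 30 * 30  # Landsat 8 multispectral pixels are nominally 30 m


def water_area_series(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Estimate water surface area (km^2) per time step.

    green, nir: arrays of shape (time, y, x) with surface reflectance from
    Landsat 8 band 3 (green) and band 5 (NIR).
    """
    ndwi = (green - nir) / (green + nir + 1e-9)  # McFeeters NDWI
    water_mask = ndwi > 0.0                      # assumed threshold
    pixels_per_step = water_mask.sum(axis=(1, 2))
    return pixels_per_step * PIXEL_AREA_M2 / 1e6  # m^2 -> km^2


# Tiny synthetic example: 2 time steps of a 4x4 scene.
green = np.random.rand(2, 4, 4).astype(np.float32)
nir = np.random.rand(2, 4, 4).astype(np.float32)
print(water_area_series(green, nir))
```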
Tools/Software Used:-
Rasdaman (open-source array database) and its rasql query language for data management and analysis; Landsat 8 time-series imagery as the input data; and a web application for querying and visualizing the results.
References:-
1) Yangming Jiang and Siwen Bi, “Dynamic Object-Oriented Model and its Applications for Digital Earth”, Digital Earth Summit on Geoinformatics, Germany, Nov. 12-14, 2008.
2) Janne Kovanen, Ville Makinen, and Tapani Sarjakoski, “An Approach for Assessing Array DBMSs for Geospatial Raster Data”, GEOProcessing 2018: The Tenth International Conference on Advanced Geographic Information Systems, Applications, and Services.
3) C. Kamali and Gethsiyal Augasta, “Geo-Spatial Big Data Analysis: An Overview”.
4) Fei Hu and Mengchao Xu, “Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data”.
5) Adam Lewis and Simon Oliver, “The Australian Geoscience Data Cube — Foundations and lessons learned”.