A Machine Learning Approach for Rainfall
Estimation Integrating Heterogeneous
Data Sources
Abstract
• Accurate rainfall estimates at individual points are needed to mitigate the risks of severe rainfall events, such as floods and landslides, but producing them is a challenging problem. Dense networks of sensors, named rain gauges (RGs), are typically used to obtain direct measurements of precipitation intensity at these points. These measurements are usually interpolated with spatial interpolation methods to estimate the precipitation field over the entire area of interest. However, these methods are computationally expensive, and improving the estimate of the variable of interest at unknown points requires integrating further information. To overcome these issues, this work proposes a machine learning-based methodology that exploits a classifier based on ensemble methods for rainfall estimation and is able to integrate information from different remote sensing measurements. The proposed approach supplies an accurate estimate of the rainfall where RGs are not available, permits the integration of heterogeneous data sources, exploiting both the high quantitative precision of RGs and the spatial pattern recognition ensured by radars and satellites, and is computationally less expensive than the interpolation methods.
Introduction
• 1) Three heterogeneous data sources (i.e., RGs, radar, and Meteosat) are integrated to generate more accurate estimates of rainfall events.
• 2) Different classification methods are compared on a real case concerning Calabria, a southern region in Italy, and a hierarchical probabilistic ensemble approach is proposed.
• 3) Different ML-based methods, pretrained only on historical data, are compared with a widely used interpolation method in the hydrological field (i.e., kriging with external drift, KED).
Literature survey
• Verdin et al.
[23] also adopt Bayesian estimation to estimate the parameters of the model; their system integrates RG observations and satellite data and adopts an interpolation technique based on the Kriging method. All these techniques are able to provide interesting results, but they require a rather delicate parameter-estimation phase for the particular model; as a side effect, their flexibility and effectiveness usually tend to be hampered. As the relations between sensor data, cloud properties, and rainfall estimates are highly nonlinear, more flexible approaches based on ML techniques have been investigated recently. For instance, the problem of detecting convective events and closely related rainy areas is addressed in [24] by using artificial neural networks (ANNs) combined with support vector machines (SVMs). Data sets are obtained by processing data coming from optical channels of the multispectral instrument on board the Meteosat Second Generation (MSG) satellites; unlike in our work, RG measures are used only as a reference and not in the training phase of the algorithm. Sehad et al. [25] propose an approach to rainfall estimation based on SVMs; the input data are integrated from multispectral channels on MSG, and two models are developed, for daytime and nighttime respectively.
Existing system
• Existing systems based on the ensemble paradigm include the work in [12], which, similar to our work, employs a probabilistic ensemble and merges two sources of data (i.e., rain gauges and radar), although the aim of that work is a runoff analysis. Afterward, a blending technique is applied to the results of the runoff hydrologic models to determine a single runoff hydrograph. Experimental results show that the hydrologic models are accurate and can help to make more effective decisions in flood warning. Frei and Isotta [13] define a technique for deriving a probabilistic spatial analysis of daily precipitation from rain gauges.
The final model represents an ensemble of possible fields, conditional on the observations, which can be interpreted as a Bayesian predictive distribution measuring the uncertainty due to the data sampling from the station network. An evaluation on a real case study, located in the European Alps, demonstrates the capability of the approach to provide accurate predictions for a hydrological partitioning of the region.
Disadvantages of existing system
• The existing system does not implement a hierarchical probabilistic ensemble classifier (HPEC) for rainfall prediction.
• The existing system uses artificial neural networks (ANNs) as its forecasting method, and its predictions are not accurate.
Proposed system
• Our approach is an effective solution for real scenarios, as in the case of an officer of the Department of Civil Protection (DCP) who has to analyze the rainfall in a specific zone presenting risks of landslides or floods. The experimental evaluation is conducted on real data concerning Calabria, a region located in the south of Italy, and provided by the DCP. Calabria is an effective test ground because of its strong climate variability and its complex orography. Our contributions can be summarized as follows.
• 1) Three heterogeneous data sources (i.e., RGs, radar, and Meteosat) are integrated to generate more accurate estimates of rainfall events.
• 2) Different classification methods are compared on a real case concerning Calabria, a southern region in Italy, and a hierarchical probabilistic ensemble approach is proposed.
• 3) Different ML-based methods, pretrained only on historical data, are compared with a widely used interpolation method in the hydrological field (i.e., kriging with external drift, KED).
Advantages of proposed system
• In the proposed system, raw data are preprocessed to make them suitable for the analysis, and an undersampling strategy is adopted to address the class imbalance problem.
• The proposed system evaluates the effect of integrating RG, satellite, and radar measurements, and is trained and tested with effective ML classifiers.
System architecture
Novelty
• 1) ungauged points (in a real usage scenario) in order to estimate the severity class of these points and
• 2) a separate set of training data (i.e., not used in the learning phase).
Conclusion
• An ML-based approach for spatial rainfall field estimation has been defined. By integrating heterogeneous data sources, such as RGs, radars, and satellites, this methodology permits estimation of the rainfall where RGs are not present, also exploiting the spatial pattern recognition ensured by radars and satellites. After a preprocessing phase, a random uniform undersampling strategy is adopted, and finally, an HPEC is used to build the model that estimates the severity of the rainfall events. This ensemble is based on two levels: in the first level, a set of random forest (RF) classifiers is trained, while, in the second level, a probabilistic meta-learner is used to combine the estimated probabilities provided by the base classifiers according to a stacking schema.
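The random uniform undersampling step named in the conclusion can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that undersampling means cutting every class down to the size of the rarest one, with each example equally likely to be kept. The function name `random_undersample`, the severity labels, and the toy data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Balance classes by keeping a uniform random subset of each class.

    Assumption: every class is reduced to the size of the rarest class.
    """
    rng = random.Random(seed)

    # Group the feature vectors by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # The rarest class sets the number of examples kept per class.
    target = min(len(group) for group in by_class.values())

    balanced_samples, balanced_labels = [], []
    for label, group in by_class.items():
        # rng.sample draws without replacement; each item is equally likely.
        for sample in rng.sample(group, target):
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


# Hypothetical toy data: one feature vector per observation, with
# made-up severity classes that are heavily skewed toward "no_rain".
labels = ["no_rain"] * 90 + ["weak"] * 8 + ["heavy"] * 2
samples = [[float(i)] for i in range(len(labels))]

balanced_samples, balanced_labels = random_undersample(samples, labels)
print(Counter(balanced_labels))
```

After balancing, the reduced set would be passed to the first-level classifiers. A less aggressive variant could keep more majority-class examples than minority ones; the slides do not say which ratio was used.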