
Rainfall Analysis in Madrid: Statistical Insights Using Transformed Metrics

INTRODUCTION

The analysis of historical rainfall data provides critical insights into long-term climatic

patterns and their impact on environmental and socio-economic activities. Understanding trends

in rainfall over time is essential for water resource management, agricultural planning, and

climate change adaptation. The dataset presented includes rainfall depths (in millimeters) from

1860 to 1989, offering a comprehensive overview of precipitation trends over more than a

century. Variations in rainfall across the years reflect both natural fluctuations and potential

influences of broader climatic shifts, such as global warming and regional climate changes.

Analyzing this data can help to identify patterns, anomalies, and periods of extreme

rainfall, which are essential for flood risk management, drought mitigation, and designing

sustainable agricultural practices. Additionally, it may provide context for correlating hydrological

events with other environmental factors, offering opportunities for predictive modeling and long-

term water management strategies.

OBJECTIVES

1. Identify Long-term Rainfall Trends: Analyze the dataset to determine whether there

are discernible long-term increases or decreases in rainfall depth and explore potential

factors contributing to these trends.

2. Examine Rainfall Variability: Investigate the variability of rainfall across the years,

identifying any anomalies or extreme rainfall events that may indicate periods of drought

or excessive precipitation.
3. Assess Climate Change Indicators: Use the data to assess any potential indicators of

climate change, including significant deviations from historical norms and trends that

may align with known climate events.

4. Inform Water Resource Management: Provide insights into how historical rainfall data

can be used to improve water resource management, particularly in areas vulnerable to

flooding or drought.

5. Support Agricultural Planning: Help in forecasting water availability for agricultural

purposes by identifying patterns in rainfall that could inform decisions on planting

seasons and irrigation needs.

6. Predictive Modeling: Develop models that predict future rainfall patterns based on

historical data, contributing to more accurate weather forecasting and climate resilience

strategies.

METHODOLOGY

The rainfall data for Madrid, comprising annual observations, was analyzed to

understand its statistical and probabilistic characteristics. The dataset was subjected to initial

preprocessing. First, gaps in the record were identified; although no imputation proved necessary, this check ensured completeness. Second, outliers were screened using the interquartile range (IQR) method, IQR = Q3 − Q1: observations falling below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR were removed to limit skewness while retaining data integrity.
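The IQR screen described above can be sketched in base R. The rainfall values below are invented for illustration (the real dataset uses the `observation` column shown in Appendix A):

```r
# Hypothetical rainfall depths (mm); 1200 is an artificial outlier
rain <- c(258, 310, 402, 426, 455, 500, 697, 1200)

Q1 <- unname(quantile(rain, 0.25))
Q3 <- unname(quantile(rain, 0.75))
iqr <- Q3 - Q1

lower <- Q1 - 1.5 * iqr
upper <- Q3 + 1.5 * iqr

# Keep only observations inside the IQR fences
cleaned <- rain[rain >= lower & rain <= upper]
```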

To improve interpretability and distribution symmetry, three transformations were applied. The square root transformation reduced the influence of extremely high values, stabilizing variance while preserving the data's core structure. The cube root transformation was particularly useful for balancing the distribution of both high and low rainfall observations, making it well suited to exceedance probability analysis. The logarithmic transformation differentiated low rainfall values, spreading smaller values over a wider range for enhanced clarity.
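The three transformations are one-liners in R. A minimal sketch with one illustrative value per regime (low, typical, high rainfall in mm):

```r
x <- c(258, 426, 697)  # toy low / typical / high annual totals (mm)

sqrt_x <- sqrt(x)      # compresses high values moderately
cbrt_x <- x^(1/3)      # compresses high and low values more evenly
log_x  <- log(x)       # spreads low values; compresses high values most
```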

Exceedance probabilities were determined to assess the likelihood of rainfall exceeding specific thresholds. The Weibull formula was used: P = m / (n + 1), where m is the descending rank of an observation and n is the number of observations. For each transformation,

exceedance probabilities were calculated, enabling comparison across rainfall metrics.
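The Weibull plotting position can be computed directly from the descending ranks; the values below are toy annual totals, not the actual Madrid record:

```r
rain <- c(258, 310, 402, 426, 455, 500, 697)  # toy annual totals (mm)
n <- length(rain)

m <- rank(-rain)   # descending rank: m = 1 for the largest observation
P <- m / (n + 1)   # Weibull exceedance probability
```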

Return periods, defined as the average time interval between occurrences of events exceeding specific thresholds, were derived as T = 1 / P. This provided actionable insights into the

recurrence of extreme rainfall events.
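The return-period relation is a direct reciprocal of the exceedance probability; the probabilities below are example values chosen to match the event classes discussed later:

```r
# Example exceedance probabilities: rare, typical, and frequent events
P <- c(0.05, 0.5, 0.875)

T_years <- 1 / P   # return periods in years
```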

To convey the data’s distribution and exceedance characteristics, histogram bins were adjusted for optimal visualization, balancing resolution and clarity. Relative frequencies were computed as

percentages. Kernel density estimation was applied to smooth the distribution for visual

representation, scaled to percentages for comparability with histograms. Exceedance

probabilities were plotted against transformed rainfall values, showcasing trends for extreme

and typical events.
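The relative-frequency and density summaries can be sketched in base R (toy data; the appendix uses ggplot2 for the actual figures):

```r
rain <- c(258, 310, 350, 402, 426, 440, 455, 500, 610, 697)  # toy totals (mm)

# Relative frequency per 50 mm bin, expressed as a percentage
h <- hist(rain, breaks = seq(250, 700, by = 50), plot = FALSE)
rel_freq <- h$counts / sum(h$counts) * 100

# Kernel density estimate, scaled for comparison with the histogram
d <- density(rain)
d_pct <- d$y * 100
```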

Descriptive metrics (mean, median, standard deviation) were calculated for raw and

transformed data to establish baselines and evaluate the effects of transformations. The

comparison highlighted the advantages of each method in specific analytical contexts.
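The raw-versus-transformed comparison can be reproduced with toy numbers; on a right-skewed sample, the square-root transform should shrink the standard deviation relative to the raw scale:

```r
rain <- c(258, 310, 402, 426, 455, 500, 697)  # toy annual totals (mm)

raw_stats  <- c(mean = mean(rain),       median = median(rain),       sd = sd(rain))
sqrt_stats <- c(mean = mean(sqrt(rain)), median = median(sqrt(rain)), sd = sd(sqrt(rain)))
```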

RESULTS AND DISCUSSION

The raw dataset revealed an average annual rainfall of 426.64 mm, with a standard

deviation of 96.00 mm, indicating moderate variability. Rainfall values ranged between 258 mm

and 697 mm, capturing both typical and extreme events. The median rainfall was consistent
with the mean, emphasizing the dataset's central tendency. However, the presence of outliers

highlighted the need for transformation to stabilize variance and improve interpretability.

Transformations played a crucial role in addressing the dataset's skewness and enhancing the visualization of rainfall distribution. The square root transformation reduced the impact of extreme rainfall values, producing a more symmetric distribution; it was particularly effective in stabilizing variance while retaining the dataset’s core structure. The cube root transformation balanced both high and low rainfall values, resulting in an evenly spread dataset, and provided the clearest and most interpretable visualizations, especially for exceedance probability and return period analyses. The logarithmic transformation spread smaller rainfall values over a wider range, making low-end variations more distinguishable; however, it compressed higher values, which slightly limited its effectiveness for analyzing extreme rainfall.

Exceedance probabilities provided valuable insights into the likelihood of surpassing

specific rainfall thresholds. High rainfall events exceeding 600 mm were rare, with probabilities

below 5%, reflecting their extreme nature. Conversely, typical rainfall values around the mean

(426 mm) had an exceedance probability of approximately 50%, confirming the dataset's central

tendency. Totals below 300 mm were themselves uncommon: the probability of exceeding 300 mm was close to 90%. These findings align with the climatological patterns expected for Madrid, where moderate

rainfall dominates, and extremes occur infrequently.

The return periods calculated from exceedance probabilities offered practical

implications for rainfall event recurrence: A rainfall event of 600 mm or more had a return period

of approximately 20 years, emphasizing its rarity. Typical rainfall events (e.g., 426 mm) were

expected to occur every 2 years, aligning with the dataset's moderate variability. Low rainfall

events below 300 mm were frequent, with return periods of less than a year, indicating their

commonality in the region.


The relative frequency and density plots offered complementary perspectives on the

dataset's distribution: Relative frequency histograms showed a steep decline in frequency as

rainfall values increased, underscoring the rarity of extreme events. The peaks of the

histograms consistently aligned with typical rainfall values between 400–500 mm. Density plots

revealed a smooth distribution curve across transformations, with the cube root transformation

achieving the best balance between high and low rainfall values. These plots highlighted how

transformations effectively spread the data, aiding in visual interpretation.

Among the transformations, the cube root emerged as the most effective for balancing

the dataset. It not only smoothed the distribution but also facilitated clear exceedance and return

period visualizations. The square root and logarithmic transformations, while useful, had more

specialized applications: the square root for stabilizing variance and the logarithmic for

analyzing low-end rainfall variations.

The findings emphasize the moderate variability of Madrid's rainfall and the prevalence

of typical rainfall events around 400–500 mm. Extreme events, though rare, are critical for

hydrological and urban planning. By analyzing rainfall through various transformations, the

study provided a multi-faceted understanding of data behavior, making it applicable to both

climatological research and practical applications like disaster risk reduction and resource

allocation.

CONCLUSION

The analysis of annual rainfall data for Madrid revealed critical insights into its statistical

characteristics and patterns. The dataset, with an average annual rainfall of 426.64 mm and a

standard deviation of 96.00 mm, exhibited moderate variability, indicating a relatively stable

climate with occasional extreme events. By applying transformations, particularly the cube root

and logarithmic scales, the study successfully normalized the data distribution, making it easier
to interpret rare and extreme events while preserving the integrity of the data. The square root

transformation was effective in reducing the dominance of high-end values, while the logarithmic

transformation provided a clearer view of lower rainfall magnitudes. The cube root

transformation stood out as the most balanced, effectively spreading data points across the

range and facilitating better visual and statistical analysis.

Exceedance probability calculations highlighted that extreme rainfall events above 600

mm are rare, with probabilities below 5%, while typical rainfall around the median (~426 mm)

occurs with a 50% likelihood. Return periods derived from exceedance probabilities offered

actionable insights, showing that high rainfall events exceeding 600 mm are expected

approximately once every 20 years, whereas moderate rainfall events occur more frequently.

These findings are critical for hydrological planning, flood risk assessments, and water resource

management.

The integration of relative frequency and density visualizations further illuminated the

dataset's behavior, particularly the clustering of typical rainfall values around 400–500 mm.

These graphs also demonstrated the transformations' impact on spreading and smoothing the

data distribution. This multi-faceted approach underscores the importance of applying

transformations and statistical techniques to better understand and predict climate behavior.

Overall, the study offers a robust framework for analyzing rainfall patterns, with applications in

urban planning, agriculture, and disaster risk reduction. The conclusions drawn not only

enhance our understanding of Madrid’s rainfall dynamics but also provide a methodology that

can be adapted for similar analyses in other regions.


RECOMMENDATIONS

• Utilize cube root transformations in future studies to balance rainfall data effectively for analysis and visualization.

• Integrate additional climatic factors, such as temperature and humidity, to enrich the contextual understanding of rainfall variability.

• Develop predictive models leveraging return period data to inform water resource management and urban planning.

• Conduct further studies on seasonal and monthly rainfall distributions for a finer temporal analysis.
APPENDIX A: R CODE

library(tidyverse)
library(ggpubr)
library(rstatix)
library(car)
library(broom)

MADRID <- read.csv("C:/Users/Cherry mae/Downloads/Madrid-125-obs.csv")

####
#REMOVE OUTLIERS

# Load the necessary library


library(dplyr)

# Assuming your dataset is already loaded as MADRID


# Calculate Q1, Q3, and IQR
Q1 <- quantile(MADRID$observation, 0.25)
Q3 <- quantile(MADRID$observation, 0.75)
IQR <- Q3 - Q1

# Define lower and upper bounds


lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR

# Filter the dataset


MADRID2 <- MADRID %>%
  filter(observation >= lower_bound & observation <= upper_bound)

# View the cleaned dataset


print(MADRID2)

#########
#MEAN AND SD

# Calculate mean and standard deviation


MEAN <- mean(MADRID2$observation)
SD <- sd(MADRID2$observation)

# Print the results


cat("Mean of observations:", MEAN, "\n")
cat("Standard deviation of observations:", SD, "\n")

#######
#RANKING

# Rank the data in descending order


RANKED <- MADRID2 %>%
  arrange(desc(observation))

# View the ranked dataset


print(RANKED)

##################
#PROBABILITY OF EXCEEDANCE
#wEIBULL AND GRINGORTEN

# Load necessary libraries


library(dplyr)

# Assuming cleaned_data has been ranked


ranked_data <- MADRID2 %>%
  arrange(desc(observation))

# Calculate number of observations
n <- nrow(ranked_data)

# Calculate Weibull and Gringorten probabilities of exceedance.
# With data ranked in descending order (rank 1 = largest value),
# the Weibull plotting position is m / (n + 1) and the Gringorten
# plotting position is (m - 0.44) / (n + 0.12).
ranked_data <- ranked_data %>%
  mutate(
    rank = row_number(),
    Weibull_P = rank / (n + 1),
    Gringorten_P = (rank - 0.44) / (n + 0.12)
  )

# View the updated ranked dataset with probabilities


print(ranked_data)

######
#PLOT
#WEIBULL AND GRINGORTEN

# Load the ggplot2 library


library(ggplot2)

# Plot the probabilities of exceedance


ggplot(ranked_data, aes(x = observation)) +
  geom_line(aes(y = Weibull_P, color = "Weibull"), size = 1) +
  geom_line(aes(y = Gringorten_P, color = "Gringorten"), size = 1) +
  scale_y_continuous(labels = scales::percent) +  # Convert y-axis to percentage
  labs(
    title = "MADRID - Total Rainfall",
    x = "Rainfall Depth (mm)",
    y = "Probability of Exceedance (%)",
    color = "Method"
  ) +
  theme_minimal() +
  theme(legend.position = "right")

#################
#PROBABILITY OF EXCEEDANCE

# Load necessary libraries


library(dplyr)

# Assuming MADRID2 is already loaded as a data frame


# Calculate exceedance probability
MADRID2 <- MADRID2 %>%
  arrange(observation) %>%  # Sort the observations in ascending order
  mutate(exceedance_prob = (n() - row_number() + 1) / n() * 100)  # exceedance probability (%)

# View the updated dataset with exceedance probabilities


print(MADRID2)

# Load necessary libraries


library(ggplot2)
library(dplyr)

# Assuming MADRID2 is already loaded as a data frame


# Calculate exceedance probability
MADRID2 <- MADRID2 %>%
  arrange(observation) %>%  # Sort the observations
  mutate(rank = row_number(),
         exceedance_prob = (n() - rank + 1) / n() * 100)  # exceedance probability (%)

# Create the plot


ggplot(MADRID2, aes(x = observation, y = exceedance_prob)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(name = "Probability of Exceedance (%)") +
  scale_x_continuous(name = "Rainfall Depth (mm)") +
  ggtitle("MADRID - Total Rainfall") +
  theme_minimal()

#####################
#RETURN PERIOD

# Load necessary libraries


library(dplyr)

# Assuming MADRID2 is already loaded as a data frame


# Calculate return period
MADRID2 <- MADRID2 %>%
  arrange(observation) %>%  # Sort the observations in ascending order
  mutate(rank = row_number(),
         # With ascending ranks, the exceedance rank of an observation is
         # (n - rank + 1), so the Weibull return period is:
         return_period = (n() + 1) / (n() - rank + 1))

# View the updated dataset with return period


print(MADRID2)

####################
#RELATIVE FREQ AND DENSITY VS. RAINFALL

# Load necessary libraries


library(ggplot2)
library(dplyr)

# Assuming MADRID2 is already loaded as a data frame


# Calculate relative frequency
relative_freq <- MADRID2 %>%
  group_by(observation) %>%
  summarise(count = n()) %>%
  mutate(relative_frequency = (count / sum(count)) * 100)  # Convert to percentage

# Calculate density
density_data <- density(MADRID2$observation, na.rm = TRUE)

# Create a data frame for density


density_df <- data.frame(
  observation = density_data$x,
  density = density_data$y * 100  # Convert to percentage
)

# Create the plot


ggplot() +
  geom_bar(data = relative_freq, aes(x = observation, y = relative_frequency),
           stat = "identity", fill = "blue", alpha = 0.5) +  # Relative frequency
  geom_line(data = density_df, aes(x = observation, y = density),
            color = "red", size = 1) +  # Density
  scale_y_continuous(name = "Relative Frequency (%)",
                     sec.axis = sec_axis(~., name = "Density (%)")) +
  scale_x_continuous(name = "Rainfall Depth (mm)") +
  ggtitle("Density and Relative Frequency vs Rainfall Depth") +
  theme_minimal()

####################
#PROBABILITY
#RETURN PERIOD
#EVENTS

# Load necessary libraries


library(dplyr)

# Assuming MADRID2 is already loaded as a data frame


# Step 1: Calculate Exceedance Values and Return Periods
exceedance_probs <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# Calculate return period and corresponding events
exceedance_results <- data.frame(Probability = exceedance_probs) %>%
  mutate(Return_Period = 100 / Probability) %>%  # return period (years) from probability in %
  arrange(Probability) %>%
  rowwise() %>%
  mutate(Event = MADRID2$observation[
    # Index into the ascending-sorted observations; clamp so it starts at 1
    max(1, round(nrow(MADRID2) * (1 - Probability / 100)))
  ])

# View the results


print(exceedance_results)

################
#SQRT TRANSFORMATION
#EXCEEDANCE VS SQRT RAINFALL

# Load necessary libraries


library(dplyr)
library(ggplot2)

# Assuming MADRID2 is already loaded and includes the square root transformed data
# Step 1: Apply square root transformation if not done already
MADRID2 <- MADRID2 %>%
  mutate(SQRT_Observation = sqrt(observation))

# Step 2: Sort the transformed data
sorted_data <- MADRID2 %>%
  arrange(SQRT_Observation)

# Step 3: Calculate the probability of exceedance
n <- nrow(sorted_data)  # Total number of observations

# Add probability of exceedance to the sorted data
exceedance_probabilities <- sorted_data %>%
  mutate(Probability = (1 - (row_number() - 1) / n) * 100)  # exceedance probability (%)

# Step 4: Plot Exceedance vs. Square Root Transformed Rainfall


ggplot(exceedance_probabilities, aes(x = SQRT_Observation, y = Probability)) +
  geom_line(color = "blue") +  # Line for exceedance probability
  geom_point(color = "red") +  # Points for individual data
  scale_x_continuous(limits = c(15, 30), breaks = seq(15, 30, by = 5)) +  # x-axis range 15 to 30
  scale_y_reverse(limits = c(100, 0), breaks = seq(0, 100, by = 10)) +    # Reverse y-axis from 100 to 0
  labs(title = "MADRID - Total Rainfall",
       x = "Square Root Transformed Rainfall (mm)",
       y = "Probability of Exceedance (%)") +
  theme_minimal()  # A cleaner theme

#######
#FREQ VS SQRT RAINFALL
# Load necessary libraries
library(dplyr)
library(ggplot2)

# Assuming MADRID2 is already loaded and includes the square root transformed data
# Step 1: Apply square root transformation if not done already
MADRID2 <- MADRID2 %>%
  mutate(SQRT_Observation = sqrt(observation))

# Step 2: Relative frequency is computed directly inside geom_histogram,
# so no separate summary table is needed.

# Step 3: Create Histogram with Adjusted Bin Width
ggplot(MADRID2, aes(x = SQRT_Observation)) +
  geom_histogram(aes(y = ..density.. * 100),  # Convert density to percentage
                 binwidth = 0.5,              # Adjust the bin width here
                 fill = "lightblue",
                 alpha = 0.5,
                 color = "black") +           # Outline color for the bars
  geom_density(aes(y = ..density.. * 100),    # Convert density to percentage
               color = "blue", size = 1) +    # Overlay density line
  labs(title = "MADRID - Total Rainfall",
       x = "Square Root Transformed Rainfall (mm)",
       y = "Relative Frequency (%)") +
  theme_minimal()  # A cleaner theme

#######
#CUBE ROOT
#EXCEDENCE VS CUBE ROOT RAINFALL

# Load necessary libraries


library(dplyr)
library(ggplot2)

# Assuming MADRID2 is already loaded


# Step 1: Apply cube root transformation
MADRID2 <- MADRID2 %>%
  mutate(CUBE_ROOT_Observation = observation^(1/3))  # Cube root transformation

# Step 2: Sort the transformed data
sorted_data <- MADRID2 %>%
  arrange(CUBE_ROOT_Observation)

# Step 3: Calculate the probability of exceedance
n <- nrow(sorted_data)  # Total number of observations

# Add exceedance probability to the sorted data
exceedance_probabilities <- sorted_data %>%
  mutate(Exceedance_Probability = (1 - (row_number() - 1) / n) * 100)  # exceedance probability (%)

# Step 4: Plot Exceedance vs. Cube Root Transformed Rainfall


ggplot(exceedance_probabilities, aes(x = CUBE_ROOT_Observation, y = Exceedance_Probability)) +
  geom_line(color = "blue") +  # Line for exceedance probability
  geom_point(color = "red") +  # Points for individual data
  scale_y_reverse(limits = c(100, 0), breaks = seq(0, 100, by = 10)) +  # Reverse y-axis from 100 to 0
  labs(title = "MADRID - Total Rainfall",
       x = "Cube Root Transformed Rainfall (mm)",
       y = "Probability of Exceedance (%)") +
  theme_minimal()  # A cleaner theme

########
#FREQ AND DENSITY VS. CUBE ROOT

# Load necessary libraries


library(dplyr)
library(ggplot2)

# Assuming MADRID2 is already loaded


# Step 1: Apply cube root transformation
MADRID2 <- MADRID2 %>%
  mutate(CUBE_ROOT_Observation = observation^(1/3))  # Cube root transformation

# Step 2: Plot Relative Frequency and Density with Increased Bin Width
ggplot(MADRID2, aes(x = CUBE_ROOT_Observation)) +
  geom_histogram(aes(y = ..count.. / sum(..count..) * 100),  # Relative frequency as a percentage
                 binwidth = 1.0,     # Increased bin width (adjust as needed)
                 fill = "lightblue",
                 alpha = 0.5,
                 color = "black") +  # Outline color for the bars
  geom_density(aes(y = ..density.. * 100),  # Convert density to percentage
               color = "blue", size = 1) +  # Overlay density line
  labs(title = "MADRID - Total Rainfall",
       x = "Cube Root Transformed Rainfall (mm)",
       y = "Relative Frequency (%)") +
  scale_y_continuous(sec.axis = sec_axis(~ ., name = "Density (%)")) +  # Secondary y-axis for density
  theme_minimal()  # A cleaner theme

#############
#LOGARITHM
#EXCEEDANCE VS LOGARITHM

# Load necessary libraries


library(dplyr)
library(ggplot2)

# Assuming MADRID2 is already loaded


# Step 1: Apply logarithmic transformation
MADRID2 <- MADRID2 %>%
  mutate(LOG_Observation = log(observation))  # Logarithmic transformation

# Step 2: Sort the transformed data
sorted_data <- MADRID2 %>%
  arrange(LOG_Observation)

# Step 3: Calculate the probability of exceedance
n <- nrow(sorted_data)  # Total number of observations

# Add exceedance probability to the sorted data
exceedance_probabilities <- sorted_data %>%
  mutate(Exceedance_Probability = (1 - (row_number() - 1) / n) * 100)  # exceedance probability (%)

# Step 4: Plot Exceedance vs. Log Transformed Rainfall


ggplot(exceedance_probabilities, aes(x = LOG_Observation, y = Exceedance_Probability)) +
  geom_line(color = "blue") +  # Line for exceedance probability
  geom_point(color = "red") +  # Points for individual data
  labs(title = "MADRID - Total Rainfall",
       x = "Log Transformed Rainfall (mm)",
       y = "Probability of Exceedance (%)") +
  theme_minimal()  # A cleaner theme

###########
#FREQ AND DENSITY VS LOGARITHM

# Load necessary libraries


library(dplyr)
library(ggplot2)

# Assuming MADRID2 is already loaded


# Step 1: Apply logarithmic transformation
MADRID2 <- MADRID2 %>%
  mutate(LOG_Observation = log(observation))  # Logarithmic transformation

# Step 2: Plot Relative Frequency and Density
ggplot(MADRID2, aes(x = LOG_Observation)) +
  geom_histogram(aes(y = ..count.. / sum(..count..) * 100),  # Relative frequency as a percentage
                 binwidth = 0.1,     # Adjust bin width as needed
                 fill = "lightblue",
                 alpha = 0.5,
                 color = "black") +  # Outline color for the bars
  geom_density(aes(y = ..density.. * 100),  # Convert density to percentage
               color = "blue", size = 1) +  # Overlay density line
  labs(title = "MADRID - Total Rainfall",
       x = "Log Transformed Rainfall (mm)",
       y = "Relative Frequency (%)") +
  scale_y_continuous(sec.axis = sec_axis(~ ., name = "Density (%)")) +  # Secondary y-axis for density
  theme_minimal()  # A cleaner theme
APPENDIX B

MADRID TOTAL RAINFALL (RAW)


MADRID RAINFALL (TRANSFORMED: SQRT)
MADRID RAINFALL (TRANSFORMED: CUBE ROOT)
MADRID RAINFALL (TRANSFORMED: LOG)
