
Deep Learning Based Last Mile Deliveries - RS

The document discusses the application of deep learning techniques, particularly CNN architectures like AgriSegNet and UNet, for optimizing last mile deliveries in agriculture by accurately identifying and segmenting field boundaries. It highlights the challenges of manual surveying for large farms and proposes a more efficient, automated approach using geospatial technology, achieving high accuracy rates. The study emphasizes the need for generalized models that can be applied across various regions to enhance precision farming solutions.


Deep Learning Based Last Mile Deliveries Optimization

Arbaz Sayed
Symbiosis Institute of Geoinformatics
Symbiosis International
Pune, India
[email protected]

Abstract— Agriculture stakeholders need a precise understanding of field boundaries. Boundary data is a required resource for farm owners to onboard farms onto modern farming software, it improves the accuracy of multiple-cropping categorization, and it is used by government agencies to track incentives and food production, along with many other uses. Knowing where the fields are located on a farm is of great importance to a farmer. Although small farm holders can obtain this by manually surveying the field, doing so is a very human-resource-intensive and time-consuming task for huge farms. Knowing the partition of the fields by serial number or name can significantly improve field management, especially during the reaping season. The boundary shapefiles can also be integrated with a precision farming platform through which farmers can keep track of all the activities taking place in all their fields, which was not possible before. The current study uses modern state-of-the-art techniques integrated with geospatial technology, which helps to reduce the time required for the same process by 95%. The deep learning approach achieved 93% accuracy against the manually drawn fields.

Keywords— Geospatial Technology, Earth Observation, Deep Learning, Semantic Segmentation, Convolutional Neural Network.

I. INTRODUCTION

The last mile of retail delivery and ecommerce fulfillment is the final leg of the supply chain, where goods are transported from the distribution hub to the customer's doorstep. This stage of the process is considered one of the most expensive and logistically challenging operations in the entire supply chain. It involves numerous complexities, including the coordination of multiple delivery drivers, the optimization of delivery routes, and the timely delivery of goods to customers.

Due to the high costs and potential for inefficiencies, the last mile of delivery and ecommerce fulfillment is an area of constant focus for businesses looking to optimize their supply chain operations. One key strategy for improving the efficiency of the last mile is route optimization, which involves finding the most efficient routes for delivery drivers to follow to minimize costs and improve delivery times.

Improving the efficiency of the last mile has the potential to reduce costs, improve delivery times, and enhance the overall customer experience. With ecommerce sales continuing to grow at a rapid pace, businesses are under increasing pressure to deliver goods quickly and reliably. This has led to a heightened focus on last mile optimization as a means of gaining a competitive edge in the marketplace.

In this regard, many researchers have worked on the application of deep learning architectures for field segmentation. [1] approached the boundary detection problem with a very robust approach. The researchers successfully identified the croplands but did not classify them based on crop types. In circumstances when the class-based pixel frequency is smaller than the visual span, AgriSegNet's [2], [3] multi-scale attention-based learning offers correct predictions. The model received a 67% F1 score. Two fractal ResUNet models were developed [7], [8], one based on data from Southern Africa (Source to Target) and the other based on data from Australia (Target to Target). But these models were highly localized, i.e., they perform well only for the study area. Hence, there is a need for a generalized model which can give outputs for all regions. [9], [10] used 3-channel and 4-channel images for detecting the boundaries. The 3-channel images consist of RGB bands and the 4-channel images consist of RGB and NIR bands. The study achieved a better result from the 4-channel imagery than from the 3-channel imagery. [11], [12], in contrast, combined their model with an FCNN model based on semantic segmentation. The results of multi-resolution processing with tied weights and late score fusion were much superior to those of a single-resolution network.

[13] developed a model for detecting overlapping boundaries and boundaries incorrectly classified due to the same crop types existing next to each other. In another study [14], the model was inspired by fully convolutional neural networks [4], but with additional deep supervision on top of trimmed VGG nets [15]. They found that the weighted-fusion layer output gives the best F1 score, suggesting that Holistically-Nested Edge Detection is better than other deep learning-based methods. [16] found no evidence of overfitting in the data. For the French dataset, there is a decrease in the precision value with a corresponding improvement of the recall, which increases the F1 score. However, the models [14], [15], [16] did not give a speedy prediction when the input image was of low spatial resolution.

The CNN models [17], [18] also performed well for the semantic segmentation of land covers irrespective of the limitation of self-constructed dataset size and collection. Recent developments in this field have included U-Net topologies. Using a ResUNet architecture, [19] classified images under 3 groups: fields, buffered borders, and background. The ResUNet model has outstanding quality with a mean F1 score of 0.91 and a mean Jaccard coefficient of 0.88. Their model [17], [18], [19] was able to extract boundaries from areas where urban setting was minimal or none but struggled to give clear boundaries in mixed topography of agricultural land cover as well as urban
settings.

The comprehensive literature review of different researchers' work on field delineation shows that CNN architectures like AgriSegNet, ResUNet, and FCNNs have proven successful at predicting field boundaries. Even so, there seems to be a gap between the output obtained from models and the final implementation of field boundaries in smart farming solutions. The current study focuses on exploiting the full capabilities of CNNs for an effective and viable way to identify and segment field boundaries accurately, and on designing a pipeline for converting the extracted boundaries from raster format to vector format for integration into smart farming solutions.

II. STUDY AREA

The study area chosen is the Netherlands. Fig 1 below shows the distribution of agricultural croplands.

The Netherlands has an abundance of precisely separated fields, and the data was collected from the following provinces: Friesland, Flevoland, Gelderland, Limburg, Overijssel, Zeeland, and Zuid-Holland.

III. MATERIALS AND METHODS

Geospatial data was acquired and passed through a series of pre-processing functions before being fed as input to the neural networks. Various convolutional networks were used, including FCNs and UNets with different numbers of layers. The networks are then evaluated on metrics such as precision, recall, F1 score, and overall accuracy, with appropriate visualization. The best-performing network is then used for field delineation. As the network outputs are in raster format, they need to be passed through a raster-to-vector conversion workflow to obtain the respective shapefile.

Fig 2. Workflow diagram (boxes: Data Acquisition; Data Preprocessing; Data Augmentation; Data normalization; Image segmentation; Building the networks; Setting up the network parameters; Training the networks; Testing the networks; Choosing the best performing network for predictions; Visualizing the results; Getting the predicted field boundaries; Converting the raster field boundaries to vector polygon shapefiles; Overlaying the delineated fields shapefile on geotiff images)

A. Data Sources and Acquisition

The GeoTIFF images from the Sentinel-2 satellite were used in this study. GeoTIFF images are raster image files that store satellite or drone-mapped images and retain their quality even after they are compressed or manipulated, which is ideal for training a deep learning model. The Sentinel-2 data for the different provinces in the Netherlands was collected from the Sentinel Hub EO Browser. The data was ideal for training the fully convolutional neural network as it had very little or no cloud cover. The ground truth data was acquired from the Copernicus open-source project by the ESA. Fig 3 shows a sample of images and their corresponding ground truths.

B. Data Preprocessing

a) Converting Images into Pixel-Wise Arrays: The GeoTIFFs of the original images are first converted into 3-dimensional pixel-wise arrays: the first and second dimensions hold the x rows and y columns, and the third dimension holds the image bands, i.e., RGB and NIR. Classification images, or ground truth tiffs, are converted into a 2D array. The per-pixel values of the 2D classification array are 255 for boundary lines and 0 for non-boundary pixels.

b) Normalizing the image: Using the Normalized Difference Vegetation Index (NDVI), the pixel values of the original image have been normalized.

c) Dividing the images into smaller patches: The patches are made for more efficient training of the network and faster processing. The patch size depends on the network being trained. E.g., a smaller patch size, ideally 50 px x 50 px, is preferred for one network, whereas for UNet 2 a patch size of 200 px x 200 px is preferred.
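Steps a) to c) can be sketched in NumPy as follows. This is only a minimal illustration under stated assumptions, not the pipeline used in the study: the band order (R, G, B, NIR), the function names, the use of NDVI as a single derived channel, and the choice of non-overlapping patches that drop any edge remainder are all assumptions, since the text does not specify them. The NDVI formula itself, (NIR - Red) / (NIR + Red), is the standard definition.

```python
import numpy as np


def ndvi(image, red_band=0, nir_band=3, eps=1e-9):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for a (rows, cols, bands) array.

    The band positions are assumptions (R, G, B, NIR order); eps avoids
    division by zero where both bands are 0.
    """
    red = image[..., red_band].astype(np.float64)
    nir = image[..., nir_band].astype(np.float64)
    return (nir - red) / (nir + red + eps)


def to_binary_mask(ground_truth):
    """Map ground-truth pixels (255 = boundary, 0 = non-boundary) to 1 and 0."""
    return (ground_truth == 255).astype(np.uint8)


def split_into_patches(array, patch_size):
    """Cut an array into non-overlapping square patches.

    Rows and columns that do not fill a whole patch are dropped
    (an assumption; padding would be an equally valid choice).
    """
    rows, cols = array.shape[:2]
    patches = [
        array[r:r + patch_size, c:c + patch_size]
        for r in range(0, rows - patch_size + 1, patch_size)
        for c in range(0, cols - patch_size + 1, patch_size)
    ]
    return np.stack(patches)
```

For example, a 200 x 200 NDVI image passed to split_into_patches with patch_size=50 yields 16 patches of 50 x 50 pixels; the same call on the binary mask keeps images and labels aligned patch by patch.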
d) Setting up training and testing data set: A random generator with seed 34 is used to split the original images and ground truth data into a training set and a testing set such that at least one image patch from each area is included in the testing set.

The patch sizes are adjusted in direct proportion to the number of layers present in a neural network.

C. Convolutional Neural Networks

FCNs [4] are made up of a deep encoder-decoder for pixel-wise classification. The network's encoder, which consists of many convolutional layers, batch normalization, and Rectified Linear Units, is topologically similar to the convolutional layers of the VGG-16 network. The decoder is used to map the encoder's reduced feature maps back toward the input image's native resolution. The decoder performs non-linear up-sampling using the pooling indices obtained in the corresponding max-pooling layers of the encoder rather than discretization or inverted repetitions. The resultant up-sampled maps are sparse, so dense feature maps are created by convolving them with trainable convolution kernels. This approach decreases the number of trainable parameters while increasing border demarcation accuracy. Owing to these features, FCNs are an effective network for the job of segmentation. The basic foundation of FCN is illustrated in Fig 4.

UNet [6] was originally developed and designed for the task of processing biomedical images, but its potential for various other purposes was gradually recognized. UNet's architecture is promising for the task of field boundary delineation, as it has given very satisfactory results in object boundary detection. One of the major advantages of UNet over FCN is its ability to detect features from a low-resolution image and still give good results. UNet can work with a larger sample size at a time rather than dividing the samples into smaller sizes while training.

D. Choosing the parameters of the model

The most important parameter to choose while building a model is the number of layers to insert into the network. For this research, FCN [4] was tested with 6 layers and UNet [6] was tested with 3 layers, which is claimed to be the optimal number of layers for image segmentation tasks [8], [11]. Each network is given a separate UUID so that the model history and trained weights can be accessed easily after the model has finished training. This way, retraining of the model is avoided and speedy prediction is achieved. The optimizer chosen was Adam [23], as it combines the capabilities of SGD with momentum and RMSProp. The softmax activation function was used in the network because it maps the logits to a range of probabilities in a classification problem [24].

IV. RESULTS

The networks were tested with hyperparameters for which the model was expected to give the best results. Out of these, the best networks with the optimal hyperparameter values were taken forward for training for a higher number of epochs.

A. Network Parameters

The network configurations are mentioned in TABLE I.

TABLE I. EG NETWORK PARAMETERS

Network Name      Epochs  Batch Size  Patch Size  Learning Rate
FCN (6 layers)    50      8           40000 px    0.1
FCN (6 layers)    800     32          2500 px     0.0015
UNet (3 layers)   50      8           40000 px    0.1
UNet (3 layers)   800     8           160000 px   0.0015

It is important to note here that the 50-epoch runs were meant as dry runs for discovering the optimal hyperparameters for the respective networks. Afterwards, the models were trained for 800 epochs to approach the global minimum of the loss function.

B. Network Performance Visualization

Fig 4. Fully convolutional network (6 layers): Training loss and training accuracy vs. epochs

Fig 5. Fully convolutional network (6 layers): Validation loss and validation accuracy vs. epochs

After closely analyzing the graphs of validation loss, validation accuracy, training loss, and training accuracy, UNet 3 (Fig 10 and Fig 11) seems to attain the highest accuracy of all the networks. The fully convolutional models also give a fairly convincing result, at the cost of overfitting the data after 500 iterations. At some points the accuracy curve drops, which indicates that the model is under-fitting or possibly unable to learn further. This is handled by the Adam optimizer, which adapts the learning rate and allows the model to escape the local minimum, so the accuracy increases again after some time.

TABLE II. SUMMARY OF VISUAL REPRESENTATIONS


Network Name  Training Accuracy  Training Loss  Validation Accuracy  Validation Loss
FCN 6         0.91               0.21           0.91                 0.19
UNet 3        0.95               0.104          0.88                 0.54

C. Network Outcome and Accuracy Assessments


1) Representation and Assessment of a Fully Convolutional Network (6 layers) on Friesland:
a) Visual Representation and Accuracy assessment of Friesland 165:
Fig 8. Friesland 165 Original, GT, Prediction

TABLE IV. FRIESLAND 165 ACCURACY ASSESSMENTS

Confusion Matrix
                           Actual Other      Actual Field Boundary  Sum
Prediction Other           491414 (76.783%)  43467 (6.792%)         534881
Prediction Field Boundary  28340 (4.428%)    76779 (11.997%)        105119
Sum                        519754            120246                 640000

Accuracy Metrics
Metric            Value
Precision         73.04%
Recall            63.852%
F1 Score          68.137%
Overall Accuracy  88.78%
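The accuracy metrics follow directly from the four confusion-matrix counts. The short sketch below recomputes them for Friesland 165, treating "Field Boundary" as the positive class. That choice is an assumption about how the metrics were defined, although it reproduces the reported precision, recall, and F1 score. The function name is illustrative only.

```python
def boundary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and overall accuracy from confusion-matrix counts.

    tp: predicted boundary, actually boundary
    fp: predicted boundary, actually other
    fn: predicted other, actually boundary
    tn: predicted other, actually other
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Counts taken from the Friesland 165 confusion matrix.
friesland_165 = boundary_metrics(tp=76779, fp=28340, fn=43467, tn=491414)
for name, value in friesland_165.items():
    print(f"{name}: {value:.3%}")
```

Computed this way, the diagonal counts (491414 + 76779 out of 640000 pixels) give an overall accuracy of about 88.78%, in line with the approximately 88% quoted for FCN 6 on Friesland.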

2) Representation and Assessment of UNet 3 on Friesland:


a) Visual Representation and Accuracy assessment of Friesland 164:
Fig 9. Friesland 164 Original, GT, Prediction

TABLE V. FRIESLAND 164 ACCURACY ASSESSMENTS

Confusion Matrix
                           Actual Other      Actual Field Boundary  Sum
Prediction Other           494174 (77.215%)  34878 (5.45%)          529052
Prediction Field Boundary  30786 (4.81%)     80162 (12.525%)        110948
Sum                        524960            115040                 640000

Accuracy Metrics
Metric            Value
Precision         72.252%
Recall            69.682%
F1 Score          70.944%
Overall Accuracy  92.74%

D. Comparison and analysis of networks

After a detailed analysis of the predictions made by our networks, it is observed that UNet 3 performs best for both sparsely distributed and densely distributed fields and gives the optimal prediction for almost all the test images. The fully convolutional network with 6 layers gives reasonably good predictions for sparsely distributed fields. UNet 3 and FCN 6 achieved approximate overall accuracies of 92% and 88%, respectively, for Friesland province. For some provinces (Zeeland, Zuid-Holland), higher accuracy was observed with FCN 6, although FCN 6 gave comparatively lower precision than UNet 3 in placing the boundary pixels in their intended spatial locations.

The ability to discern field boundaries from the boundaries that exist in urban areas is our model's most prominent advantage compared to other commensurate models. It was expected that images with 4-5% cloud cover would pose a challenge during extraction, but surprisingly the better-performing UNet 3 network gave clear boundaries even in these scenarios. The networks' adaptability is evident from the fact that clear predictions were made even for those patches where the fields were not in a pattern. Also, coastal areas were in the minority in the overall dataset, so the network was expected to struggle to detect the field boundaries along the coastal areas; on the contrary, it was found that field boundaries were extracted precisely. Our UNet 3 network improves over the existing AgriSegNet [2] by 5% for field delineation. Compared to the Fractal UNets in [7] and [8], our models trained faster, as we approached the problem as an instance of semantic segmentation. Our models gave crisp boundaries, achieving 95% accuracy in contrast to the models [1], [2], [8], [10], [14], [15], [19], which achieved accuracies in the range of 85-91%.
E. Postprocessing

The output predictions are in ".tiff" format, which can be used to evaluate the performance of the network, but for geospatial applications these images need to be georeferenced to their original imagery layer and the boundaries need to be vectorized, i.e., the output should be in the form of polygon shapefiles with the ".shp" file extension, which can be integrated on field maps or field satellite imagery for precision farming. This is one of the important aspects of the field delineation problem that was not in focus in almost all the research works that we reviewed. To solve this problem, ArcGIS software was used to create a pipeline for post-processing the images. The tiff images output by the network were converted from raster to vector, georeferenced, and polygonised into a shapefile. Some unnecessary polygons, which arise due to pixel delocalization, were dissolved into the nearest adjacent polygon. An example can be seen in Fig 10, which shows the shapefile created after postprocessing the predicted tiff image given by the UNet 3 network.

V. CONCLUSION
Around 100,000 fields can be delineated in 10 seconds with precision with our UNet 3 model. One of the key differentiators of this research is the development of the entire workflow, from ingesting the raw GeoTIFF images to outputting the shapefiles of field boundaries for implementation into a smart farming platform. For future work, using a super-resolution model in conjunction with training the networks on a larger amount of data from different countries, with some data augmentation and a higher number of epochs, may make it possible to obtain near-ground-truth results. The time-consuming task of delineating fields is challenging, and this research did its best with the available time and resources to solve it.

REFERENCES

[1] F. Waldner and F. I. Diakogiannis, “Deep learning on edge: Extracting field boundaries from satellite images with a convolutional neural network,” Remote Sensing of Environment, vol. 245, 2020.
[2] T. Anand, S. Sinha, M. Mandal, V. Chamola, and F. R. Yu, “AgriSegNet: Deep Aerial Semantic Segmentation Framework for IoT-Assisted Precision Agriculture,” IEEE Sensors Journal, vol. 21, no. 16, pp. 17581-17590, 15 Aug. 2021.
[3] H. Sheng, X. Chen, J. Su, R. Rajagopal, and A. Ng, “Effective data fusion with generalized vegetation index: Evidence from land cover segmentation in agriculture,” Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), 2020, pp. 267-276.
[4] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” Computer Vision and Pattern Recognition, 2014.
[6] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” Lecture Notes in Computer Science, 2015, pp. 234–241.
[7] F. Waldner, F. I. Diakogiannis, K. Batchelor, M. Ciccotosto-Camp, E. Cooper-Williams, C. Herrmann, G. Mata, and A. Toovey, “Detect, consolidate, delineate: Scalable mapping of field boundaries using satellite images,” Remote Sensing, vol. 13, no. 11, p. 2197, 2021.
[8] B. Watkins and A. van Niekerk, “A comparison of object-based image analysis approaches for field boundary delineation using multi-temporal Sentinel-2 imagery,” Comput. Electron Agric., 2019, pp. 294–302.
[9] R. Yang, Z. U. Ahmed, U. C. Schulthess, M. Kamal, and R. Rai, “Detecting functional field units from satellite images in smallholder farming systems using a deep learning based Computer Vision Approach: A case study from Bangladesh,” Remote Sensing Applications: Society and Environment, vol. 20, p. 100413, 2020.
[10] A. Buslaev, S. S. Seferbekov, V. Iglovikov, and A. Shvets, “Fully convolutional network for automatic road extraction from satellite imagery,” CVPR Workshops, 2018, pp. 207-210.
[11] P. Iasonas and K. Kokkinos, “Pushing the boundaries of boundary detection using deep learning,” Computer Vision and Pattern Recognition, 2015.
[12] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” PAMI, vol. 33, no. 5, 2011, pp. 898-916.
[13] R. Kumar, “Automated Field Boundary Detection Using Modern Machine Learning Techniques,” Electrical and Computer Engineering, unpublished.
[14] S. Xie and Z. Tu, “Holistically-nested edge detection,” 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
[15] G. Bertasius, J. Shi, and L. Torresani, “Deepedge: A multiscale bifurcated deep network for top-down contour detection,” CVPR, 2015.
[16] L. Meyer, F. Lemarchand, and P. Sidiropoulos, “A deep learning architecture for batch-mode fully automated field boundary detection,” The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLIII-B3-2020, pp. 1009–1016, 2020.
[17] S. Lingwal, K. K. Bhatia, and M. Singh, “Semantic segmentation of landcover for cropland mapping and area estimation using Machine Learning Techniques,” Data Intelligence, pp. 1–18, 2022.
[18] E. Ndikumana, D. Ho Tong Minh, N. Baghdadi, D. Courault, and L. Hossard, “Deep recurrent neural network for agricultural classification using multitemporal SAR Sentinel-1 for Camargue, France,” Remote Sensing, vol. 10, no. 8, p. 1217, 2018.
[19] A. Taravat, M. P. Wagner, R. Bonifacio, and D. Petit, “Advanced fully convolutional networks for Agricultural Field Boundary Detection,” Remote Sensing, vol. 13, no. 4, p. 722, 2021.
[20] K. Team, Keras. [Online]. Available: https://keras.io/. [Accessed: 25-Aug-2022].
[21] GDAL. [Online]. Available: https://gdal.org/. [Accessed: 2-Aug-2022].
[22] NumPy. [Online]. Available: https://numpy.org/. [Accessed: 2-Aug-2022].
[23] TDS. [Online]. Available: https://towardsdatascience.com/adam-optimization-algorithm. [Accessed: 9-Aug-2022].
[24] TDS. [Online]. Available: https://towardsdatascience.com/softmax-activation-function-how-it-actually-works. [Accessed: 15-Aug-2022].
