Efficient-Net model
Nagabotu Vimala¹, Anupama Namburu²

Abstract—As a unique neural network design, EfficientNet is praised for its capacity to blend computational efficiency with state-of-the-art performance across a range of computer vision tasks. This abstract explores the fundamental ideas and novel developments of EfficientNet, illuminating the exceptional scalability and efficiency qualities that have transformed deep learning. To balance model size and performance, EfficientNet uses squeeze-and-excitation blocks, mobile inverted residual blocks, and compound scaling. Its multi-scale design lets it adapt to various computing budgets and applications. By encouraging the development of more effective models and spurring innovation, this architecture has reshaped the field of deep learning. The most important lessons from this ground-breaking neural network model are EfficientNet's adaptability, efficiency, and impact on the deep learning community. Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget and then scaled up for better accuracy when more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient.

¹Research scholar, School of Computer Science Engineering, VIT-AP University, Amaravathi, India. Email: [email protected]
²Faculty, School of Computer Science Engineering, VIT-AP University, Amaravathi, India. Email: [email protected]

I. INTRODUCTION

Scaling up models has been a popular technique in the field of Convolutional Neural Networks (ConvNets) for improving accuracy and efficiency. Prominent instances, such as ResNet-200 and GPipe, have demonstrated improved performance by increasing the depth and size of the model. However, there has frequently been a lack of a coherent and methodical approach to the ConvNet scaling process. Typical approaches have usually scaled a single dimension (width, depth, or image resolution), which has resulted in less-than-ideal outcomes. This paper presents a fresh viewpoint, arguing that the harmonious balancing of all three crucial dimensions (network width, depth, and image resolution) is the key to successful scaling. By using a constant scaling ratio to achieve this equilibrium, the compound scaling method, which has proven to be highly effective in practice, is born.

ConvNet scaling is a complex problem, and the suggested compound scaling approach provides an attractive way to solve it. This method offers an efficient and principled approach to model scaling by evenly scaling network width, depth, and resolution with constant coefficients. The findings of this study are impressive: the method produced a class of models called EfficientNets, which outperform previous ConvNet architectures in terms of both accuracy and efficiency. They routinely outperform other models on a variety of datasets, demonstrating that their excellent performance is not restricted to ImageNet. Moreover, these models achieve higher accuracy with fewer parameters, which in turn accelerates inference. To sum up, this research advocates a well-balanced methodology that considers network width, depth, and image resolution together, challenging the traditional techniques of scaling ConvNets. Groundbreaking outcomes have been obtained since the compound scaling method was introduced.

The effectiveness of this strategy is demonstrated by the EfficientNet family of models, which performs noticeably better than other ConvNet topologies such as GPipe and ResNet. This has significantly changed the deep learning landscape by achieving better accuracy, lower parameter counts, and faster inference speed. It also emphasises the significance of a principled and well-balanced scaling method for ConvNets.

A. COMPOUND MODEL SCALING:

In this part we define the scaling challenge, examine various strategies, and present our own scaling approach. Intuitively, deeper networks have wider receptive fields that suit higher-resolution images, and larger images provide more pixels with which to capture similar features.

1) Scaling Dimensions: The primary challenge is that the ideal values of d, w, and r are interdependent and vary depending on the available resources. Because of this challenge, most conventional approaches scale ConvNets along only one of these dimensions:

• Depth (d): Scaling network depth is the most popular method, used by He et al. (2016)[1], Huang et al. (2017)[2], Szegedy et al. (2015)[3], and many other ConvNets. Intuitively, a deeper ConvNet can capture richer, more complex features and generalise well on new tasks. However, deeper networks are also harder to train because of the vanishing gradient problem (Zagoruyko & Komodakis, 2016)[4]. Several techniques, such as batch normalisation (Ioffe & Szegedy, 2015)[5] and skip connections (He et al., 2016)[6], mitigate the training difficulty, but the accuracy gain of very deep networks diminishes: for instance, ResNet-1000 has similar accuracy to ResNet-101 despite having many more layers.

• Width (w): Scaling network width is frequently used for small models (Howard et al., 2017; Tan et al., 2019)[7], [8]. Wider networks are quicker to train and tend to capture more fine-grained features, as noted by Zagoruyko & Komodakis (2016)[4]. However, extremely wide but shallow networks typically struggle to capture higher-level features. Our empirical findings, shown in Figure 3 (left), demonstrate that accuracy quickly saturates as networks become much wider with increasing w.

• Resolution (r): With higher-resolution input images, ConvNets can potentially capture more fine-grained patterns. While early ConvNets used 224x224 inputs, more recent ConvNets often employ 299x299 (Szegedy et al., 2016)[9] or 331x331 (Zoph et al., 2018)[10] for increased accuracy. More recently, GPipe (Huang et al., 2018) achieved state-of-the-art ImageNet accuracy with 480x480 resolution. Greater resolutions, such as 600x600, are also frequently used in ConvNets for object detection[11]. As seen in Figure 3 (right), higher resolutions do increase accuracy, although the accuracy gain diminishes at very high resolutions (r = 1.0 denotes resolution 224x224 and r = 2.5 denotes resolution 560x560).
B. COMPOUND SCALING:

To capture more fine-grained patterns, we need to correspondingly increase network width when the resolution is higher. We therefore propose a compound coefficient φ that uniformly scales network width, depth, and resolution in a principled way:

    depth: d = α^φ
    width: w = β^φ
    resolution: r = γ^φ        (1)
    s.t. α · β² · γ² ≈ 2
    α ≥ 1, β ≥ 1, γ ≥ 1

where α, β, and γ are constants that a modest grid search can find. Intuitively, the user-specified coefficient φ determines how many additional resources are available for model scaling, while α, β, and γ determine how these additional resources are allocated to network depth, width, and resolution, respectively. Notably, the FLOPS of a standard convolution operation is proportional to d, w², and r². This study constrains α · β² · γ² ≈ 2 so that, for any new φ, the total FLOPS grows by roughly 2^φ.
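As a minimal illustration of Equation 1, the following Python sketch computes the depth, width, and resolution multipliers for a given φ and checks the FLOPS constraint; the constants used are the EfficientNet-B0 values reported later in this paper.

```python
# Minimal sketch of compound scaling (Equation 1).
# alpha, beta, gamma are the EfficientNet-B0 constants reported in this paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: float) -> tuple[float, float, float]:
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    d = ALPHA ** phi      # depth multiplier
    w = BETA ** phi       # width (channel) multiplier
    r = GAMMA ** phi      # input-resolution multiplier
    return d, w, r

# Convolution FLOPS scale as d * w^2 * r^2, so each unit increase of phi
# should roughly double FLOPS given alpha * beta^2 * gamma^2 ~= 2.
assert abs(ALPHA * BETA**2 * GAMMA**2 - 2.0) < 0.1

for phi in range(1, 4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPS x{d * w**2 * r**2:.2f}")
```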
II. LITERATURE REVIEW:
Deep learning and medical image analysis have come together to create new opportunities for improving healthcare outcomes and automating diagnostic processes. One particularly interesting approach to enhancing paediatric health evaluations is the use of deep learning algorithms to predict head circumference from medical images. This review of the literature synthesises the body of research and provides insights into the development of deep learning algorithms, the importance of head circumference measurements, and the difficulties that this field presents. In paediatric healthcare, head circumference is an important anthropometric measure that provides information about general growth and neurological development. Traditional measuring techniques are prone to time constraints and inter-observer variability, since they frequently depend on manual operations. These difficulties may be addressed by automating head circumference measurement with deep learning models, yielding the reliable and consistent measurements that are essential for tracking developmental milestones and spotting potential health issues.

Medical image analysis has seen a rise in the use of transfer learning, a technique that involves adapting a model that has been pre-trained on a sizable dataset to a particular purpose. By utilising the insights gained from general datasets, models may be refined to identify the relevant characteristics in medical images. This method addresses the prevalent drawback of having little annotated medical data. The literature review examines the use of transfer learning for head circumference estimation and its consequences for model performance and generalisation on specific medical imaging datasets. Even with deep learning's potential for head circumference estimation, a number of obstacles remain. Important factors to consider include the heterogeneity of medical imaging datasets, the interpretability of model conclusions, and the requirement for strong validation techniques. These difficulties highlight how crucial it is to handle domain-specific peculiarities and guarantee the ethical and appropriate application of deep learning models in paediatric medicine.

The literature review concludes by highlighting the importance of head circumference estimation in paediatric healthcare and the evolutionary trajectory of deep learning in medical image analysis. The integration of transfer learning methodologies and the identification of related obstacles motivate our study, which seeks to use the EfficientNet model to tackle specific problems noted in the existing literature. (Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS). Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning (ICML).)

A. EfficientNet Architecture:

A strong baseline network is essential, since the layer operators F̂ᵢ in the baseline network are not altered by model scaling. We evaluate our scaling strategy on existing ConvNets, but to more clearly illustrate its efficacy we have also created a new mobile-size baseline, called EfficientNet. Drawing inspiration from Tan et al. (2019), we build our baseline network using a multi-objective neural architecture search that optimises both accuracy and FLOPS. In particular, we employ the same search space as Tan et al. (2019) and set the optimisation goal to ACC(m) × [FLOPS(m)/T]^w, where T is the target FLOPS, w = −0.07 is a hyperparameter that controls the trade-off between FLOPS and accuracy, and ACC(m) and FLOPS(m) denote the accuracy and FLOPS of model m. In contrast to Tan et al. (2019) and Cai et al. (2019), we optimise FLOPS rather than latency here because we do not target any particular hardware device. We call the network that emerges from our search EfficientNet-B0. Since we employ the same search space as Tan et al. (2019), the architecture is similar to MnasNet, except that our EfficientNet-B0 is marginally larger because of the higher FLOPS target (400M). The EfficientNet-B0 architecture is displayed in Table 1. Its primary building block is the mobile inverted bottleneck MBConv (Sandler et al., 2018; Tan et al., 2019), to which squeeze-and-excitation optimisation (Hu et al., 2018) is added. Starting from the baseline, we use two steps in our compound scaling method to scale up EfficientNet-B0 (a small illustrative sketch follows the list):

• STEP 1: Using Equation 1, we first fix φ = 1, assuming twice as many resources are available, and perform a small grid search over α, β, and γ. Under the constraint α · β² · γ² ≈ 2, we find that the optimal values for EfficientNet-B0 are α = 1.2, β = 1.1, and γ = 1.15.

• STEP 2: We then fix α, β, and γ as constants and use Equation 1 to scale up the baseline network with different values of φ, obtaining EfficientNet-B1 to B7.
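To make STEP 2 concrete, here is a small Python sketch that applies the constants from STEP 1 with increasing φ; the baseline depth and resolution values are placeholders for illustration, not the exact EfficientNet-B0 configuration.

```python
import math

# Constants found in STEP 1 for EfficientNet-B0.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def scale_variant(phi: float, base_depth: int = 18, base_res: int = 224):
    """Illustrative STEP 2: scale a baseline with compound coefficient phi.

    base_depth and base_res are placeholder baseline values, not the
    exact EfficientNet-B0 configuration.
    """
    depth = math.ceil(base_depth * ALPHA ** phi)   # layer counts round up
    width = BETA ** phi                            # channel multiplier
    res = int(round(base_res * GAMMA ** phi))      # input resolution
    return depth, width, res

# EfficientNet-B1..B7 correspond to increasing values of phi.
for phi in range(1, 8):
    d, w, r = scale_variant(phi)
    print(f"B{phi}: ~{d} layers, width x{w:.2f}, {r}x{r} input")
```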
III. MATERIAL AND METHODS

A. Dataset Description
The Efficient-Net model's effectiveness is validated on the HC18 database in this study. This database contains 1354 ultrasound images taken from 551 pregnant women between May 2014 and May 2015. The ultrasound images were recorded by sonographers using Voluson 730 and Voluson E8 ultrasound machines during the various trimesters of pregnancy. In this work, approximately 355 ultrasound images are used as the testing set, while the remaining 999 ultrasound images are used as the training set.

B. Preprocessing
The CSV files containing head circumference values, pixel sizes, and image filenames are used to load the dataset. A train/test split is used to divide the data into training, validation, and test sets.
Figure 1: Architecture of Efficient-Net model.

We develop a custom dataset class (HC18) to load and preprocess the dataset. PyTorch's transforms module is used to resize, rotate, apply Gaussian blur, and normalise the images.
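As an illustration of this preprocessing pipeline, the sketch below shows one plausible shape for such a dataset class; the CSV column names, transform parameters, and normalisation constant are assumptions for illustration, not the exact experimental values.

```python
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class HC18Dataset(Dataset):
    """Loads HC18 ultrasound images with normalised head-circumference targets."""

    def __init__(self, csv_path: str, image_dir: str, image_size: int = 224):
        # Assumed CSV columns: filename, pixel size(mm), head circumference (mm).
        self.df = pd.read_csv(csv_path)
        self.image_dir = image_dir
        # Resize, light rotation, Gaussian blur, and normalisation, as described above.
        self.transform = transforms.Compose([
            transforms.Resize((image_size, image_size)),
            transforms.RandomRotation(degrees=10),
            transforms.GaussianBlur(kernel_size=3),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    def __len__(self) -> int:
        return len(self.df)

    def __getitem__(self, idx: int):
        row = self.df.iloc[idx]
        image = Image.open(f"{self.image_dir}/{row['filename']}").convert("RGB")
        # HC in millimetres divided by pixel size gives HC in pixels; the
        # divisor used to squash targets into [0, 1] is an assumption.
        hc_pixels = row["head circumference (mm)"] / row["pixel size(mm)"]
        target = torch.tensor(hc_pixels / 1000.0, dtype=torch.float32)
        return self.transform(image), target
```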
IV. PROPOSED METHODOLOGY

EfficientNet is a family of convolutional neural networks (CNNs) created to attain state-of-the-art image classification accuracy more efficiently. By employing a compound scaling technique to scale the network's depth, width, and resolution, it strikes a balance between efficiency and accuracy. With this approach, the model space can be explored effectively, and a set of pre-trained models spanning a range of accuracy/efficiency trade-offs is obtained.
A. Crucial attributes of EfficientNet:
Compound scaling applies a fixed scaling coefficient to simultaneously scale the network's depth, width, and resolution. This keeps the network's overall architecture intact while scaling all network components proportionately.
B. Mobile inverted bottleneck (MBConv) block:
An effective building block that preserves accuracy at low computational cost by applying depthwise separable convolutions (a simplified sketch follows).
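The following PyTorch sketch shows the structure of an MBConv-style block (expansion, depthwise convolution, projection, inverted residual); it is a simplified illustration, not the exact EfficientNet implementation, which additionally uses squeeze-and-excitation, swish activations, and stochastic depth.

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Simplified MBConv-style block: expand, depthwise conv, project."""

    def __init__(self, in_ch: int, out_ch: int, expand: int = 6, stride: int = 1):
        super().__init__()
        mid = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),          # 1x1 expansion
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride, 1,
                      groups=mid, bias=False),             # depthwise 3x3
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),         # 1x1 projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out       # inverted residual
```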
C. Squeeze-and-excitation (SE) block:
This block learns to suppress less helpful features and to selectively highlight informative ones (a minimal sketch follows this paragraph). Progressive learning is an effective training method that grows the network gradually from smaller models, enabling better convergence and quicker training. Figure 1 illustrates the EfficientNet architecture.
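A minimal PyTorch sketch of a squeeze-and-excitation block, for illustration (the reduction ratio is an assumed typical value):

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-excitation: reweight channels using global context."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # squeeze: global average pool
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                             # per-channel gates in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)                       # excite: rescale channels
```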
D. EfficientNet variants:
There are several EfficientNet variants, each offering a different trade-off between efficiency and accuracy. The models are named B0 through B7, with B0 being the smallest and fastest and B7 the largest and most accurate (a sketch of loading pretrained variants follows).
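For example, torchvision ships pretrained EfficientNet variants B0 through B7; the sketch below (assuming torchvision >= 0.13 for the weights API shown) loads two variants and compares their sizes.

```python
import torchvision.models as models

# Pretrained EfficientNet variants from torchvision (B0..B7 available).
model_b0 = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
model_b7 = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.DEFAULT)

# Larger variants trade compute and memory for accuracy.
for name, m in [("B0", model_b0), ("B7", model_b7)]:
    params = sum(p.numel() for p in m.parameters()) / 1e6
    print(f"EfficientNet-{name}: {params:.1f}M parameters")
```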
E. Achievement:
On several benchmarks, EfficientNet achieves state-of-the-art image classification accuracy while being notably more efficient than other well-known models.

F. Uses:
EfficientNet is applicable to a variety of image recognition tasks, including:
• Object recognition
• Image segmentation
• Image captioning
• Visual question answering

G. Advantages:
Extremely accurate: On a number of benchmarks, EfficientNet attains state-of-the-art image classification accuracy.
Efficient: Compared to other widely used models, EfficientNet models exhibit a notable increase in efficiency, making them well suited for deployment in resource-constrained contexts.
Scalable: EfficientNet offers a range of pre-trained models, each with a different balance between accuracy and efficiency, from which users may select the most appropriate one.
Restrictions:
Cost of computation: Even though EfficientNet uses less computation than comparable models, training and inference still require a substantial amount of computing power.
Memory footprint: The larger memory footprint of the bigger EfficientNet models may prevent them from being deployed on some systems.
All things considered, EfficientNet is a strong and effective CNN architecture that produces state-of-the-art image classification accuracy.

V. DISCUSSION:

The first step in the project is the creation of a dataset of medical images of newborn and infant heads. A millimetre-accurate ground-truth head circumference (HC) measurement is linked to every image in the dataset. The dataset is carefully curated to guarantee that it contains the images and matching HC values needed for training, validation, and testing.
Calculating HC in Pixels: The code divides the HC values by the pixel size to determine the HC in pixels. This step guarantees that HC values are represented uniformly across images, even in the face of pixel-size differences.
Normalisation: To guarantee that the range of HC values is consistent, the computed HC values in pixels are further normalised to the range 0 to 1.
Splitting the Dataset: Three subsets are created from the dataset (a sketch of this split follows the list):
• Training Set: The deep learning model is trained using this subset.
• Validation Set: A separate validation set is employed to monitor the model's performance during training and avoid overfitting.
• Test Set: The test set is used to assess how accurately the model generalises to new data.
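As an illustration of this three-way split, the following sketch applies scikit-learn's train_test_split twice; the CSV filename and the 80/10/10 ratios are assumptions, not the exact experimental configuration (the paper reports 999 training and about 355 test images).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# df holds the HC18 annotations; the CSV path and split ratios below are
# illustrative assumptions.
df = pd.read_csv("training_set_pixel_size_and_HC.csv")
train_df, holdout_df = train_test_split(df, test_size=0.2, random_state=42)
val_df, test_df = train_test_split(holdout_df, test_size=0.5, random_state=42)
print(len(train_df), len(val_df), len(test_df))
```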
The 'efficientnet-b7' variant of the EfficientNet architecture is used in the experiment. The selection of EfficientNet is based on its well-known scalability and efficiency: it is renowned for striking an ideal balance between model size and performance, which qualifies it for a number of real-world uses. The chosen EfficientNet model has been pre-trained on a sizable and varied dataset, which has allowed it to pick up useful patterns and characteristics in images. Fine-tuning is then done on the HC dataset in order to tailor the model to the particular goal of HC estimation; the model's weights are updated throughout this fine-tuning procedure to better suit the HC prediction objective. During the model's training phase, performance optimisation is essential. The following steps are taken (a condensed training sketch follows the list):
• Loss Function: The difference between the ground-truth HC measurements and the predicted HC values is quantified using the mean square error (MSE) loss function.
• Optimizer: During training, the model's parameters are updated using the Adam optimizer.
• Early Stopping: An early stopping mechanism monitors the model's performance on the validation set; if overfitting of the model is detected, training is terminated.
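A condensed sketch of this training setup (MSE loss, Adam, early stopping) is shown below; the learning rate, patience, regression head, and the train_loader/val_loader DataLoaders (built from a dataset class like the one sketched earlier) are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.DEFAULT)
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 1)  # HC regression head
model = model.to(device)

criterion = nn.MSELoss()                       # loss between predicted and true HC
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    for images, targets in train_loader:       # train_loader: assumed DataLoader
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images).squeeze(1), targets)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x.to(device)).squeeze(1), y.to(device)).item()
                       for x, y in val_loader) / len(val_loader)

    # Early stopping: halt when validation loss stops improving.
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```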
The trained model's performance is assessed on a specific test dataset using two main assessment measures:
Mean Square Error (MSE): This measure reflects the degree to which the ground-truth values and the predicted HC values agree. A lower MSE indicates better model accuracy.
Standard Deviation of Predicted HC Values: The standard deviation of the model's predictions is measured in millimetres; a smaller standard deviation indicates greater precision in the HC estimates. The outcomes of these tests offer insightful information on the precision and accuracy of the model. The code uses plots to display the training and validation loss histories, which makes it possible to evaluate the model's performance visually as it evolves over training.
Model Saving: The trained model, the optimizer state, and other pertinent data are saved to a checkpoint file. This file can be used for additional training or future predictions.
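A minimal sketch of such checkpointing, continuing the assumed names from the training sketch above (the filename is illustrative):

```python
import torch

# Save the trained model together with the optimizer state so training
# can be resumed or the model reused for inference later.
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "best_val_loss": best_val,
}, "hc_efficientnet_checkpoint.pt")

# Restoring later:
ckpt = torch.load("hc_efficientnet_checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
optimizer.load_state_dict(ckpt["optimizer_state_dict"])
```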

A. Automated HC Measurement:
The code offers a convincing way to automate the critical process of measuring head circumference from medical images, which is important for paediatric healthcare. This automation provides a number of noteworthy benefits:
Decreased Human Error: By automating the HC measuring process, the code reduces the possibility of human error that can arise during manual measurements. As a result, HC evaluations are more dependable and consistent.
Healthcare Efficiency: Healthcare personnel can work much more efficiently when measurements are automated. Because HC measurements take less time and effort, medical staff can devote more of their time to other crucial patient-care tasks.
Choice of EfficientNet: The deliberate use of the EfficientNet architecture highlights the emphasis on scalability and efficiency.
Efficiency: The computational efficiency of EfficientNet is well known. This decision is in line with the objective of using HC measurement in actual clinical situations, where there can be resource limitations.
Scalability: The code emphasises EfficientNet's ability to be scaled to fit a variety of computing needs and applications. This characteristic keeps the model flexible and potentially deployable in many healthcare situations.
An important part of the experiment is assessing the model's performance on a specific test dataset. The findings, reported as the MSE and standard deviation of the HC predictions, offer important insights:
MSE Loss: The mean square error measures how accurate the HC predictions are; the smaller the MSE, the more closely the predicted values match the ground-truth measurements. The reported MSE is an important statistic for determining the correctness of the model.
Standard Deviation: The millimetre-based standard deviation of the predicted HC values is a measure of precision; a lower standard deviation denotes more accurate and consistent predictions. The discussion also recognises the wider ramifications of automating HC measurement with EfficientNet and deep learning:
Effect on Paediatric Healthcare: By offering precise and efficient instruments for tracking newborns' and children's growth and development, automated HC measurement could have a positive effect on paediatric healthcare. This helps provide high-quality treatment and early diagnosis of developmental problems.
Efficiency in Clinical Settings: The code's emphasis on efficiency means it could be integrated into clinical settings. HC evaluation and reporting can be streamlined by integrating automated HC measurement into telemedicine and electronic health record (EHR) systems.

VI. EXPERIMENTAL RESULTS

Figure 2: Results of the Efficient-Net model.

Training Loss Reduction: The training loss decreased steadily, from 0.1986 in the first epoch to 0.0502 in the tenth epoch. A consistently decreasing training loss indicates that the model was able to identify patterns in the training data and fit the target variable (head circumference) increasingly precisely.
Validation Loss: A declining trend was also seen in the validation loss, which reached its lowest value of 0.0458 in the tenth epoch. The validation loss is an important statistic to consider because it shows how effectively the model generalises to new data. Based on the decreasing validation loss, the model appears to have done well on fresh samples and did not overfit the training set.
Figure 3: Results of the Efficient-Net model using different parameters.

Figure 4: Training and validation losses.

Best Epoch: With a validation loss of 0.0458, the model performed best on the validation set in the tenth epoch. This epoch is considered the best in terms of minimising validation loss.
Training Time: With an average of 579.64 seconds per epoch, the training procedure took 5796.37 seconds in total. Understanding the training time is necessary for judging computational efficiency.
Implications: The consistent decrease in both training and validation losses suggests that the model successfully picked up the underlying trends in the training data. The tenth epoch produced the best-performing model on the validation set.
Mean Squared Error (MSE): A lower MSE indicates better performance. It represents the average squared difference between predicted and actual head circumferences. For example, an MSE of 0.1 means that, on average, the squared difference between predicted and actual head circumferences is 0.1 (in squared units of the target variable).
Mean Absolute Error (MAE): Similar to MSE, a lower MAE is desirable. It represents the average absolute difference between predicted and actual head circumferences. For example, an MAE of 0.05 means that, on average, the predicted head circumferences deviate by 0.05 units (in the scale of the target variable) from the actual values.
Interpretation: Evaluate the obtained MSE and MAE values in the context of the specific problem. Compare them to any baseline models or traditional methods used for head circumference estimation, and consider the scale of the target variable (head circumference) to understand the practical significance of the errors. (A short sketch of computing these metrics follows.)
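A minimal sketch of how the MSE, MAE, and error spread can be computed for HC predictions (the tensor contents are dummy values for illustration):

```python
import torch

def regression_metrics(preds: torch.Tensor, targets: torch.Tensor):
    """Compute MSE, MAE, and the standard deviation of the errors."""
    errors = preds - targets
    mse = torch.mean(errors ** 2).item()       # average squared error
    mae = torch.mean(errors.abs()).item()      # average absolute error
    std = torch.std(errors).item()             # spread of the errors
    return mse, mae, std

# Example with dummy predictions and ground-truth HC values (normalised units).
preds = torch.tensor([0.52, 0.48, 0.61])
targets = torch.tensor([0.50, 0.49, 0.58])
mse, mae, std = regression_metrics(preds, targets)
print(f"MSE={mse:.4f}  MAE={mae:.4f}  error std={std:.4f}")
```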
VII. CONCLUSION

Here we have presented an automated method that could transform paediatric healthcare: deep learning-based head circumference (HC) prediction from medical images using the EfficientNet architecture. The implementation, results, and discussion have clarified the relevance and ramifications of this novel paradigm. The main lessons to be learned from the code created in this experiment are the following:
Automation of HC Measurement: The code automates the time-consuming, manually operated HC measurement process. By doing this, the possibility of measurement mistakes and human error is greatly reduced, resulting in HC evaluations that are more reliable and consistent.
Efficiency and Scalability: The EfficientNet design was selected to prioritise both of these qualities. Real-world clinical situations can benefit greatly from EfficientNet's computational efficiency and its flexibility to accommodate varying computational budgets.
Robust Training and Evaluation: Strict training and assessment protocols, such as early stopping, support the model's efficacy and guard against overfitting. The use of the Adam optimizer and the mean square error (MSE) loss function reflects a careful approach to model optimisation.
Quantitative Results: The accuracy and precision of the model are quantified by the mean square error (MSE) and the standard deviation of the HC predictions. These results make it possible to evaluate the performance of the model.
Impact on Paediatric Healthcare: The consequences of this work affect the field of paediatric healthcare as a whole. By facilitating the early diagnosis of developmental problems and optimising healthcare procedures, automating HC assessment holds the potential to enhance the quality of care given to newborns and children.
Efficiency in Clinical Settings: Given the code's emphasis on adaptability and efficiency, it appears that the model may be easily incorporated into clinical settings. Incorporating automated HC measurement into telemedicine platforms and electronic health records (EHR) has the potential to improve healthcare practises.
In summary, the code and experiment described here represent a significant advance in deep learning applications and medical image analysis. Using EfficientNet to automate HC measurement from medical images demonstrates a dedication to accuracy, productivity, and the health of young patients. The findings and ideas presented here demonstrate the potential of deep learning in healthcare applications and open the door to better paediatric healthcare practises. This study not only makes a significant contribution to the area but also encourages further innovation at the intersection of technology and healthcare.

VIII. FUNDING
None

IX. INFORMED CONSENT
None

X. DECLARATION OF CONFLICTS OF INTEREST
The authors do not have any conflicts of interest.
ACKNOWLEDGEMENTS
The content of this publication is solely the responsibility of
the authors.

REFERENCES
[1] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in
deep residual networks,” in Computer Vision–ECCV 2016: 14th
European Conference, Amsterdam, The Netherlands, October 11–14,
2016, Proceedings, Part IV 14. Springer, 2016, pp. 630–645.
[2] G. Huang, D. Chen, T. Li, F. Wu, L. Van Der Maaten, and K. Q.
Weinberger, “Multi-scale dense convolutional networks for efficient
prediction,” arXiv preprint arXiv:1703.09844, vol. 2, no. 2, 2017.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov,
D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with
convolutions,” in Proceedings of the IEEE conference on computer
vision and pattern recognition, 2015, pp. 1–9.
[4] S. Zagoruyko and N. Komodakis, “Paying more attention to atten-
tion: Improving the performance of convolutional neural networks
via attention transfer,” arXiv preprint arXiv:1612.03928, 2016.
[5] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep
network training by reducing internal covariate shift,” in Interna-
tional conference on machine learning. PMLR, 2015, pp. 448–456.
[6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning
for image recognition,” in Proceedings of the IEEE conference on
computer vision and pattern recognition, 2016, pp. 770–778.
[7] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang,
T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient
convolutional neural networks for mobile vision applications,” arXiv
preprint arXiv:1704.04861, 2017.
[8] D. Q. Huang, A. G. Singal, Y. Kono, D. J. Tan, H. B. El-Serag, and
R. Loomba, “Changing global epidemiology of liver cancer from
2010 to 2019: Nash is the fastest growing cause of liver cancer,”
Cell metabolism, vol. 34, no. 7, pp. 969–977, 2022.
[9] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna,
“Rethinking the inception architecture for computer vision,” in
Proceedings of the IEEE conference on computer vision and pattern
recognition, 2016, pp. 2818–2826.
[10] H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean, “Efficient
neural architecture search via parameters sharing,” in International
conference on machine learning. PMLR, 2018, pp. 4095–4104.
[11] L.-C. Chen, M. Collins, Y. Zhu, G. Papandreou, B. Zoph, F. Schroff,
H. Adam, and J. Shlens, “Searching for efficient multi-scale archi-
tectures for dense image prediction,” Advances in neural information
processing systems, vol. 31, 2018.
