Parameter Sharing and Typing in Machine Learning
Last Updated: 17 Jun, 2024
We usually apply constraints or penalties to parameters with respect to a fixed region or point. L2 regularisation (or weight decay), for example, penalises model parameters for deviating from the fixed value of zero.

However, we may occasionally need alternative ways of expressing our prior knowledge of suitable model parameter values. We may not know precisely what values the parameters should take, but we do know, from our knowledge of the domain and model architecture, that there should be some dependencies between the model parameters.

A dependency we frequently want to express is that certain parameters should be close to one another.
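For reference, here is a minimal PyTorch sketch of such a fixed-point penalty; the model, data, and coefficient below are illustrative placeholders, not part of the original discussion:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical model and data, for illustration only.
model = nn.Linear(10, 2)
x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))

task_loss = F.cross_entropy(model(x), y)

# L2 regularisation (weight decay): penalise parameters for
# deviating from the fixed value of zero.
weight_decay = 1e-4  # illustrative coefficient
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())

loss = task_loss + weight_decay * l2_penalty
loss.backward()
```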
Parameter Typing
Suppose two models are performing the same classification task (with the same set of classes), but their input distributions are somewhat different.

- Model A has the parameters \boldsymbol{w}^{(A)}
- Model B has the parameters \boldsymbol{w}^{(B)}

The two models map the input to two different but related outputs:

\hat{y}^{(A)}=f\left(\boldsymbol{w}^{(A)}, \boldsymbol{x}\right)

and

\hat{y}^{(B)}=g\left(\boldsymbol{w}^{(B)}, \boldsymbol{x}\right)
Assume the tasks are similar enough (perhaps with similar input and output distributions) that the model parameters should be close to one another: \forall i, w_{i}^{(A)} should be close to w_{i}^{(B)}. We can exploit this information through regularisation, by applying a parameter norm penalty of the form \Omega\left(\boldsymbol{w}^{(A)}, \boldsymbol{w}^{(B)}\right)=\left\|\boldsymbol{w}^{(A)}-\boldsymbol{w}^{(B)}\right\|_{2}^{2}. We used an L2 penalty here, but other choices are also possible.
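A minimal sketch of this penalty in PyTorch, assuming two architecturally identical models whose parameter tensors align index by index (the models, data, and coefficient are illustrative):

```python
import torch
import torch.nn as nn

# Two hypothetical models for related tasks, assumed to share the
# same architecture so their parameters correspond one-to-one.
model_a = nn.Linear(10, 5)
model_b = nn.Linear(10, 5)

def typing_penalty(m_a, m_b):
    # Omega(w_A, w_B) = ||w_A - w_B||_2^2, summed over all parameters
    return sum(
        (p_a - p_b).pow(2).sum()
        for p_a, p_b in zip(m_a.parameters(), m_b.parameters())
    )

x = torch.randn(4, 10)
task_loss_a = model_a(x).pow(2).mean()  # stand-in for model A's task loss
task_loss_b = model_b(x).pow(2).mean()  # stand-in for model B's task loss

# Each model keeps its own task loss; the penalty pulls the two
# parameter vectors towards each other (illustrative coefficient).
loss = task_loss_a + task_loss_b + 1e-3 * typing_penalty(model_a, model_b)
loss.backward()
```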
Parameter Sharing
This kind of approach was used to regularise the parameters of one model, trained as a classifier in a supervised paradigm, to be close to the parameters of another model, trained in an unsupervised paradigm (to capture the distribution of the observed input data). The architectures were designed so that many of the parameters in the classifier model could be paired with corresponding parameters in the unsupervised model.

While a parameter norm penalty is one way to require sets of parameters to be close to one another, the more popular way is to use constraints: to force sets of parameters to be equal. Because we interpret the various models or model components as sharing a unique set of parameters, this form of regularisation is commonly referred to as parameter sharing. A significant advantage of parameter sharing over regularising the parameters to be close (through a norm penalty) is that only a subset of the parameters (the unique set) needs to be stored in memory. In certain models, such as the convolutional neural network, this can lead to a large reduction in the memory footprint.
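A minimal PyTorch sketch of hard parameter sharing, with an illustrative encoder reused by a supervised head and an unsupervised head (all layer sizes are assumptions):

```python
import torch.nn as nn

# One encoder whose parameters are shared (a hard constraint): the
# supervised and unsupervised components reuse the *same* tensors,
# so only a single copy is stored and gradients from both accumulate.
shared_encoder = nn.Linear(10, 5)

supervised_model = nn.Sequential(shared_encoder, nn.Linear(5, 2))     # classifier
unsupervised_model = nn.Sequential(shared_encoder, nn.Linear(5, 10))  # reconstructor

# Both models hold the very same parameter object, not a copy:
assert supervised_model[0].weight is unsupervised_model[0].weight
```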
Convolutional neural networks (CNNs) used in computer vision are by far the most widespread and extensive use of parameter sharing. Many statistical properties of natural images are invariant to translation: a photo of a cat, for example, can be translated one pixel to the right and still be a photo of a cat. CNNs take this property into account by sharing parameters across several image locations: the same feature (a hidden unit with the same weights) is computed over different locations in the input. This means that whether the cat appears in column i or column i + 1 of the image, we can find it with the same cat detector.
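A small PyTorch sketch illustrating this: one filter is applied at every position, so translating the input simply translates the response (the filter, image size, and values are illustrative):

```python
import torch
import torch.nn.functional as F

# A single 3x3 "detector" filter, applied at every image location.
detector = torch.randn(1, 1, 3, 3)

img = torch.randn(1, 1, 8, 8)
shifted = torch.zeros_like(img)
shifted[..., 1:] = img[..., :-1]  # translate the image one pixel right

out = F.conv2d(img, detector, padding=1)
out_shifted = F.conv2d(shifted, detector, padding=1)

# Away from the right border, the response shifts with the input:
print(torch.allclose(out_shifted[..., 1:-1], out[..., :-2]))  # True
```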
Thanks to parameter sharing, CNNs have been able to dramatically reduce the number of unique model parameters and to significantly increase network size without requiring a corresponding increase in training data. This remains one of the best examples of how domain knowledge can be efficiently incorporated into the network architecture.
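To make the saving concrete, here is a small illustrative comparison of parameter counts in PyTorch (the layer sizes are assumptions, not from the original text):

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Fully connected layer mapping a flattened 32x32 grayscale image to a
# feature vector of the same size: one weight per input-output pair.
fc = nn.Linear(32 * 32, 32 * 32)

# Convolutional layer producing a same-sized feature map: one 3x3
# filter is reused at every spatial location.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

print(n_params(fc))    # 1049600 (1024*1024 weights + 1024 biases)
print(n_params(conv))  # 10 (9 filter weights + 1 bias)
```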