Ajay Taneja’s Post


Heterogeneous Graphs and Relational Graph Convolutional Networks (RGCNs): How can we prevent overfitting?

In my post last week [https://lnkd.in/d5cZn5ur], I spoke about Heterogeneous Graphs - Knowledge Graphs being a common example, since they may have several node types and relationship types. I mentioned that training such complex networks can become computationally intensive because each relationship type has to be associated with its own weight matrix, which increases the overall number of parameters to be learnt in an RGCN model and can therefore lead to overfitting. I then highlighted "Block Diagonal Matrices" and "Basis Learning" as two ways to reduce the number of parameters of an RGCN model. Let us add some more insight into Block Diagonal Matrices and Basis Learning and understand what these actually mean.

Block Diagonal Matrices: The key idea with Block Diagonal Matrices is to make the transformation matrix Wr of each relation r as sparse as possible - that is, to enforce a block diagonal structure as shown in Figure 1 below, so that the non-zero elements lie only in specific blocks of the transformation matrix Wr. This reduces the number of non-zero elements, which means that during training only the non-zero elements have to be estimated. This reduces the training time considerably.

What do you lose by using Block Diagonal Matrices? We do lose some information: embedding dimensions that are far from one another (i.e. that fall in different blocks) cannot interact with each other. Only the embedding dimensions that lie within the same non-zero block - that is, only the neurons along the non-zero entries - are able to interact with each other.

Basis Learning: The second idea to make RGCNs more scalable is to share weights amongst the relationship types. That is, we do not want the relations to be completely independent of each other; we want them to share information according to some weighting. This can be achieved by elegantly representing each transformation matrix Wr as a linear combination of "Basis" matrices. These basis matrices are also termed "Dictionaries". In this approach, the weight matrix Wr is represented as a linear combination, i.e. a weighted sum, of dictionary matrices Vb, as in the equation below:

Wr = sum over b of (a_rb * Vb), for b = 1, ..., B

In the equation, a_rb is the "importance" weight of basis matrix b for relation r. Thus, every relation only has to learn its importance weight for each of the basis matrices. This significantly reduces the number of parameters: the number of basis matrices may be around 10, whereas the number of relation types may run into the thousands, and the a_rb are just scalars to be learnt.

Minimal code sketches of both ideas are included at the end of this post.

#graphs #machinelearning #deeplearning

My Technical Blogs Library
1) https://lnkd.in/ejyWacXn
2) https://lnkd.in/bfrkRjq

Follow Ajay Taneja for insights on Engineering

[Figure 1: block diagonal structure of the per-relation transformation matrix Wr]
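To make the block-diagonal idea concrete, here is a minimal PyTorch sketch. It assumes equal-sized blocks and a simple per-relation lookup; the class name, shapes and initialisation are my own illustrative choices, not code from the RGCN paper.

```python
import torch
import torch.nn as nn

class BlockDiagonalRelationTransform(nn.Module):
    """Per-relation transform Wr constrained to a block-diagonal structure.

    Instead of one dense (in_dim x out_dim) matrix per relation, only the
    diagonal blocks are learned; all off-block entries are implicitly zero.
    """
    def __init__(self, num_relations, in_dim, out_dim, num_blocks):
        super().__init__()
        assert in_dim % num_blocks == 0 and out_dim % num_blocks == 0
        self.num_blocks = num_blocks
        self.in_block = in_dim // num_blocks
        self.out_block = out_dim // num_blocks
        # Diagonal blocks per relation: (num_relations, num_blocks, in_block, out_block)
        self.blocks = nn.Parameter(
            torch.randn(num_relations, num_blocks, self.in_block, self.out_block) * 0.01
        )

    def forward(self, x, rel):
        # x: (num_nodes, in_dim) node embeddings, rel: integer relation index.
        # Split each embedding into blocks and transform every block independently,
        # so dimensions in different blocks never interact with each other.
        x_blocks = x.view(-1, self.num_blocks, self.in_block)            # (N, B, in_block)
        out = torch.einsum('nbi,bio->nbo', x_blocks, self.blocks[rel])   # (N, B, out_block)
        return out.reshape(-1, self.num_blocks * self.out_block)         # (N, out_dim)
```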
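And a similar sketch of Basis Learning (basis decomposition), where Wr is composed as the weighted sum of shared dictionary matrices Vb from the equation above; again, the names and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BasisDecompositionRelationTransform(nn.Module):
    """Per-relation transform Wr built as a weighted sum of shared basis matrices.

    Wr = sum_b a_rb * Vb, so only B basis matrices plus one scalar coefficient
    per (relation, basis) pair have to be learned.
    """
    def __init__(self, num_relations, in_dim, out_dim, num_bases):
        super().__init__()
        # Shared dictionary of basis matrices Vb: (num_bases, in_dim, out_dim)
        self.bases = nn.Parameter(torch.randn(num_bases, in_dim, out_dim) * 0.01)
        # Per-relation importance weights a_rb: (num_relations, num_bases)
        self.coeffs = nn.Parameter(torch.randn(num_relations, num_bases) * 0.01)

    def forward(self, x, rel):
        # Compose this relation's weight matrix from the shared bases ...
        W_r = torch.einsum('b,bio->io', self.coeffs[rel], self.bases)  # (in_dim, out_dim)
        # ... and apply it to the node embeddings x: (num_nodes, in_dim)
        return x @ W_r

# Example (illustrative numbers): thousands of relations, only 10 bases.
# layer = BasisDecompositionRelationTransform(num_relations=2000, in_dim=64, out_dim=64, num_bases=10)
# h_out = layer(h_in, rel=42)
```

In practice you rarely need to hand-roll these: PyTorch Geometric's RGCNConv layer, for example, exposes both tricks through its num_blocks and num_bases arguments.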
