Maciej Satkiewicz’s Post

Mechanistically Interpretable Vision | President of 314 Foundation

Bridging the gap between Deep Learning and explainable algorithms. Deep neural networks learn fragile "shortcut" features, which makes them hard to interpret (black boxes) and vulnerable to adversarial attacks. This paper proposes semantic features as a general architectural solution to both problems. The main idea is to make features locality-sensitive in an appropriate semantic topology of the domain, which acts as a strong regularizer. The proof-of-concept network is lightweight, inherently interpretable, and achieves near human-level adversarial test metrics - with no adversarial training! Can't wait to hear your feedback!
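
To make the idea concrete, here is a minimal, purely illustrative Python sketch - not the paper's implementation. It models a "semantic feature" as the best match between a prototype pattern and any small, meaning-preserving perturbation of an input patch, with pixel shifts standing in for the semantic neighborhood. The names (shifted_views, semantic_feature_response) and the choice of shifts as the topology are my assumptions for illustration only.

import numpy as np

def shifted_views(patch, max_shift=1):
    # Toy "semantic neighborhood": all small translations of a 2-D patch.
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            yield np.roll(np.roll(patch, dy, axis=0), dx, axis=1)

def semantic_feature_response(patch, prototype):
    # Max correlation between the prototype and any nearby view of the patch.
    # Pooling over the neighborhood makes the feature insensitive to nuisance
    # variation (here: small shifts) while staying sensitive to semantic
    # identity, i.e. locality-sensitive in the chosen topology.
    return max(float(np.sum(view * prototype)) for view in shifted_views(patch))

# Toy usage: a diagonal-edge prototype fires on a slightly shifted diagonal
# edge but not on a flat patch.
prototype = np.eye(5)
edge_patch = np.roll(np.eye(5), 1, axis=1)  # same edge, shifted one pixel
flat_patch = np.zeros((5, 5))

print(semantic_feature_response(edge_patch, prototype))  # 5.0 (strong match)
print(semantic_feature_response(flat_patch, prototype))  # 0.0 (no match)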

A Conceptual Framework For White Box Neural Networks

arxiv.org
