Semi-Supervised Learning

Advanced Topics:

Semi-Supervised Learning YouTube Playlist

Maziar Raissi

Assistant Professor

Department of Applied Mathematics

University of Colorado Boulder

[email protected]


– representation learning and/or label propagation


Virtual Adversarial Training: A Regularization Method
for Supervised and Semi-Supervised Learning YouTube Video

→ full objective function

→ labeled dataset

negative log likelihood for labeled data



→ unlabeled dataset

Train $p(y|x, \theta)$ using $D_l$ and $D_{ul}$.
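The equation that the labels above annotate did not survive extraction. As a hedged reconstruction, the full objective in the Miyato et al. paper cited later in these notes has the form below. The symbols $\ell$, $\alpha$, $\mathcal{R}_{\text{vadv}}$, $N_l$ and $N_{ul}$ do not appear in the extracted slide text and are assumptions taken from that paper; LDS is the local distributional smoothness term discussed further down.

```latex
\ell(D_l, \theta) + \alpha \, \mathcal{R}_{\text{vadv}}(D_l, D_{ul}, \theta),
\qquad
\mathcal{R}_{\text{vadv}}(D_l, D_{ul}, \theta)
  := \frac{1}{N_l + N_{ul}} \sum_{x_* \in D_l \cup D_{ul}} \mathrm{LDS}(x_*, \theta)
```

Here $\ell(D_l, \theta)$ is the negative log likelihood for labeled data, and the regularizer averages over both the labeled and the unlabeled inputs, which is why no labels are needed for it.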

Adversarial Training

Reducing LDS makes the model smooth at each data point.

VAT for semi-supervised learning can be given a similar interpretation to label propagation.

divergence between two distributions



→ cross entropy

$q(y|x_l)$ → true distribution

$q(y|x_l) \approx h(y; y_l)$ → one-hot vector
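A small sketch of why the one-hot approximation is convenient: with $h(y; y_l)$ as the target, the cross entropy collapses to the negative log probability of the observed label. The function names and the numbers below are illustrative assumptions, not taken from the slides.

```python
import math

def one_hot(label, num_classes):
    """h(y; y_l): 1 at the observed label, 0 elsewhere."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def cross_entropy(q, p):
    """Cross entropy -sum_y q(y) log p(y); terms with q(y) = 0 contribute nothing."""
    return -sum(qk * math.log(pk) for qk, pk in zip(q, p) if qk > 0.0)

p = [0.2, 0.5, 0.3]         # made-up model output p(y | x_l, theta)
q = one_hot(1, 3)           # one-hot stand-in for the true distribution q(y | x_l)
loss = cross_entropy(q, p)  # same value as -log p[1]
```

Only the entry at the observed label survives the sum, so this is the same quantity as the per-example negative log likelihood.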

for $L_2$ norm

→ for $L_\infty$ norm

Power iteration method and the finite difference method

$D(r, x_*, \theta) := D[p(y|x_*, \hat{\theta}), p(y|x_* + r, \theta)]$

$D(r, x_*, \hat{\theta})$ takes the minimal value at $r = 0$, and $\nabla_r D(r, x_*, \hat{\theta})|_{r=0} = 0$.
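The formulas behind the two norm labels in the adversarial training part did not survive extraction. Assuming they are the usual linear approximations from the cited Miyato et al. paper (the gradient $g$ of the loss with respect to the input, scaled to the norm ball of radius $\epsilon$), a minimal sketch looks like this. The function names and the example gradient are hypothetical.

```python
import math

def adv_perturbation_l2(g, eps):
    """L2 case: move distance eps along the normalized gradient, eps * g / ||g||_2."""
    norm = math.sqrt(sum(c * c for c in g))
    return [eps * c / norm for c in g]

def adv_perturbation_linf(g, eps):
    """L-infinity case: move eps along the sign of each gradient coordinate."""
    def sign(c):
        return (c > 0) - (c < 0)
    return [eps * sign(c) for c in g]

g = [3.0, -4.0, 0.0]  # made-up gradient of the loss w.r.t. the input
r_l2 = adv_perturbation_l2(g, eps=0.5)
r_linf = adv_perturbation_linf(g, eps=0.5)
```

Both variants need the label, because $g$ is the gradient of a supervised loss; that dependence is what the virtual adversarial variant removes. The L2 version assumes a nonzero gradient.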

Virtual Adversarial Training



$D(r, x_*, \hat{\theta})$

Hessian matrix

Replace $q(y|x)$ with its current estimate $p(y|x, \hat{\theta})$.


→ first dominant eigenvector of $H$ with magnitude $\epsilon$
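The chain of reasoning connecting the Hessian, the dominant eigenvector, and the finite difference method can be sketched as follows. This is a hedged reconstruction following the cited Miyato et al. paper; the notation $H(x_*, \hat{\theta})$, $u$, and the overline for a unit vector are assumptions, since the original equations were lost in extraction. $\xi$ is a small step size and $d$ a unit vector.

```latex
% The gradient vanishes at r = 0, so the first-order Taylor term drops out:
D(r, x_*, \hat{\theta}) \approx \tfrac{1}{2}\, r^{\top} H(x_*, \hat{\theta})\, r,
\qquad
H(x_*, \hat{\theta}) := \nabla\nabla_r D(r, x_*, \hat{\theta})\big|_{r=0}

% Maximizing the quadratic form over the epsilon-ball gives the dominant eigenvector u:
r_{\text{vadv}} \approx \arg\max_{r}\left\{ r^{\top} H(x_*, \hat{\theta})\, r \;;\; \|r\|_2 \le \epsilon \right\}
  = \epsilon\, \overline{u(x_*, \hat{\theta})}

% Power iteration needs H d, which a finite difference of gradients supplies:
H d \approx \frac{\nabla_r D(r, x_*, \hat{\theta})\big|_{r=\xi d}
            - \nabla_r D(r, x_*, \hat{\theta})\big|_{r=0}}{\xi}
    = \frac{\nabla_r D(r, x_*, \hat{\theta})\big|_{r=\xi d}}{\xi}
```

The last equality uses the vanishing gradient at $r = 0$ again, so one gradient evaluation per power-iteration step is enough and the Hessian is never formed explicitly.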

Local Distributional Smoothness



$\xi = 10^{-6}$, $d$ is a randomly sampled unit vector
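To make the power iteration step concrete, here is a minimal sketch on a toy model. Everything model-specific is an assumption for illustration: the classifier is a hypothetical softmax-linear model $p(y|x) = \text{softmax}(Wx)$, the divergence is taken to be KL, and the gradient is written out by hand for that toy model, whereas a real implementation would obtain it by automatic differentiation. The parameter `n_power` (number of power iterations) is also an assumed name.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, x):
    """Toy classifier p(y|x) = softmax(W x); W is a list of rows."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def vadv_perturbation(W, x, eps, xi=1e-6, n_power=1, rng=None):
    """Approximate the virtual adversarial perturbation by power iteration."""
    rng = rng or random.Random(0)
    p = predict(W, x)                            # current estimate, held fixed
    d = unit([rng.gauss(0.0, 1.0) for _ in x])   # randomly sampled unit vector
    for _ in range(n_power):
        q = predict(W, [a + xi * b for a, b in zip(x, d)])
        # Gradient of KL(p || q) w.r.t. r at r = xi * d; for this toy model
        # it equals W^T (q - p). It is proportional to H d, so normalizing it
        # performs one power-iteration step without forming H.
        g = [sum(W[k][j] * (q[k] - p[k]) for k in range(len(W)))
             for j in range(len(x))]
        d = unit(g)
    return [eps * c for c in d]

W = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]  # made-up weights
x = [0.5, -0.3]                            # made-up input
r_vadv = vadv_perturbation(W, x, eps=0.1)
lds = kl(predict(W, x), predict(W, [a + b for a, b in zip(x, r_vadv)]))
```

No label is used anywhere, which is what allows the same regularizer to be applied to unlabeled data. The sketch assumes the finite-difference gradient is nonzero; a robust version would guard the normalization.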

virtual adversarial perturbation

Miyato, Takeru, et al. "Virtual adversarial training: a regularization method for supervised and semi-supervised learning." IEEE Transactions on Pattern Analysis and Machine Intelligence 41.8 (2018): 1979-1993.
Mean teachers are better role models:
Weight-averaged consistency targets improve semi-supervised deep learning results YouTube Video

$J$ → consistency loss

student model → weights $\theta$ and noise $\eta$

teacher model → weights $\theta'$ and noise $\eta'$

$J(\theta) = \mathbb{E}_{x, \eta, \eta'}\left[\|f(x, \theta', \eta') - f(x, \theta, \eta)\|^2\right]$
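As a rough illustration of the consistency loss, the expectation can be estimated by averaging the squared distance between student and teacher outputs over inputs, with independently drawn noise for each. The model `toy_model`, the Gaussian input noise, and all numbers are hypothetical stand-ins, not part of the original slides.

```python
import random

def consistency_cost(student_out, teacher_out):
    """Squared L2 distance between teacher and student outputs for one input."""
    return sum((t - s) ** 2 for s, t in zip(student_out, teacher_out))

def toy_model(x, theta, eta):
    """Hypothetical stand-in for f(x, theta, eta): a noisy linear map."""
    return [w * (x + eta) for w in theta]

def estimate_J(xs, theta, theta_prime, sigma, rng):
    """Monte Carlo estimate of J: average cost over inputs and noise draws."""
    total = 0.0
    for x in xs:
        eta = rng.gauss(0.0, sigma)        # student noise
        eta_prime = rng.gauss(0.0, sigma)  # teacher noise, drawn independently
        total += consistency_cost(toy_model(x, theta, eta),
                                  toy_model(x, theta_prime, eta_prime))
    return total / len(xs)

rng = random.Random(0)
J = estimate_J([0.1, 0.5, -0.3], theta=[1.0, -2.0],
               theta_prime=[0.9, -1.8], sigma=0.1, rng=rng)
```

In training, only the student weights $\theta$ would receive gradients from this cost; the teacher output is treated as a fixed target.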

$\theta'_t = \alpha \theta'_{t-1} + (1 - \alpha)\theta_t$ → Exponential Moving Average (EMA)
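The EMA update is simple enough to show directly. In this sketch the weights are flat lists of floats and `ema_update` is an illustrative name; the decay value 0.5 is chosen only to keep the arithmetic easy to follow and is much smaller than the decay values close to 1 typically used in practice.

```python
def ema_update(teacher, student, alpha):
    """theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t, elementwise."""
    return [alpha * tp + (1.0 - alpha) * s for tp, s in zip(teacher, student)]

teacher = [0.0, 0.0]
# Three training steps in which the student weights happen to stay fixed.
for student in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    teacher = ema_update(teacher, student, alpha=0.5)
# teacher moves toward the student: [0.5, 1.0] -> [0.75, 1.5] -> [0.875, 1.75]
```

The teacher is never trained by gradient descent; it only tracks a smoothed history of the student weights.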

Three types of noise:

– random translations and horizontal flips of the input images
– Gaussian noise on the input layer
– dropout applied within the network
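A plain-Python sketch of the three noise sources, using lists of rows as images. The function names, zero padding for translations, the 0.5 flip probability, and the inverted-dropout rescaling are all assumptions made for illustration; the slides do not specify these details.

```python
import random

def random_translate(img, max_shift, rng):
    """Shift a 2D image by a random integer offset, padding with zeros."""
    h, w = len(img), len(img[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            si, sj = i - dy, j - dx
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = img[si][sj]
    return out

def random_hflip(img, rng):
    """Mirror the image left-right with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in img]
    return [row[:] for row in img]

def add_gaussian_noise(img, sigma, rng):
    """Add independent Gaussian noise to every pixel (input-layer noise)."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop, rescale the rest."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
img = [[1.0, 2.0], [3.0, 4.0]]
noisy = add_gaussian_noise(random_hflip(random_translate(img, 1, rng), rng), 0.1, rng)
```

Student and teacher would each draw their own noise, so the same input yields two different views for the consistency loss.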
Ramp up the scale of the consistency loss from zero to its final value
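The slide does not say which ramp-up schedule is used. The sketch below assumes a linear ramp purely for simplicity; `consistency_weight`, `rampup_steps`, and `final_weight` are hypothetical names, and other shapes (for example a smooth sigmoid-like curve) would fit the same description.

```python
def consistency_weight(step, rampup_steps, final_weight):
    """Linearly scale the consistency weight from 0 to final_weight."""
    if rampup_steps <= 0:
        return final_weight
    frac = min(max(step / rampup_steps, 0.0), 1.0)
    return final_weight * frac

# The total loss at a given step would then be something like:
#   classification_cost + consistency_weight(step, ...) * consistency_cost
w_start = consistency_weight(0, 100, 10.0)
w_mid = consistency_weight(50, 100, 10.0)
w_end = consistency_weight(200, 100, 10.0)
```

Starting at zero keeps the early, unreliable teacher targets from dominating the supervised signal.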

Temporal Ensembling: maintains an exponential moving average of label predictions on each training example.
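For contrast with the weight averaging used by Mean Teacher, a per-example prediction average might look like the sketch below. The startup bias correction (dividing by $1 - \alpha^t$) is an assumption that is not stated on the slide, and the names `update_ensemble`, `Z`, and `z` are illustrative.

```python
def update_ensemble(Z, z, alpha, t):
    """Update the running prediction average for one training example.

    Z: accumulated EMA of past predictions (starts at all zeros)
    z: the prediction made for this example in the current epoch
    t: epoch index, starting at 1
    Returns the new accumulator and the bias-corrected training target.
    """
    Z_new = [alpha * a + (1.0 - alpha) * b for a, b in zip(Z, z)]
    target = [v / (1.0 - alpha ** t) for v in Z_new]
    return Z_new, target

Z = [0.0, 0.0]
Z, target = update_ensemble(Z, [0.25, 0.75], alpha=0.5, t=1)
# After the first epoch the corrected target equals the first prediction.
```

Because each example is visited once per epoch, its target is refreshed only once per epoch, whereas an EMA over weights is updated at every training step.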

Tarvainen, Antti, and Harri Valpola. "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results."
arXiv preprint arXiv:1703.01780 (2017).
MixMatch: A Holistic Approach
to Semi-Supervised Learning YouTube Playlist

$p_{\text{model}}(y|x; \theta)$ → a generic model which produces a distribution over class labels $y$ for an input $x$ with parameters $\theta$

Consistency Regularization

encourage the model to produce the same output distribution when its inputs are perturbed

$\|p_{\text{model}}(y|\text{Augment}(x); \theta) - p_{\text{model}}(y|\text{Augment}(x); \theta)\|_2^2$

Augment(x) → a stochastic transformation (so the two terms above are not identical)


“Mean Teacher” → replaces one of the terms above with the output of the model using an exponential moving average of model parameter values

“Virtual Adversarial Training” (VAT) → computing an additive perturbation to apply to the input which maximally changes the output class distribution

Entropy Minimization

encourage the model to output confident predictions on unlabeled data


minimize the entropy of $p_{\text{model}}(y \mid x; \theta)$ for unlabeled data
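As a rough illustration of the quantity being minimized, the sketch below computes the average entropy of a batch of predicted class distributions. The function name `mean_entropy` and the small `eps` constant are assumptions made for this example, not part of the slides.

```python
import numpy as np

def mean_entropy(probs, eps=1e-12):
    """Average Shannon entropy (in nats) over the rows of `probs`.

    `probs` has shape (batch, classes); each row is a predicted
    class distribution for one unlabeled example.
    """
    probs = np.asarray(probs, dtype=float)
    # eps guards against log(0) for classes with zero probability
    per_example = -np.sum(probs * np.log(probs + eps), axis=1)
    return float(np.mean(per_example))

# A confident prediction has lower entropy than a uniform one.
confident = np.array([[0.98, 0.01, 0.01]])
uncertain = np.array([[1 / 3, 1 / 3, 1 / 3]])
```

Adding this term to the training loss on unlabeled data pushes predictions such as `uncertain` toward predictions such as `confident`.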


“Pseudo-Label” → constructing hard (1-hot) labels from high-confidence predictions on unlabeled data and using these as training targets in a standard cross-entropy loss
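A minimal sketch of the Pseudo-Label idea, assuming the model's predictions are already available as a NumPy array. The function name and the threshold value 0.95 are illustrative choices, not values taken from the slides.

```python
import numpy as np

def pseudo_label_loss(probs, threshold=0.95):
    """Cross-entropy against hard labels built from confident predictions.

    `probs` has shape (batch, classes). Returns the loss averaged over the
    whole batch, the hard labels, and the mask of confident examples.
    """
    probs = np.asarray(probs, dtype=float)
    hard = probs.argmax(axis=1)               # index of the 1-hot target
    mask = probs.max(axis=1) >= threshold     # keep only confident examples
    # Cross-entropy with a 1-hot target reduces to -log p[target].
    ce = -np.log(probs[np.arange(len(probs)), hard])
    loss = float(np.sum(ce * mask) / len(probs))
    return loss, hard, mask

probs = np.array([[0.97, 0.02, 0.01],   # confident: contributes to the loss
                  [0.50, 0.30, 0.20]])  # not confident: masked out
loss, hard, mask = pseudo_label_loss(probs)
```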


“Pseudo-Label” and “Sharpening” also achieve entropy minimization

Traditional Regularization

weight-decay and mixup


$(x_1, p_1), (x_2, p_2)$ → pair of two examples with their corresponding labels


$\lambda'$ → vanilla mixup uses $\lambda' = \lambda$

Loss Function


$x'$ is closer to $x_1$ than to $x_2$
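The mixing step described above can be sketched as follows, assuming the MixMatch-style variant in which the mixing weight is replaced by max(λ, 1 − λ) so that the result stays closer to the first example. The Beta parameter `alpha=0.75` and the function name are assumptions for this example.

```python
import numpy as np

def mixup(x1, p1, x2, p2, alpha=0.75, rng=None):
    """Mix two (input, label distribution) pairs; result stays closer to x1."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)        # lambda' >= 0.5; vanilla mixup skips this line
    x = lam * x1 + (1.0 - lam) * x2
    p = lam * p1 + (1.0 - lam) * p2
    return x, p, lam

x1, p1 = np.zeros(4), np.array([1.0, 0.0])
x2, p2 = np.ones(4), np.array([0.0, 1.0])
x, p, lam = mixup(x1, p1, x2, p2, rng=np.random.default_rng(0))
```

Because the weight on the first pair is at least 0.5, the mixed label `p` also stays closer to `p1`, which matters when labeled and unlabeled examples are mixed in the same batch.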

MixMatch
$\lim_{T \to 0} \mathrm{Sharpen}(p, T)$ = Dirac (i.e., “one-hot”) distribution
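A small sketch of temperature sharpening, assuming the usual form in which each probability is raised to the power 1/T and the result is renormalized. The log-space implementation and the `1e-12` constant are choices made here for numerical stability.

```python
import numpy as np

def sharpen(p, T):
    """Raise a 1-D distribution to the power 1/T and renormalize."""
    p = np.asarray(p, dtype=float)
    # Work in log space so that a very small T does not underflow to 0/0.
    logits = np.log(p + 1e-12) / T
    logits = logits - logits.max()
    w = np.exp(logits)
    return w / w.sum()

p = np.array([0.5, 0.3, 0.2])
unchanged = sharpen(p, 1.0)     # T = 1 leaves the distribution as it is
sharper = sharpen(p, 0.5)       # T < 1 moves mass toward the largest class
near_one_hot = sharpen(p, 0.01) # T close to 0 approaches a one-hot vector
```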
Berthelot, David, et al. "Mixmatch: A holistic approach to semi-supervised learning." arXiv preprint arXiv:1905.02249 (2019).
Self-training with Noisy Student
improves ImageNet classification YouTube Video

The ImageNet-C and ImageNet-P test sets include images with common corruptions and perturbations such as blurring, fogging, rotation and scaling. The ImageNet-A test set consists of difficult images that cause significant drops in accuracy for state-of-the-art models. These test sets are considered “robustness” benchmarks.

mCE (mean corruption error) is the weighted average of the error rate on different corruptions, with AlexNet’s error rate as a baseline (lower is better).

mFR (mean flip rate) measures the model's probability of flipping predictions under perturbations, with AlexNet as a baseline (lower is better).
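One way to read the mCE definition is sketched below, assuming the common formulation in which, for each corruption type, the model's error summed over severity levels is divided by AlexNet's summed error, and the ratios are then averaged and reported as a percentage. The error rates in the example are made-up numbers.

```python
import numpy as np

def mean_corruption_error(model_err, alexnet_err):
    """mCE in percent.

    Both arguments have shape (num_corruptions, num_severities) and hold
    top-1 error rates for each corruption type and severity level.
    """
    model_err = np.asarray(model_err, dtype=float)
    alexnet_err = np.asarray(alexnet_err, dtype=float)
    # Per-corruption error, normalized by the AlexNet baseline.
    ce = model_err.sum(axis=1) / alexnet_err.sum(axis=1)
    return float(100.0 * ce.mean())

# Illustrative numbers only: two corruption types, three severities.
alexnet = np.array([[0.80, 0.85, 0.90],
                    [0.70, 0.75, 0.80]])
same_as_baseline = mean_corruption_error(alexnet, alexnet)
twice_as_good = mean_corruption_error(alexnet / 2, alexnet)
```

Under this reading, a score of 100 means the model is exactly as error-prone as AlexNet, and lower is better.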

Xie, Qizhe, et al. "Self-training with noisy student improves imagenet classification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
FixMatch: Simplifying Semi-Supervised
Learning with Consistency and Confidence YouTube Video

Semi-Supervised Learning (SSL): Leveraging unlabeled data to improve a model’s performance


Extensions of the consistency regularization idea
– using an adversarial transformation in place of $\alpha$


– using a running average or past model predictions for one invocation of $p_m$


– using a cross-entropy loss in place of the squared $\ell_2$ loss



– using stronger forms of augmentation



Pseudo-labeling

predefined threshold
→ one-hot (hard-label)
$L$ → number of classes
encourages model predictions to be low-entropy (i.e., high-confidence) on unlabeled data

$\mathcal{X} = \{(x_b, p_b) : b = 1, \ldots, B\}$ → batch of $B$ labeled examples
FixMatch
$x_b$ → training example
$\ell_s$ → supervised loss
$p_b$ → one-hot labels
$\ell_u$ → unsupervised loss
$\mathcal{U} = \{u_b : b = 1, \ldots, \mu B\}$ → batch of $\mu B$ unlabeled examples


$\mu$ → determines the relative size of $\mathcal{X}$ and $\mathcal{U}$


$p_m(y \mid x)$ → predicted class distribution produced by the model for input $x$


$H(p, q)$ → cross-entropy between two probability distributions $p$ and $q$


$\hat{q}_b$ → pseudo-label
$\mathcal{A}(\cdot)$ → strong augmentation (AutoAugment/RandAugment + Cutout)


$\alpha(\cdot)$ → weak augmentation (flip and shift)
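Putting the notation above together, the unsupervised part of the FixMatch loss can be sketched roughly as follows. The sketch assumes the model's predictions on the weakly and strongly augmented views of each unlabeled example have already been computed; the function name, the threshold `tau=0.95` and the `1e-12` constant are assumptions for this example.

```python
import numpy as np

def fixmatch_unsup_loss(weak_probs, strong_probs, tau=0.95):
    """Unsupervised loss on a batch of unlabeled examples.

    weak_probs   : predictions on weakly augmented inputs, shape (batch, classes)
    strong_probs : predictions on strongly augmented inputs, same shape
    """
    weak_probs = np.asarray(weak_probs, dtype=float)
    strong_probs = np.asarray(strong_probs, dtype=float)
    pseudo = weak_probs.argmax(axis=1)         # pseudo-label from the weak view
    mask = weak_probs.max(axis=1) >= tau       # keep confident pseudo-labels only
    # Cross-entropy between the one-hot pseudo-label and the strong-view prediction.
    ce = -np.log(strong_probs[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float((mask * ce).sum() / len(pseudo))

weak = np.array([[0.96, 0.04], [0.60, 0.40]])
strong = np.array([[0.70, 0.30], [0.10, 0.90]])
loss_u = fixmatch_unsup_loss(weak, strong)   # only the first example passes the threshold
```

The total training loss would then combine the supervised loss on the labeled batch with a weighted version of this term.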



Consistency Regularization
→ both $\alpha$ and $p_m$ are stochastic functions, so the two terms in this equation will indeed have different values
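The equation this note refers to did not survive extraction. For reference, the consistency-regularization loss in the FixMatch paper is usually written as below; this transcription is an assumption about what the slide showed, using the notation defined earlier.

```latex
\sum_{b=1}^{\mu B} \left\| p_m\bigl(y \mid \alpha(u_b)\bigr) - p_m\bigl(y \mid \alpha(u_b)\bigr) \right\|_2^2
```

The two terms look identical on paper, but each one draws its own random augmentation (and its own model noise such as dropout), which is why they generally differ.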
Sohn, Kihyuk, et al. "Fixmatch: Simplifying semi-supervised learning with consistency and confidence." arXiv preprint arXiv:2001.07685 (2020).
Training Data-Efficient Image Transformers
& Distillation Through Attention YouTube Video

DeiT: Data-Efficient Image Transformers



ImageNet Data Only (No External Data such as JFT-300M)

Distillation through attention



Teacher Model: A strong image classifier (e.g., RegNetY ConvNet)




Soft Distillation
→ models trained with our transformer-specific distillation



Hard-label Distillation
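A rough sketch of hard-label distillation, assuming the form in which the teacher's arg-max prediction is treated as a second ground-truth label and the two cross-entropy terms are weighted equally. For simplicity the sketch uses a single student output, whereas a DeiT-style student would have separate class-token and distillation-token outputs; all names here are illustrative.

```python
import numpy as np

def log_softmax(z):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    z = np.asarray(z, dtype=float)
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def hard_distillation_loss(student_logits, teacher_logits, labels):
    """Average of CE against true labels and CE against the teacher's hard labels."""
    logp = log_softmax(student_logits)
    idx = np.arange(len(labels))
    teacher_labels = np.asarray(teacher_logits).argmax(axis=1)
    ce_true = -logp[idx, labels].mean()
    ce_teacher = -logp[idx, teacher_labels].mean()
    return float(0.5 * ce_true + 0.5 * ce_teacher)

student = np.array([[2.0, 0.0, 0.0]])
teacher = np.array([[0.0, 3.0, 0.0]])   # teacher disagrees with the true label
labels = np.array([0])
loss_hard = hard_distillation_loss(student, teacher, labels)
```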

Fixing the positional encoding across resolutions



Use a lower training resolution and fine-tune the network at a larger resolution

ψ → softmax



τ → temperature

Keep the image patch size the same ⟹ N (sequence length) changes
Z_s, Z_t → student and teacher logits

⟹ need to adapt positional encodings (use bicubic interpolation)
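The resolution change can be sketched as follows, assuming square images, a learned square grid of patch position embeddings, and a Keys cubic kernel with a = -0.5 (libraries differ on this constant and on border handling). The resolutions 224 and 384 and the patch size 16 are example values, and all function names are hypothetical; any class-token embedding would be handled separately.

```python
import math

def num_patches(resolution, patch_size):
    """Sequence length N for a square image split into square patches."""
    return (resolution // patch_size) ** 2

def cubic_weight(t, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed choice."""
    t = abs(t)
    if t <= 1.0:
        return (a + 2.0) * t ** 3 - (a + 3.0) * t ** 2 + 1.0
    if t < 2.0:
        return a * t ** 3 - 5.0 * a * t ** 2 + 8.0 * a * t - 4.0 * a
    return 0.0

def resize_axis(vectors, new_len):
    """Resample a list of embedding vectors to new_len with cubic weights."""
    old_len, dim = len(vectors), len(vectors[0])
    scale = old_len / new_len
    out = []
    for i in range(new_len):
        src = (i + 0.5) * scale - 0.5      # pixel-centre aligned source position
        base = math.floor(src)
        acc = [0.0] * dim
        for k in range(base - 1, base + 3):  # four nearest source samples
            w = cubic_weight(src - k)
            v = vectors[min(max(k, 0), old_len - 1)]  # clamp at the borders
            for d in range(dim):
                acc[d] += w * v[d]
        out.append(acc)
    return out

def resize_pos_embed(grid, new_side):
    """grid[row][col] is an embedding vector; returns a new_side x new_side grid."""
    # Interpolate along the width of every row, then along the height.
    rows = [resize_axis(row, new_side) for row in grid]
    cols = [resize_axis([r[c] for r in rows], new_side) for c in range(new_side)]
    return [[cols[c][r] for c in range(new_side)] for r in range(new_side)]

# With a fixed patch size of 16, the sequence length grows with the resolution.
print(num_patches(224, 16), num_patches(384, 16))  # 196 576
```

Interpolating each axis separately is equivalent to a 2D bicubic resize because the kernel is separable; the embedding dimension is untouched, only the grid side changes (14 to 24 in the example above).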


Touvron, Hugo, et al. "Training data-efficient image transformers & distillation through attention." International Conference on Machine Learning. PMLR, 2021.
Questions?
YouTube Playlist
