Zeynep Akata
Munich, Bavaria, Germany
3,410 followers
500+ connections
About
Researcher in Computer Vision and Machine Learning
Activity
I'm very happy to share that, last week, I successfully defended my PhD thesis titled "Trusting as a Moral Act: Trustworthy AI and Responsibility" at…
Liked by Zeynep Akata
🚀 First stop in Germany: 𝗘𝗟𝗜𝗔𝗦 𝗡𝗼𝗱𝗲 𝗠𝘂𝗻𝗶𝗰𝗵! 🌍 Building on Munich’s thriving AI ecosystem, the ELIAS Node Munich is a key driver of…
Liked by Zeynep Akata
Experience
Education
College des Ecoles Doctorales (CED) de l’Universite de Grenoble
Publications
Generating Visual Explanations
European Conference on Computer Vision (ECCV)
Clearly explaining a rationale for a classification decision to an end user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image aspects which justify visual predictions. We propose a new model that focuses on the discriminating properties of the visible object, jointly predicts a class label, and explains why the predicted label is appropriate for the image. Through a novel loss function based on sampling and reinforcement learning, our model learns to generate sentences that realize a global sentence property, such as class specificity. Our results on the CUB dataset show that our model is able to generate explanations which are not only consistent with an image but also more discriminative than descriptions produced by existing captioning methods.
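The "sampling and reinforcement learning" loss can be pictured as a REINFORCE update: sample sentences from the caption model, reward each by how class-specific it is, and reweight its log-likelihood accordingly. A minimal PyTorch sketch under that reading; the reward here is a random stand-in for the paper's discriminative scorer, and all names are illustrative rather than the authors' code.

```python
import torch

def reinforce_loss(log_probs, rewards):
    # REINFORCE-style loss: negative log-likelihood of each sampled
    # sentence, scaled by its baseline-subtracted reward.
    advantage = rewards - rewards.mean()          # simple batch baseline
    return -(advantage.detach() * log_probs).mean()

# Toy setup: each "sentence" is a sequence of sampled tokens.
vocab, batch, seq_len = 100, 4, 7
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
dist = torch.distributions.Categorical(logits=logits)
tokens = dist.sample()                            # sampled sentences
log_probs = dist.log_prob(tokens).sum(dim=1)      # log p(sentence)

rewards = torch.rand(batch)   # stand-in for a class-specificity score
loss = reinforce_loss(log_probs, rewards)
loss.backward()               # gradients flow back into the caption model
```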
Generative Adversarial Text to Image Synthesis
International Conference on Machine Learning (ICML)
Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal. However, in recent years generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations. Meanwhile, deep convolutional generative adversarial networks (GANs) have begun to generate highly compelling images of specific categories such as faces, album covers, and room interiors. In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. We demonstrate the capability of our model to generate plausible images of birds and flowers from detailed text descriptions.
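The bridge the abstract describes (a text encoder feeding a GAN) reduces to conditioning both players on the same text embedding. A deliberately tiny PyTorch sketch of that conditioning; the linear modules are stand-ins, not the paper's convolutional architecture.

```python
import torch
import torch.nn as nn

z_dim, t_dim, img_dim = 16, 8, 32
G = nn.Sequential(nn.Linear(z_dim + t_dim, 64), nn.ReLU(), nn.Linear(64, img_dim))
D = nn.Sequential(nn.Linear(img_dim + t_dim, 64), nn.ReLU(), nn.Linear(64, 1))

text = torch.randn(4, t_dim)             # stand-in for an encoded caption
z = torch.randn(4, z_dim)                # noise vector
fake = G(torch.cat([z, text], dim=1))    # image generated from text + noise

real = torch.randn(4, img_dim)           # stand-in for real images
bce = nn.BCEWithLogitsLoss()
d_loss = (bce(D(torch.cat([real, text], dim=1)), torch.ones(4, 1)) +
          bce(D(torch.cat([fake.detach(), text], dim=1)), torch.zeros(4, 1)))
g_loss = bce(D(torch.cat([fake, text], dim=1)), torch.ones(4, 1))
```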
Latent Embeddings for Zero-shot Classification
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
We present a novel latent embedding model for learning a compatibility function between image and class embeddings, in the context of zero-shot classification. The proposed method augments the state-of-the-art bilinear compatibility model by incorporating latent variables. Instead of learning a single bilinear map, it learns a collection of latent variable maps with the selection of which map to use being a latent variable for the current image-class pair. We train the model with a ranking based objective function which penalizes incorrect rankings of the true class for a given image. We empirically validate that our model improves the state-of-the-art for various class embeddings consistently on three challenging publicly available datasets for the zero-shot setting. Moreover, our method leads to visually highly interpretable results with clear clusters of different fine-grained object properties that correspond to different latent variable maps.
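In symbols, the model replaces a single bilinear map with a collection and selects the best one per image-class pair. The notation below (θ for image features, φ for class embeddings, K maps) is a paraphrase of the abstract, not lifted from the paper:

```latex
% Latent bilinear compatibility: choose the best of K maps per pair.
F(x, y) = \max_{1 \le i \le K} \theta(x)^{\top} W_i \,\varphi(y),
\qquad
\hat{y}(x) = \operatorname*{arg\,max}_{y \in \mathcal{Y}} F(x, y)
```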
Learning Deep Representations of Fine-Grained Visual Descriptions
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information. In these formulations the current best complement to visual features are attributes: manually-encoded vectors describing shared characteristics among categories. Despite good performance, attributes have limitations including (1) finer-grained recognition requires commensurately more attributes; and (2) attributes do not provide a natural language interface. We propose to overcome these limitations by training neural language models from scratch; i.e. without pre-training and only consuming words and characters. Our proposed models train end-to-end to align with the fine-grained and category-specific content of images. Natural language provides a flexible and compact way of encoding only the salient visual aspects for distinguishing categories. By training on raw text our model can do inference on raw text as well, providing humans a familiar mode both for annotation and retrieval. Our model achieves strong performance on zero-shot text-based image retrieval and significantly outperforms the attribute-based state-of-the-art for zero-shot classification on the Caltech-UCSD birds dataset.
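The joint embedding formulation can be sketched as two encoders trained with a symmetric ranking loss that scores matching image-text pairs above mismatched ones. The linear encoders below are stand-ins for the paper's CNN and character-level text models.

```python
import torch
import torch.nn as nn

img_enc = nn.Linear(512, 128)   # stand-in for a CNN image encoder
txt_enc = nn.Linear(300, 128)   # stand-in for a word/char-level text encoder

imgs, texts = torch.randn(8, 512), torch.randn(8, 300)   # text i describes image i
scores = img_enc(imgs) @ txt_enc(texts).t()              # pairwise compatibility

margin = 0.1
pos = scores.diag()
# Hinge ranking in both directions: image -> wrong text, text -> wrong image.
cost_im = torch.clamp(margin + scores - pos.unsqueeze(1), min=0)
cost_tx = torch.clamp(margin + scores - pos.unsqueeze(0), min=0)
mask = torch.eye(8, dtype=torch.bool)                    # don't penalize matches
loss = cost_im.masked_fill(mask, 0).mean() + cost_tx.masked_fill(mask, 0).mean()
```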
Multi-Cue Zero-Shot Learning with Strong Supervision
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Scaling up visual category recognition to large numbers of classes remains challenging. A promising research direction is zero-shot learning, which does not require any training data to recognize new classes, but rather relies on some form of auxiliary information describing the new classes. Ultimately, this may allow the use of textbook knowledge that humans employ to learn about new classes by transferring knowledge from classes they know well. The most successful zero-shot learning approaches currently require a particular type of auxiliary information – namely attribute annotations performed by humans – that is not readily available for most classes. Our goal is to circumvent this bottleneck by substituting such annotations by extracting multiple pieces of information from multiple unstructured text sources readily available on the web. To compensate for the weaker form of auxiliary information, we incorporate stronger supervision in the form of semantic part annotations on the classes from which we transfer knowledge. We achieve our goal by a joint embedding framework that maps multiple text parts as well as multiple semantic parts into a common space. Our results consistently and significantly improve on the state of the art in zero-shot recognition and retrieval.
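One way to read the joint embedding over multiple cues is as a sum of bilinear compatibilities over every pairing of a visual part with a text source. The indexing below is my paraphrase of the abstract, not the paper's exact notation:

```latex
% Multi-cue compatibility: parts p (strongly supervised visual parts)
% paired with text sources t (web text describing class y).
F(x, y) = \sum_{p} \sum_{t} \theta_p(x)^{\top} W_{p,t} \,\varphi_t(y)
```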
Learning What and Where to Draw
Neural Information Processing Systems (NIPS)
Generative Adversarial Networks (GANs) have recently demonstrated the capability to synthesize compelling real-world images, such as room interiors, album covers, manga, faces, birds, and flowers. While existing models can synthesize images based on global constraints such as a class label or caption, they do not provide control over pose or object location. We propose a new model, the Generative Adversarial What-Where Network (GAWWN), that synthesizes images given instructions describing what content to draw in which location. We show high-quality 128 × 128 image synthesis on the Caltech-UCSD Birds dataset, conditioned on both informal text descriptions and also object location. Our system exposes control over both the bounding box around the bird and its constituent parts. By modeling the conditional distributions over part locations, our system also enables conditioning on arbitrary subsets of parts (e.g. only the beak and tail), yielding an efficient interface for picking part locations. We also show preliminary results on the more challenging domain of text- and location-controllable synthesis of images of human actions on the MPII Human Pose dataset.
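The "what and where" conditioning can be sketched by replicating the text code inside the region where the object should appear and feeding the mask itself to the generator. Shapes and modules below are illustrative, not the GAWWN architecture.

```python
import torch
import torch.nn as nn

B, t_dim, H = 4, 8, 16
text = torch.randn(B, t_dim)             # "what": encoded description
bbox_mask = torch.zeros(B, 1, H, H)      # "where": target bounding box
bbox_mask[:, :, 4:12, 4:12] = 1.0

# Broadcast the text code over the masked region, then decode to an image.
text_map = text.view(B, t_dim, 1, 1) * bbox_mask
decoder = nn.Sequential(nn.Conv2d(t_dim + 1, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 3, 3, padding=1))
img = decoder(torch.cat([text_map, bbox_mask], dim=1))   # (B, 3, 16, 16)
```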
Evaluation of Output Embeddings for Fine-Grained Image Classification
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Image classification has advanced significantly in recent years with the availability of large-scale image sets. However, fine-grained classification remains a major challenge due to the annotation cost of large numbers of fine-grained categories. This project shows that compelling classification performance can be achieved on such categories even without labeled training data. Given image and class embeddings, we learn a compatibility function such that matching embeddings are assigned a higher score than mismatching ones; zero-shot classification of an image proceeds by finding the label yielding the highest joint compatibility score. We use state-of-the-art image features and focus on different supervised attributes and unsupervised output embeddings either derived from hierarchies or learned from unlabeled text corpora. We establish a substantially improved state-of-the-art on the Animals with Attributes and Caltech-UCSD Birds datasets. Most encouragingly, we demonstrate that purely unsupervised output embeddings (learned from Wikipedia and improved with fine-grained text) achieve compelling results, even outperforming the previous supervised state-of-the-art. By combining different output embeddings, we further improve results.
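At test time the zero-shot rule in the abstract is a single argmax over compatibility scores, and different output embeddings can be combined by summing their scores. A NumPy sketch with random stand-ins for the learned matrix and features:

```python
import numpy as np

d_img, d_cls, n_cls = 64, 32, 10
W = np.random.randn(d_img, d_cls)         # learned compatibility matrix
img = np.random.randn(d_img)              # image feature (e.g. CNN output)
cls_emb = np.random.randn(n_cls, d_cls)   # attributes, hierarchy, or word2vec

scores = img @ W @ cls_emb.T              # F(x, y) for every unseen class y
pred = scores.argmax()                    # zero-shot prediction

# Combining several output embeddings can be as simple as summing the
# score vectors produced by a separately trained W for each embedding.
```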
Good Practice in Large Scale Learning for Image Classification
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
We benchmark several SVM objective functions for large-scale image classification. We consider one-vs-rest, multi-class, ranking, and weighted approximate ranking SVMs. A comparison of online and batch methods for optimizing the objectives shows that online methods perform as well as batch methods in terms of classification accuracy, but with a significant gain in training speed. Using stochastic gradient descent, we can scale the training to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform the one-vs-rest strategy when a large number of training examples are used. Furthermore, the gap in accuracy between the different algorithms shrinks as the dimension of the features increases. We also show that learning through cross-validation the optimal rebalancing of positive and negative examples can result in a significant improvement for the one-vs-rest strategy. Finally, early stopping can be used as an effective regularization strategy when training with online algorithms. Following these "good practices", we were able to improve the state-of-the-art on a large subset of 10K classes and 9M images of ImageNet from 16.7% Top-1 accuracy to 19.1%.
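The one-vs-rest baseline the paper finds hard to beat is a binary hinge-loss SVM per class, trained by stochastic gradient descent. A Pegasos-style sketch of the binary case with random stand-in data; the rebalancing and early-stopping practices from the paper are omitted:

```python
import numpy as np

def sgd_svm(X, y, lam=1e-4, epochs=5):
    # Pegasos-style SGD on the L2-regularized hinge loss; labels in {-1, +1}.
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in np.random.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            w *= (1 - eta * lam)               # shrink (regularization)
            if y[i] * (w @ X[i]) < 1:          # margin violated
                w += eta * y[i] * X[i]
    return w

X = np.random.randn(200, 50)
y = np.where(np.random.randn(200) > 0, 1.0, -1.0)
w = sgd_svm(X, y)   # one such classifier per class gives one-vs-rest
```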
Label Embedding for Attribute Based Classification
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Attributes are an intermediate representation, which enables parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label embedding problem: each class is embedded in the space of attribute vectors. We introduce a function which measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. The label embedding framework offers other advantages such as the ability to leverage alternative sources of information in addition to attributes (e.g. class hierarchies) or to transition smoothly from zero-shot learning to learning with large quantities of data.
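The compatibility-plus-ranking recipe in the abstract can be written out as below; Δ is a margin, φ(y) the attribute vector of class y, and the unweighted hinge shown here simplifies the paper's weighted approximate ranking objective:

```latex
F(x, y; W) = \theta(x)^{\top} W \,\varphi(y),
\qquad
\ell(x, y) = \sum_{y' \neq y} \max\bigl(0,\; \Delta + F(x, y') - F(x, y)\bigr)
```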
Non-negative Matrix Factorization in Multimodality Data for Segmentation and Label Prediction
16th Computer Vision Winter Workshop, Mitterberg, Austria.
With the increasing availability of annotated multimedia data on the Internet, techniques are in demand that allow for a principled joint processing of different types of data. Multiview learning and multiview clustering attempt to identify latent components in different feature spaces in a simultaneous manner. The resulting basis vectors or centroids faithfully represent the different views on the data but are implicitly coupled, as they were jointly estimated. This opens new avenues to problems such as label prediction, image retrieval, or semantic grouping. In this paper, we present a new model for multiview clustering that extends traditional non-negative matrix factorization to the joint factorization of different data matrices. Accordingly, the technique provides a new approach to the joint treatment of image parts and attributes. First experiments in image segmentation and multiview clustering of image features and image labels show promising results and indicate that the proposed method offers a common framework for image analysis on different levels of abstraction.
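Joint NMF of two views can be sketched with standard multiplicative updates in which the coefficient matrix H is shared; the shared H is what couples the views. This is a simplification consistent with the abstract, not the paper's exact algorithm:

```python
import numpy as np

def joint_nmf(X1, X2, k=5, iters=200, eps=1e-9):
    # Factor X1 ~ W1 @ H and X2 ~ W2 @ H with a shared, non-negative H.
    W1 = np.random.rand(X1.shape[0], k)
    W2 = np.random.rand(X2.shape[0], k)
    H = np.random.rand(k, X1.shape[1])         # shared across both views
    for _ in range(iters):
        W1 *= (X1 @ H.T) / (W1 @ H @ H.T + eps)
        W2 *= (X2 @ H.T) / (W2 @ H @ H.T + eps)
        H *= (W1.T @ X1 + W2.T @ X2) / ((W1.T @ W1 + W2.T @ W2) @ H + eps)
    return W1, W2, H

# Two views of the same 100 samples, e.g. image features and label indicators.
X1, X2 = np.random.rand(40, 100), np.random.rand(20, 100)
W1, W2, H = joint_nmf(X1, X2)
```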
Languages
Turkish
Native or bilingual proficiency
English
Full professional proficiency
German
Limited working proficiency
French
Elementary proficiency
More activity by Zeynep Akata
Nice demo of Patrick Pérez for Kyutai's latest prototype for real-time translation at the AI summit. Here, a shameless plug for Enhance Lab…
Liked by Zeynep Akata
Exciting discussions on the future of AI at the Paris AI Action Summit with French Minister of Science Philippe Baptiste and many leading AI…
Liked by Zeynep Akata
For PhD and MSc students interested in a research visit to Prague/VRG in 2025: we're open to hosting short-term collaborations or internships on a…
Liked by Zeynep Akata
✨ We’re #hiring! Join Bernhard Schölkopf and me at the ELLIS Institute Tübingen to push the boundaries of #AI in #Education! We’re building…
Liked by Zeynep Akata
The previous year was a highlight! 🌟 📚 In October, I published the 2nd edition of my book on Generative AI (https://lnkd.in/ecqCsrZG). I added…
Liked by Zeynep Akata
Today, February 10, is International Epilepsy Day. This is an annual awareness-raising initiative organized around the planet. Let me take this…
Liked by Zeynep Akata
On Tuesday Feb 25, 15:00-16:00 at LAB42 (Science Park 904, Amsterdam, in room L1.01) I will give my retirement talk 'Celebrating Duality in…
Liked by Zeynep Akata
Together with Ingo Weber, I am excited to announce our new course DevOps: Engineering for Deployment and Operations launching this summer. We…
Liked by Zeynep Akata
✨ The VIS Lab at the University of Amsterdam is proud and excited to announce it has #TWELVE papers 🚀 accepted for the leading #AI-#makers…
Liked by Zeynep Akata
Lecturer / Senior Lecturer (Assistant Professor) - Tenure Track (permanent position) in AI, including computer vision and machine learning, is now…
Liked by Zeynep Akata
We have PhD internships open at Naver Labs Europe (Grenoble, France) on AI for robotics, in particular end-to-end trained navigation. We have…
Liked by Zeynep Akata
After more than six amazing years, I have decided to give my badge a well-deserved break and, along with it, my time at Meta is coming to an…
Liked by Zeynep Akata