Mohsen Fayyaz

Berlin, Berlin, Germany
4,553 followers · 500+ connections

Experience

  • Microsoft

    Germany

  • -

    Germany

  • -

    Bonn, North Rhine-Westphalia, Germany

  • -

    United Kingdom

  • -

    United Kingdom

  • -

    Renningen, Baden-Württemberg, Germany

  • -

    Bonn, North Rhine-Westphalia, Germany

  • -

  • -

    Tehran, Tehran Province, Iran

  • -

    Tehran, Tehran Province, Iran

  • -

    Semnan, Semnan Province, Iran

  • -

  • -

    Semnan, Semnan Province, Iran

Education

  • Rheinische Friedrich-Wilhelms-Universität Bonn

    The University of Bonn

    Computer Vision Group
    Supervisor: Prof. Dr. J. Gall

  • Supervisor: Dr. K. Kiani.
    Thesis: Designing and Implementing a Cloud-based Accounting System
    Ranked first, with the highest GPA among all of the university's computer software engineering students since 2010

  • Activities and Societies: Computer Vision and Robotics Team

    National Organization for Development of Exceptional Talent (NODET)

Licenses and Certifications

Volunteer Experience

  • Admin

    Deeplearning.ir

    – Present · 9 years 1 month

    Education

    Helping Iranian graduate students learn machine learning and deep learning by answering questions in our Q&A forum on the http://www.deeplearning.ir website.

Publications

  • Fast Weakly Supervised Action Segmentation Using Mutual Consistency

    IEEE Transactions on Pattern Analysis and Machine Intelligence, TPAMI

    Action segmentation is the task of predicting the actions in each frame of a video. Because of the high cost of preparing training videos with full supervision for action segmentation, weakly supervised approaches that are able to learn only from transcripts are very appealing. In this paper, we propose a new approach for weakly supervised action segmentation based on a two-branch network. The two branches of our network predict two redundant but different representations for action segmentation. During training we introduce a new mutual consistency loss (MuCon) that enforces that these two representations are consistent. Using MuCon and a transcript prediction loss, our network achieves state-of-the-art results for action segmentation and action alignment while being fully differentiable and faster to train, since it does not require a costly alignment step during training.

  • AVID: Adversarial Visual Irregularity Detection

    ACCV 2018, Asian Conference on Computer Vision

    Real-time detection of irregularities in visual data is invaluable in many prospective applications, including surveillance, patient monitoring systems, etc. With the surge of deep learning methods in recent years, researchers have tried a wide spectrum of methods for different applications. However, for the case of irregularity or anomaly detection in videos, training an end-to-end model is still an open challenge, since irregularity is often not well-defined and there are not enough irregular samples to use during training. In this paper, inspired by the success of generative adversarial networks (GANs) for training deep models in unsupervised or self-supervised settings, we propose an end-to-end deep network for detection and fine localization of irregularities in videos (and images). Our proposed architecture is composed of two networks, which are trained to compete with each other while collaborating to find the irregularity. One network works as a pixel-level irregularity Inpainter, and the other works as a patch-level Detector. After an adversarial self-supervised training, in which the Inpainter tries to fool the Detector into accepting its inpainted output as regular (normal), the two networks collaborate to detect and fine-segment the irregularity in any given testing video. Our results on three different datasets show that our method can outperform the state of the art and fine-segment the irregularity.

  • Spatio-Temporal Channel Correlation Networks for Action Classification

    ECCV 2018, European Conference on Computer Vision

    The work in this paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between channels of a 3D CNN with respect to temporal and spatial features. This new block can be added as a residual unit to different parts of 3D CNNs. We name our novel block 'Spatio-Temporal Channel Correlation' (STC). By embedding this block into current state-of-the-art architectures such as ResNeXt and ResNet, we improved the performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to current state-of-the-art architectures outperforms the state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. The other issue in training 3D CNNs is that they must be trained from scratch on a huge labeled dataset to achieve reasonable performance, so the knowledge learned in 2D CNNs is completely ignored. Another contribution of this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by fine-tuning this network, we beat the performance of generic and recent methods in 3D CNNs, which were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.

  • Temporal 3D ConvNets by Temporal Transition Layer

    CVPR 2018, IEEE Conference on Computer Vision and Pattern Recognition Workshops

    The work in this paper is driven by the question of how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches with fixed temporal convolution kernel depths. We introduce a new temporal layer that models variable temporal convolution kernel depths. We embed this new temporal layer in our proposed 3D CNN. We extend the DenseNet architecture, which normally is 2D, with 3D filters and pooling kernels. We name our proposed video convolutional network 'Temporal 3D ConvNet' (T3D) and its new temporal layer 'Temporal Transition Layer' (TTL). Our experiments show that T3D outperforms the current state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets.

  • Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes

    Computer Vision and Image Understanding

    The detection of abnormal behavior in crowded scenes has to deal with many challenges. This paper presents an efficient method for the detection and localization of anomalies in videos. Using fully convolutional neural networks (FCNs) and temporal data, a pre-trained supervised FCN is transferred into an unsupervised FCN, ensuring the detection of (global) anomalies in scenes. High performance in terms of speed and accuracy is achieved through cascaded detection, which reduces the computational complexity. This FCN-based architecture addresses two main tasks: feature representation and cascaded outlier detection. Experimental results on two benchmarks suggest that the proposed method outperforms existing methods in terms of detection and localization accuracy.

  • Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet

    arXiv preprint

    Major winning Convolutional Neural Networks (CNNs), such as VGGNet, ResNet, DenseNet, etc., include tens to hundreds of millions of parameters, which impose considerable computation and memory overheads. This limits their practical usage in training and optimizing for real-world applications. In contrast, light-weight architectures, such as SqueezeNet, are being proposed to address this issue. However, they mainly suffer from low accuracy, as they have compromised between processing power and efficiency. These inefficiencies mostly stem from following an ad-hoc design procedure. In this work, we discuss and propose several crucial design principles for an efficient architecture design and elaborate intuitions concerning different aspects of the design procedure. Furthermore, we introduce a new layer called SAF-pooling to improve the generalization power of the network while keeping it simple by choosing the best features. Based on such principles, we propose a simple architecture called SimpNet. We empirically show that SimpNet provides a good trade-off between computation/memory efficiency and accuracy solely based on these primitive but crucial principles. SimpNet outperforms deeper and more complex architectures such as VGGNet, ResNet, WideResidualNet, etc., on several well-known benchmarks, while having 2 to 25 times fewer parameters and operations. We obtain state-of-the-art results (in terms of a balance between accuracy and the number of involved parameters) on standard datasets, such as CIFAR10, CIFAR100, MNIST and SVHN.

  • Deep-cascade: Cascading 3D Deep Neural Networks for Fast Anomaly Detection and Localization in Crowded Scenes

    IEEE Transactions on Image Processing, TIP

    This paper proposes a fast and reliable method for anomaly detection and localization in video data showing crowded scenes. Time-efficient anomaly localization is an ongoing challenge and the subject of this paper. We propose a cubic patch-based method, characterized by a cascade of classifiers, which makes use of an advanced feature learning approach. Our cascade of classifiers has two main stages. First, a light but deep 3D auto-encoder is used for early identification of “many” normal cubic patches. This deep network operates on small cubic patches in the first stage, before carefully resizing the remaining candidates of interest and evaluating those at the second stage using a more complex and deeper 3D convolutional neural network (CNN). We divide the deep auto-encoder and the CNN into multiple sub-stages, which operate as cascaded classifiers. Shallow layers of the cascaded deep networks (designed as Gaussian classifiers, acting as weak single-class classifiers) detect “simple” normal patches, such as background patches, while more complex normal patches are detected at deeper layers. It is shown that the proposed novel technique (a cascade of two cascaded classifiers) performs comparably to current top-performing detection and localization methods on standard benchmarks, but generally outperforms those with respect to the required computation time.

  • STFCN - Spatio-Temporal Fully Convolutional Neural Network for Semantic Segmentation of Street Scenes

    ACCV 2016, Asian Conference on Computer Vision Workshops

    This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features also has a good effect on segmenting video data. We propose a module based on a long short-term memory (LSTM) architecture of a recurrent neural network for interpreting the temporal characteristics of video frames over time. Our system takes as input frames of a video and produces a correspondingly-sized output; for segmenting the video, our method combines the use of three components: first, the regional spatial features of frames are extracted using a CNN; then, using an LSTM, the temporal features are added; finally, by deconvolving the spatio-temporal features, we produce pixel-wise predictions. Our key insight is to build spatio-temporal convolutional networks (spatio-temporal CNNs) that have an end-to-end architecture for semantic video segmentation. We fully adapted some known convolutional network architectures (such as FCN-AlexNet and FCN-VGG16), as well as dilated convolution, into our spatio-temporal CNNs. Our spatio-temporal CNNs achieve state-of-the-art semantic segmentation, as demonstrated on the CamVid and NYUDv2 datasets.

  • A novel approach for Finger Vein verification based on self-taught learning

    9th Iranian Conference on Machine Vision and Image Processing (MVIP)

    In this paper, we propose a method for user Finger Vein Authentication (FVA) as a biometric system. Using discriminative features for classifying these finger veins is one of the main factors that distinguishes related works; thus, we propose to learn a set of representative features based on auto-encoders. We model the represented users' finger vein structure using a Gaussian distribution. Experimental results show that our method performs on par with a state-of-the-art method on the SDUMLA-HMT benchmark.

  • Online Signature Verification Based on Feature Representation

    International Symposium on Artificial Intelligence and Signal Processing

    Signature verification techniques employ various specifications of a signature. Feature extraction and feature selection have an enormous effect on the accuracy of signature verification. Feature extraction is a difficult phase of signature verification systems due to the different shapes of signatures and different sampling conditions. This paper presents a method based on feature learning, in which a sparse autoencoder tries to learn features of signatures. The learned features are then employed to represent users' signatures. Finally, users' signatures are classified using one-class classifiers. The proposed method is independent of signature shape thanks to learning features from users' signatures using an autoencoder. The verification process of the proposed system is evaluated on the SVC2004 signature database, which contains genuine and skilled-forgery signatures. The experimental results indicate error reduction and accuracy enhancement.

Courses

  • Digital Image Processing

    -

  • Evolutionary Computing

    -

  • Fuzzy Logic

    -

  • Knowledge Engineering

    -

  • Machine Learning

    -

  • Natural Language Processing

    -

  • Pattern Recognition

    -

  • Social Networks

    -

Honors and Awards

  • Awarded member of Iran National Elites Foundation (Society of prominent students of the country)

    Iran National Elites Foundation

    Iran's National Elites Foundation (INEF) (Persian: بنياد ملي نخبگان) is an Iranian governmental organization founded on 31 May 2005 by approval of the Supreme Cultural Revolution Council of Iran. The main purpose of the foundation is to recognize, organize and support Iran's elite national talents. Members of the foundation include all who show exceptionally high intellectual capacity, academic aptitude, creative ability and artistic talents, especially contributors to the promotion of global science and highly cited scientists and researchers. The Iran National Elites Foundation is a statewide organization composed of members with significant scientific and executive backgrounds.

Languages

  • Persian

    Native or bilingual proficiency

  • English

    Full professional proficiency

Organizations

  • Microsoft

    Research Intern

    – Present
  • University of Bonn

    Doctoral Researcher

  • Bosch Center for Artificial Intelligence (BCAI)

    Research Intern

  • Sensifai

    -

Recommendations Received

2 people have recommended Mohsen Fayyaz
