Marengo (TwelveLabs)
ModelScope (Alibaba Cloud)

About (Marengo)

Marengo is a multimodal video foundation model that transforms video, audio, image, and text inputs into unified embeddings, enabling powerful “any-to-any” search, retrieval, classification, and analysis across vast video and multimedia libraries. It integrates visual frames (with spatial and temporal dynamics), audio (speech, ambient sound, music), and textual content (subtitles, overlays, metadata) to create a rich, multidimensional representation of each media item. With this embedding architecture, Marengo supports robust tasks such as search (text-to-video, image-to-video, video-to-audio, etc.), semantic content discovery, anomaly detection, hybrid search, clustering, and similarity-based recommendation. The latest versions introduce multi-vector embeddings, separating representations for appearance, motion, and audio/text features, which significantly improve precision and context awareness, especially for complex or long-form content.
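As a rough illustration of how search over multi-vector embeddings can work, the sketch below ranks items by a weighted sum of per-aspect cosine similarities. It is not the TwelveLabs API: the function names, the three-dimensional toy vectors, the clip identifiers, and the aspect weights are all hypothetical, and real embeddings would come from the model rather than being written by hand.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def multi_vector_search(query, library, weights, top_k=3):
    """Rank library items by a weighted sum of per-aspect similarities.

    query:   dict mapping aspect name -> vector
    library: dict mapping item id -> dict of aspect name -> vector
    weights: dict mapping aspect name -> weight (hypothetical values)
    """
    scored = []
    for item_id, vectors in library.items():
        score = sum(
            weight * cosine(query[aspect], vectors[aspect])
            for aspect, weight in weights.items()
        )
        scored.append((item_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hand-written toy embeddings (hypothetical, not model output).
library = {
    "clip_a": {"appearance": [1, 0, 0], "motion": [0, 1, 0], "audio_text": [0, 0, 1]},
    "clip_b": {"appearance": [0, 1, 0], "motion": [1, 0, 0], "audio_text": [0, 1, 0]},
    "clip_c": {"appearance": [0.9, 0.1, 0], "motion": [0, 0.8, 0.2], "audio_text": [0.1, 0, 0.9]},
}
query = {"appearance": [1, 0, 0], "motion": [0, 1, 0], "audio_text": [0, 0, 1]}
weights = {"appearance": 0.4, "motion": 0.3, "audio_text": 0.3}

results = multi_vector_search(query, library, weights)
for item_id, score in results:
    print(item_id, round(score, 3))
```

Keeping one vector per aspect lets a caller shift the weights, for example toward motion for action-heavy queries, without recomputing any embeddings.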

About (ModelScope)

The ModelScope text-to-video model is a multi-stage diffusion model that takes a text description as input and returns a video matching that description. Only English input is supported. The model consists of three sub-networks: text feature extraction, a diffusion model that maps text features to the video latent space, and a network that maps the video latent space to the video visual space. In total the model has about 1.7 billion parameters. The diffusion model adopts the Unet3D structure and generates video by iteratively denoising a video of pure Gaussian noise.
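The iterative denoising step can be sketched with a toy DDPM-style sampling loop. This is not ModelScope's code: the "latent" is a short list of floats standing in for a video latent, the linear noise schedule and step count are assumptions, and `oracle_noise_predictor` is a hypothetical stand-in for the trained Unet3D that simply knows the target latent, so the loop can be run without any trained weights.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice) and cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

def oracle_noise_predictor(x_t, t, target, alpha_bars):
    """Hypothetical stand-in for the trained network: returns the exact
    noise that would turn `target` into `x_t` at step t."""
    ab = alpha_bars[t]
    return [(x - math.sqrt(ab) * x0) / math.sqrt(1.0 - ab)
            for x, x0 in zip(x_t, target)]

def sample(target, num_steps=1000, seed=0):
    """Start from pure Gaussian noise and denoise step by step."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = oracle_noise_predictor(x, t, target, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * e) / math.sqrt(alphas[t]) for xi, e in zip(x, eps)]
        if t > 0:
            # Re-inject a little noise on every step except the last.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x

target_latent = [0.5, -1.0, 2.0, 0.0]  # toy stand-in for a video latent
generated = sample(target_latent)
print([round(v, 4) for v in generated])
```

In the real model, the predictor would be the Unet3D conditioned on the extracted text features, and the final latent would then be decoded into video frames by the third sub-network.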

Platforms Supported (Marengo and ModelScope)

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Audience (Marengo)

Media companies, AI researchers, and platforms searching for a tool to build smart search engines, content discovery tools, recommendation systems, or video-analysis workflows

Audience (ModelScope)

Users interested in an open-source text-to-video generation model

Support (Marengo and ModelScope)

Phone Support
24/7 Live Support
Online

API (Marengo and ModelScope)

Offers API

Pricing (Marengo)

$0.042 per minute
Free Version
Free Trial
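
As a quick worked example of the per-minute rate, the sketch below estimates the cost of processing a given amount of video. It assumes the $0.042 is charged per minute of video with no tiers, discounts, or free allowance applied, which the listing does not spell out; `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Listed price in US dollars per minute (billing unit assumed to be video minutes).
PRICE_PER_MINUTE = Decimal("0.042")

def estimate_cost(video_minutes):
    """Estimated cost in dollars for the given whole number of minutes."""
    return PRICE_PER_MINUTE * Decimal(video_minutes)

# 100 hours of footage is 6,000 minutes.
cost_100_hours = estimate_cost(100 * 60)
print(cost_100_hours)  # 252.000
```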

Pricing (ModelScope)

Free
Free Version
Free Trial

Reviews/Ratings

Neither Marengo nor ModelScope has been reviewed yet.
Training (Marengo and ModelScope)

Documentation
Webinars
Live Online
In Person

Company Information (Marengo)

TwelveLabs
Founded: 2021
United States
www.twelvelabs.io/product/models-overview#marengo

Company Information (ModelScope)

Alibaba Cloud
China
modelscope.cn/

Alternatives (Marengo)

VideoPoet (Google)

Alternatives (ModelScope)

Wan2.1 (Alibaba)
Qwen3-VL (Alibaba)

Integrations (Marengo and ModelScope)

01.AI
CodeQwen
GLM-4.5
Qwen
Qwen-7B
Qwen-Image
Qwen2
Qwen2-VL
Qwen2.5
Qwen2.5-1M
Qwen2.5-Coder
Qwen2.5-Max
Qwen2.5-VL
Qwen3
Step 3.5 Flash
TwelveLabs
Yi-Large