MarengoTwelveLabs
|
ModelScopeAlibaba Cloud
|
|||||
Related Products
|
||||||
About
Marengo is a multimodal video foundation model that transforms video, audio, image, and text inputs into unified embeddings, enabling powerful “any-to-any” search, retrieval, classification, and analysis across vast video and multimedia libraries. It integrates visual frames (with spatial and temporal dynamics), audio (speech, ambient sound, music), and textual content (subtitles, overlays, metadata) to create a rich, multidimensional representation of each media item. With this embedding architecture, Marengo supports robust tasks such as search (text-to-video, image-to-video, video-to-audio, etc.), semantic content discovery, anomaly detection, hybrid search, clustering, and similarity-based recommendation. The latest versions introduce multi-vector embeddings, separating representations for appearance, motion, and audio/text features, which significantly improve precision and context awareness, especially for complex or long-form content.
|
About
This model is based on a multi-stage text-to-video generation diffusion model, which inputs a description text and returns a video that matches the text description. Only English input is supported.
This model is based on a multi-stage text-to-video generation diffusion model, which inputs a description text and returns a video that matches the text description. Only English input is supported.
The text-to-video generation diffusion model consists of three sub-networks: text feature extraction, text feature-to-video latent space diffusion model, and video latent space to video visual space. The overall model parameters are about 1.7 billion. Support English input. The diffusion model adopts the Unet3D structure, and realizes the function of video generation through the iterative denoising process from the pure Gaussian noise video.
|
|||||
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
|||||
Audience
Media companies, AI researchers, and platforms searching for a tool to build smart search engines, content discovery tools, recommendation systems, or video-analysis workflows
|
Audience
Users interested in an open source text-to-video AI video generation model
|
|||||
Support
Phone Support
24/7 Live Support
Online
|
Support
Phone Support
24/7 Live Support
Online
|
|||||
API
Offers API
|
API
Offers API
|
|||||
Screenshots and Videos |
Screenshots and Videos |
|||||
Pricing
$0.042 per minute
Free Version
Free Trial
|
Pricing
Free
Free Version
Free Trial
|
|||||
Reviews/
|
Reviews/
|
|||||
Training
Documentation
Webinars
Live Online
In Person
|
Training
Documentation
Webinars
Live Online
In Person
|
|||||
Company InformationTwelveLabs
Founded: 2021
United States
www.twelvelabs.io/product/models-overview#marengo
|
Company InformationAlibaba Cloud
China
modelscope.cn/
|
|||||
Alternatives |
Alternatives |
|||||
|
|
||||||
|
|
||||||
|
|
|
|||||
Categories |
Categories |
|||||
Integrations
01.AI
CodeQwen
GLM-4.5
Qwen
Qwen-7B
Qwen-Image
Qwen2
Qwen2-VL
Qwen2.5
Qwen2.5-1M
|
Integrations
01.AI
CodeQwen
GLM-4.5
Qwen
Qwen-7B
Qwen-Image
Qwen2
Qwen2-VL
Qwen2.5
Qwen2.5-1M
|
|||||
|
|
|