Welcome back to Data Streaming Now! In this episode, Ali Alemi (Principal Data Streaming Architect at AWS) and Mindy Ferguson (VP of Technology at AWS) tackle one of the most persistent challenges facing teams running Apache Kafka in production: operating distributed streaming systems reliably in cloud environments.

Running Kafka in the cloud sounds simple in theory—but in practice, cloud constraints create operational complexity that can make your streaming infrastructure unstable. Network throughput quotas limit how fast you can replicate data. Storage throughput caps restrict your write performance. Individual nodes fail unexpectedly. Entire Availability Zones can go down. And when any of these happen, traditional Kafka architectures force you into expensive, time-consuming data rebalancing operations that work against the elastic promise of cloud computing. The result? Operational burden that prevents teams from scaling at the speed their business demands.

We'll dive deep into how Amazon MSK Express Brokers fundamentally solve these operational challenges by decoupling storage from compute, eliminating data rebalancing overhead, and providing true cloud-native elasticity for Kafka workloads—allowing you to scale seamlessly, recover from failures instantly, and operate with confidence even under the most demanding conditions.

Join us live as we explore Amazon MSK Express Brokers, demonstrate resilience under failure conditions, and—yes—prove it works with real systems and real metrics. No theory. No slides. Just production-grade solutions to Kafka's toughest operational challenges.
AWS Databases & Analytics
IT Services and IT Consulting
Seattle, Washington 266,644 followers
Put your data to work on the most scalable, trusted, and secure cloud.
About us
At AWS, we believe the next wave of reinvention will be driven by data. The ideal data strategy isn’t one size fits all. It’s adapted for your needs. It gives you the best of both data lakes and purpose-built data stores. It lets you store any amount of data you need at a low cost, and in open, standards-based data formats. It isn’t restricted by data silos, and lets you empower people to run analytics or machine learning using their preferred tool or technique. And, it lets you securely manage who has access to the data. Choose from 15 databases, 12 analytics, and 30 machine learning services – more than you’ll find anywhere else – to help you get insights from your data.
- Website: https://aws.amazon.com/products/databases/
- Industry: IT Services and IT Consulting
- Company size: 10,001+ employees
- Headquarters: Seattle, Washington
Updates
-
💡 Your standby Regions shouldn't cost the same as your primary. Now they don't have to. 👉 https://go.aws/47WpdUP

Amazon DocumentDB 5.0 global clusters now support serverless instances. Your secondary Regions can run at minimal capacity during normal operations & auto-scale only when traffic demands—or when a failover occurs. #AmazonDocumentDB

What this changes for multi-Region builders:
🔹 Up to 10 secondary Regions, each scaling independently via DocumentDB Capacity Units (DCUs)
🔹 Failover promotion to full read/write capability in under one minute
🔹 No more pre-provisioning instances sized for worst-case scenarios across every Region

This is especially relevant for #Serverless multi-Region architectures — SaaS platforms, multi-tenant workloads & apps with time-zone-driven usage peaks. Pair with AWS Lambda & Amazon API Gateway for a fully serverless stack.

Honest tradeoff: If your secondary Regions handle consistent heavy read traffic, provisioned instances may still be more cost-effective. Serverless shines brightest when standbys are mostly idle. #Database

Check out the demo walkthrough below. 👇
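The scaling shape described above — a standby that idles at a capacity floor and tracks demand up to a configured ceiling — can be sketched in a few lines of Python. This is purely illustrative: the floor and ceiling values here are hypothetical, not DocumentDB's actual DCU limits.

```python
# Illustrative only: the shape of a serverless standby Region's capacity.
# min_dcus / max_dcus are hypothetical example bounds, not service limits.

def standby_capacity(demand_dcus: float, min_dcus: float = 0.5, max_dcus: float = 32.0) -> float:
    # Idle at the floor during normal operation; scale with demand
    # (e.g. after a failover promotes this Region) up to the ceiling.
    return max(min_dcus, min(demand_dcus, max_dcus))

print(standby_capacity(0.0))    # near-idle standby: stays at the floor
print(standby_capacity(12.0))   # failover shifts traffic here: tracks demand
print(standby_capacity(100.0))  # demand spike: capped at the maximum
```

The point of the model is the cost asymmetry: during normal operation the secondary pays only for the floor, instead of for instances pre-provisioned at worst-case size.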
-
🧊 Whether you're a seasoned Apache Iceberg practitioner or just beginning your journey, join AWS experts at Iceberg Summit 2026. 👉 https://go.aws/4d4TYuj

AWS sessions cover:
🎯 Fine-Grained Metadata Commits in Apache Iceberg: Improve concurrency & reduce commit conflicts in high-throughput environments
💠 A Rusty Future: Bringing Iceberg to the Rust Data Ecosystem: Integrate Iceberg tables with Rust-based data processing pipelines
📊 The Evolution of Semi-Structured Data: Moving from JSON Strings to Iceberg V3 Variants: Query nested data structures without schema flattening
⚡ Performance Tuning for Streaming Ingestion into Apache Iceberg: Reduce small file overhead & optimize compaction strategies for real-time workloads

Register to attend sessions on Iceberg architecture & performance. #AWSAnalytics #Developer #AWSatApacheIcebergSummit
-
Amazon Kinesis Data Streams On-demand Advantage mode delivers 60%+ cost savings. Read the blog to learn more. 👉 https://go.aws/4dzttgD

In this post, we walk through three real-world scenarios showing how On-demand Advantage reduces costs compared to On-demand Standard without sacrificing performance or flexibility. Reduce costs across consistent high-throughput workloads, extended data retention, & architectures with multiple Enhanced Fan-Out consumers — with savings ranging from 41% to 67% depending on the use case.

On-demand Advantage is a good fit if your streaming workloads run consistently at scale, use multiple consumers, or need longer data retention. #AWS #DataStream #BigData
-
⚡ Amazon Redshift increases performance of new queries by up to 7x. Try it today! 👉 https://go.aws/4lylT7Y

Get faster query results from the first run. #AmazonRedshift now delivers accelerated query startup times for low-latency #SQLAnalytics workloads like near real-time analytics, BI dashboards, & agentic AI. Amazon Redshift has optimized its code generation engine, so new queries start faster and deliver performance consistent with subsequent runs. #CloudDataWarehouse
-
🎮 Read Yggdrasil Gaming’s journey from BigQuery to AWS that reduced their data processing costs by 60%. 👉 https://go.aws/47wMoVy

Yggdrasil Gaming built an Apache Iceberg-based modern lakehouse on Amazon S3, Amazon Athena, & AWS Glue Data Catalog to solve advanced analytics & AI/ML use cases such as player behavior modeling, predictive game recommendations, & fraud detection.

No more dual-cloud complexity. No more proportional cost scaling. Just one open, unified data platform, queryable across Athena, Spark, & dbt.

The result: a 60% reduction in data processing costs and 75% lower latency for analytics results. #ApacheIceberg #DataArchitecture #CloudMigration
-
🚀 Simplify semi-structured data processing with new array functions in Amazon Redshift. 👉 https://go.aws/3NmZA8u

Working with semi-structured data just got easier. Amazon Redshift now supports 9 new array functions for SUPER data types. New capabilities include:
🔎 ARRAY_CONTAINS & ARRAY_POSITION for element lookup
🔗 ARRAY_INTERSECTION & ARRAY_EXCEPT for set operations
↕️ ARRAY_SORT & ARRAY_DISTINCT for organizing & deduplicating data

These functions enable you to search, compare, sort & transform arrays directly within SQL statements—perfect for nested data structures, event processing & analytics workflows at scale. #AmazonRedshift #SQLAnalytics #CloudDataWarehouse
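As a rough illustration of what these operations compute, here is a minimal Python sketch of their list semantics. Treat it as an analogy only: Redshift's SUPER type has its own null-handling, ordering, and deduplication rules, which may differ from this simplified version.

```python
# Plain-Python analogues of the six Redshift array functions named above.
# Simplified semantics for illustration; not the engine's actual behavior.

def array_contains(arr, elem):
    # ARRAY_CONTAINS: is the element present in the array?
    return elem in arr

def array_position(arr, elem):
    # ARRAY_POSITION: 1-based index of the first match (SQL convention),
    # or None if the element is absent.
    for i, v in enumerate(arr, start=1):
        if v == elem:
            return i
    return None

def array_intersection(a, b):
    # ARRAY_INTERSECTION: elements present in both arrays, deduplicated,
    # in first-seen order from `a`.
    in_b = set(b)
    return [v for v in dict.fromkeys(a) if v in in_b]

def array_except(a, b):
    # ARRAY_EXCEPT: elements of `a` not present in `b`, deduplicated.
    in_b = set(b)
    return [v for v in dict.fromkeys(a) if v not in in_b]

def array_sort(arr):
    # ARRAY_SORT: ascending order.
    return sorted(arr)

def array_distinct(arr):
    # ARRAY_DISTINCT: deduplicate, preserving first-seen order.
    return list(dict.fromkeys(arr))

events = ["click", "view", "click", "purchase"]
print(array_contains(events, "purchase"))              # True
print(array_position(events, "view"))                  # 2
print(array_intersection(events, ["view", "scroll"]))  # ['view']
print(array_except(events, ["click"]))                 # ['view', 'purchase']
print(array_sort(array_distinct(events)))              # ['click', 'purchase', 'view']
```

In Redshift itself these run directly inside SQL over SUPER columns, so the same lookups and set operations happen without unnesting arrays first.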
-
💡 Billions of records. Sub-hour aggregations. Zero Spark clusters. Read this customer blog to learn how Verisk did it. 👉 https://go.aws/40kM48y

Verisk, a catastrophe modeling SaaS provider serving insurance & reinsurance companies worldwide, rebuilt their catastrophe modeling pipeline on AWS with Apache Iceberg at the core. The outcome? Dramatically faster processing, lower storage costs, & data that's queryable the instant it lands.

No waiting on batch crawlers. No stale schemas. No duplicate datasets for different query engines. Just one copy of data, accessible across #AmazonRedshift, #AmazonAthena, & Spark simultaneously.

If you're still managing separate data copies per engine, running nightly crawler jobs, or wrestling with multi-tenant isolation at scale, this is the customer blog to read. Verisk’s journey has positioned them to scale confidently into the future, processing not just billions but potentially trillions of records as their model resolution increases. #DataArchitecture
-
🚀 Amazon Redshift now supports reusable templates for COPY operations — define your parameters once, reference them everywhere. 👉 https://go.aws/4uig3LN

Here's what that means for your team:
✅ Consistent data ingestion across every pipeline
⚡ Faster execution with less manual input
🔧 Update a template once, apply everywhere
🔀 Flexibility to override parameters when needed

Available in all AWS Regions, including AWS GovCloud (US). Read the blog to get started. #AmazonRedshift #AWS #CloudDataWarehouse
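The underlying pattern is "define once, override per call." Here is a hypothetical Python sketch of that idea — the parameter names and the `copy_params` helper are illustrative, not Redshift's actual template syntax; see the linked blog for the real SQL.

```python
# Hypothetical sketch of the reusable-template pattern: shared COPY
# parameters defined once, with optional per-pipeline overrides.
# Parameter names are illustrative, not actual Redshift COPY options.

BASE_COPY_PARAMS = {
    "format": "CSV",
    "ignore_header": 1,
    "region": "us-east-1",
    "time_format": "auto",
}

def copy_params(**overrides):
    # Change BASE_COPY_PARAMS in one place and every pipeline that
    # references it picks up the update; an individual load can still
    # override specific keys without touching the shared template.
    return {**BASE_COPY_PARAMS, **overrides}

daily_load = copy_params()
backfill = copy_params(ignore_header=0, region="eu-west-1")

print(daily_load["region"])   # us-east-1 (from the template)
print(backfill["region"])     # eu-west-1 (overridden for this load)
print(backfill["format"])     # CSV (inherited from the template)
```

The same merge-with-overrides shape is what the template feature gives you inside Redshift: consistent defaults across pipelines, with targeted exceptions where a load needs them.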
-
🚀 Amazon OpenSearch Serverless just launched collection groups to help optimize costs for multi-tenant workloads. Learn more. 👉 https://go.aws/4aTV2zD

For builders, it can be challenging to optimize costs when managing multi-tenant workloads that use individual KMS keys for data isolation. Previously, on Amazon OpenSearch Serverless, each unique KMS key required dedicated OpenSearch Compute Units (OCUs). For teams managing many small tenants, this meant fragmented resources & higher infrastructure costs.

Collection groups change that. Now you can group collections together to share compute while still enforcing strong security boundaries, letting you design multi-tenant architectures that are both secure & cost-efficient.

With collection groups you can:
🔐 Share OCUs across collections with different KMS keys while maintaining tenant isolation
💰 Reduce cost for large numbers of smaller workloads
🔮 Control performance & spend by defining minimum & maximum OCUs

If you're building multi-tenant search or AI applications, collection groups give you a more flexible way to balance security & cost. #AWSAnalytics #DataAnalytics #VectorSearch