Gavin M. Roy: myYearbook.com Architecture (Highload++, Moscow, Russia, October 2008)
myYearbook.com Architecture
Lessons Learned from the Trials of
Scaling a High Traffic Website
• Founded in 2005
• 3rd Largest Social Network in United States
• Teenage Demographic
• 60+ Employees
January 2007
• 100M Pageviews
• 1 Database Server
• 1 Web Application Server
• Daily issues with load and site availability
September 2008
• 2.5B Pageviews
• 30 Database Servers
• 120 Web Application Servers
• 99.94% Uptime as measured by pingdom.com
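To put the two snapshots in perspective, the arithmetic can be sketched as below. The pageview and uptime figures come from the slides; treating the pageview counts as monthly totals and using a 30-day month are assumptions made only for illustration.

```python
# Figures from the slides. Assumptions: pageviews are monthly totals,
# and a month is taken as 30 days for the downtime estimate.
PAGEVIEWS_JAN_2007 = 100_000_000
PAGEVIEWS_SEP_2008 = 2_500_000_000
UPTIME_PERCENT = 99.94


def growth_factor(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


def downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Return the minutes of downtime implied by an uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    print("traffic growth:", growth_factor(PAGEVIEWS_JAN_2007, PAGEVIEWS_SEP_2008))
    print("downtime (min / 30 days):", round(downtime_minutes(UPTIME_PERCENT), 2))
```

Under these assumptions, traffic grew 25-fold while the uptime figure leaves room for roughly 26 minutes of downtime in a 30-day month.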
Key Architecture Components
• PHP5, APC
• Apache httpd
• PostgreSQL
• Memcached
• Apache ActiveMQ
• Lighttpd
• Isilon IQ Clustered NAS
• Message Systems eCelerity
• Subversion
Web Application Architecture
• 2005‐2007: Monolithic Code Base
• 2008: Migrating to a Services Oriented Architecture
– Applications get own resources
– Loosely Coupled architecture
• MVC Application using XSLT
Web Application Architecture
• Why SOA?
– Monolithic app wastes hardware
– Cross Data‐Center Operations
– Selective Maintenance
Scaling Postgres
Rules for Scaling
1. Plan for Growth
2. Know the internals
3. Bigger Hardware is Better
Our Postgres Scaling History
• Quarter 1, 2007
– Monolithic database with one schema, many complex joins and poor optimization
– No plan for growth
– No DBA
Our Postgres Scaling History
• Quarter 3, 2008
– Horizontal “Sharded” Data
– Vertical Partitioning
– 5000 Connections/sec Avg
Scaling Postgres: Lessons Learned
• Scaling web servers means many database connections; we needed pooling
– Started with pgPool, moved to pgBouncer
• Started with Slony replicating read‐only slaves
– High IO/CPU Overhead
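The pooling idea can be illustrated with a minimal in-process sketch: a fixed set of connections is opened once and handed out to callers, so many short-lived requests share a few database connections. This is not how pgPool or pgBouncer are implemented (they are separate proxy processes); `ConnectionPool` and `FakeConnection` are hypothetical names used only for this example.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    def execute(self, sql):
        return f"executed: {sql}"


class ConnectionPool:
    """Hand out a fixed number of pre-opened connections to many callers."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # open every connection up front

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks until one is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return it to the pool


if __name__ == "__main__":
    pool = ConnectionPool(FakeConnection, size=2)
    for i in range(5):
        with pool.connection() as conn:
            print(conn.execute(f"SELECT {i}"))
```

Five requests are served here with only two connections ever opened, which is the effect a pooler has between a large web tier and the database.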
Scaling Postgres: Lessons Learned
• Began scaling vertically by separating application data onto separate database servers and removed read‐only slaves
• Needed a few small tables replicated that could be slightly inaccurate and eventually consistent (BASE)
Scaling Postgres: Lessons Learned
• Enter plProxy
– Database partitioning language by Skype utilizing PostgreSQL functions
– Trigger based plProxy functions replicate needed tables without the Queue overhead
– NOT TRANSACTION SAFE
Scaling Postgres: Lessons Learned
• Standard Use of plProxy
– Horizontal partitioning of data by ID across multiple servers
– Example: Messaging System
• 8 Servers store actual partitioned message data
• Rule #1 – Plan for Growth
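Horizontal partitioning by ID can be sketched as a routing function that maps each ID to one of the 8 servers. The sketch below assumes integer IDs and a power-of-two shard count (which PL/Proxy also expects for its partition count), so the shard is simply the low bits of the ID. The host names are hypothetical and the slides do not say which key or hash the messaging system actually used.

```python
from collections import Counter

NUM_SHARDS = 8  # from the slide: 8 servers hold the partitioned message data

# Hypothetical host names, for illustration only.
SHARD_HOSTS = [f"msgdb{i}.example.internal" for i in range(NUM_SHARDS)]


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map an integer ID to a shard index using the low bits of the ID."""
    if num_shards <= 0 or num_shards & (num_shards - 1):
        raise ValueError("num_shards must be a positive power of two")
    return user_id & (num_shards - 1)


def host_for(user_id: int) -> str:
    """Return the (hypothetical) host that stores this ID's messages."""
    return SHARD_HOSTS[shard_for(user_id)]


if __name__ == "__main__":
    print(host_for(42))
    # Sequential IDs spread evenly across the shards.
    print(Counter(shard_for(i) for i in range(1000)))
```

Keeping the shard count a power of two ties back to Rule #1: doubling the number of servers later splits each existing shard in two instead of reshuffling every row.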
Scaling Postgres: Lessons Learned
• Knowing internals
– pg_catalog
• pg_stat_user_tables
• pg_stat_user_indexes
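As one way to use these views, the sketch below holds two queries against `pg_stat_user_tables` and `pg_stat_user_indexes` and a small helper that flags tables read mostly by sequential scans. The column names are those of the standard statistics views; the threshold, the helper name, and the sample table names are assumptions for illustration, not figures from the talk.

```python
# Queries against the standard statistics views. Run them with any
# PostgreSQL client; no database is needed for the helper below.
SEQ_SCAN_SQL = """
SELECT relname, seq_scan, seq_tup_read, idx_scan
  FROM pg_stat_user_tables
 ORDER BY seq_tup_read DESC
"""

UNUSED_INDEX_SQL = """
SELECT relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
"""


def seq_scan_heavy(rows, min_seq_scans=1000):
    """Return names of tables scanned sequentially more often than by index.

    `rows` is an iterable of dicts shaped like SEQ_SCAN_SQL results.
    `min_seq_scans` is an arbitrary cut-off (an assumption) to skip
    tables that are rarely read at all.
    """
    flagged = []
    for row in rows:
        idx_scans = row["idx_scan"] or 0  # NULL when a table has no indexes
        if row["seq_scan"] >= min_seq_scans and row["seq_scan"] > idx_scans:
            flagged.append(row["relname"])
    return flagged


if __name__ == "__main__":
    sample = [  # hypothetical rows
        {"relname": "messages", "seq_scan": 50000, "seq_tup_read": 9_000_000, "idx_scan": 120},
        {"relname": "users", "seq_scan": 10, "seq_tup_read": 400, "idx_scan": 800000},
        {"relname": "audit_log", "seq_scan": 2000, "seq_tup_read": 70000, "idx_scan": None},
    ]
    print(seq_scan_heavy(sample))
```

Tables flagged this way are candidates for a missing index, while rows returned by the second query point to indexes that cost writes without serving reads.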
Scaling Postgres: Knowing Internals
Scaling Postgres: Lessons Learned
• Database Ecosystem
– Performance Factors
• Index bloat
• Usage changes
– Abuse
• Cache utilization contention
Scaling Postgres: Lessons Learned
• Bigger is Better
– More RAM
– More Disks
– Faster and More CPU
Scaling Postgres: Lessons Learned
Charts: Scaling Across CPU Cores; Before and After Upgrade
• PostgreSQL Scales to 32 Cores
• Extensive Benchmarking @ MYB
Scaling Postgres: Future Plans
• More Partitioning
• SOA Data Distribution
– Golconde
• Python Based
• Apache ActiveMQ
Apache ActiveMQ
• Java based Message Broker software
• Client language neutral
• Implements JMS 1.1, Stomp, XMPP, REST and Others
ActiveMQ @ myYearbook.com
Out‐of‐band Processing
• Uploaded content processing
– Image Resize
– Content analysis (R&D)
– Anti‐Virus Scans
• Comment and Message processing
– Spam Processing
• Email spooling from web application
• Anywhere we can that makes sense
Targeted Workload
• Message Queues allow for the right server for the job
• Better distribution of CPU intensive tasks without negatively impacting the user experience
• Clusterable, Scalable
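The out-of-band pattern can be sketched with an in-process queue standing in for the message broker: the web request only enqueues a job and returns, while a separate worker does the CPU-heavy part. This uses Python's standard `queue` and `threading` modules rather than ActiveMQ or Stomp, and the job fields and `resize_image` handler are hypothetical.

```python
import queue
import threading


def resize_image(job):
    """Hypothetical handler; a real one would do the CPU-heavy resize."""
    return f"resized {job['path']} to {job['width']}px"


def start_worker(jobs, handler, results):
    """Start a thread that processes jobs until it receives None."""

    def run():
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: stop the worker
                    break
                results.append(handler(job))
            finally:
                jobs.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    jobs, results = queue.Queue(), []
    worker = start_worker(jobs, resize_image, results)

    # The "web request" side: enqueue and return immediately.
    jobs.put({"path": "/uploads/a.jpg", "width": 200})
    jobs.put({"path": "/uploads/b.jpg", "width": 400})
    jobs.put(None)

    jobs.join()
    worker.join()
    print(results)
```

With a real broker the producer and the worker run on different machines, which is what lets image resizing or virus scanning land on hardware suited to the job.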
Memcached: Key for Success
• Valuable Scaling Tool
– Over 250k get requests per second during peak
– Over 750GB of cached data
– Easy to Deploy
– The more distributed the cache becomes, the less impact cache failures have: more boxes are better than fewer
Memcached: Potential Problems
• Large scale implementations can have some hidden problems
– Lots of network traffic
– Data that is not partitioned or not evenly distributed
• What to do for data that is not evenly distributed?
– Implemented a round‐robin cluster of memcache servers that contain the same data
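The round-robin cluster of identical servers can be sketched as follows: every write goes to all servers, and reads rotate across them so a hot key does not overload a single box. `FakeCacheServer` is a dict-backed stand-in for a memcached client, and `ReplicatedCache` is a hypothetical name; the slides do not describe how the real cluster was wired up.

```python
import itertools


class FakeCacheServer:
    """Dict-backed stand-in for one memcached server (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        self.reads += 1
        return self.data.get(key)


class ReplicatedCache:
    """Write to every server; read from servers in round-robin order."""

    def __init__(self, servers):
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        self._turn = itertools.cycle(self._servers)

    def set(self, key, value):
        for server in self._servers:  # keep all copies identical
            server.set(key, value)

    def get(self, key):
        return next(self._turn).get(key)  # spread reads across copies


if __name__ == "__main__":
    servers = [FakeCacheServer() for _ in range(3)]
    cache = ReplicatedCache(servers)
    cache.set("hot_key", "value")
    for _ in range(6):
        cache.get("hot_key")
    print([s.reads for s in servers])
```

The trade-off is that writes cost one call per server and total capacity does not grow with the cluster, so this fits small, heavily read data rather than the cache as a whole.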
Research and Development
• Copyr
– Copy‐on‐Write Filesystem Replication
• Framewerk
– PHP5 OO Development Framework
• Golconde
– Queue Based Data Distribution for PostgreSQL
• Lightr
– PHP5 XMPP Class Library
• mod_xsltd
– Lighttpd XSL Transformation module
• Playr
– PostgreSQL Log Replay
• Staplr
– STAtistical Package Logically engineered Right
Tools for Success
• Operations Portal
– Executive Level Overview of Operational Status and Production Change Log
• Staplr
– Trending & Analytics System
Operations Portal
Trending and Analysis: Staplr
• Version 0.6
– PHP Based
– Process forking
– Shelled RRD Commands
• Version 2.0
– Python Based
– Threaded
– Python wrappers to librrd
Trending and Analysis: Staplr
• Polls for:
– Apache httpd
– Apache ActiveMQ
– lighttpd
– memcached
– MySQL
– pgBouncer
– PostgreSQL
– SNMP Data
• APC, Isilon, F5, Xiotech, Others
– SysStat
Questions?