DS Tesla M Class Aug11
Based on the CUDA architecture codenamed Fermi, the Tesla M-class GPU computing modules are the world's fastest parallel computing processors for high performance computing (HPC). Their high performance makes Tesla GPUs ideal for seismic processing, biochemistry simulations, weather and climate modeling, signal processing, computational finance, CAE, CFD, and data analytics.
Accelerate your science with NVIDIA Tesla 20-series GPUs. A companion processor to the CPU in the server, Tesla GPUs speed up HPC applications by 10x. Based on the Fermi architecture, these GPUs feature up to 665 gigaflops of double precision performance, 1 teraflop of single precision performance, ECC memory error protection, and L1 and L2 caches. The Tesla M-class GPU modules are integrated into GPU-CPU servers from OEMs. This gives data center IT staff much greater choice in how they deploy GPUs, with a wide variety of rackmount and blade systems, and with remote monitoring and management capabilities enabling large data center, scale-out deployments.
TECHNICAL SPECIFICATIONS
                                                 Tesla M2090      Tesla M2070/M2075   Tesla M2050
Peak double precision floating point performance 665 Gigaflops    515 Gigaflops       515 Gigaflops
Peak single precision floating point performance 1331 Gigaflops   1030 Gigaflops      1030 Gigaflops
CUDA cores                                       512              448                 448
Memory size (GDDR5)*                             6 GigaBytes      6 GigaBytes         3 GigaBytes
Memory bandwidth (ECC off)                       177 GBytes/sec   150 GBytes/sec      148 GBytes/sec

* Note: With ECC on, 12.5% of the GPU memory is used for ECC bits. So, for example, 3 GB total memory yields 2.625 GB of user-available memory with ECC on.

[Performance chart: Molecular Dynamics]
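The 12.5% ECC overhead in the note above is easy to check with a short calculation. This sketch (the function name is ours, not NVIDIA's) converts total board memory to user-available memory with ECC enabled:

```python
# With ECC on, 12.5% (one eighth) of GPU memory holds ECC bits,
# leaving 87.5% available to applications.
ECC_OVERHEAD = 0.125

def usable_memory_gb(total_gb: float, ecc_on: bool = True) -> float:
    """Return user-available GPU memory in GB for a given board total."""
    return total_gb * (1 - ECC_OVERHEAD) if ecc_on else total_gb

# The 6 GB boards (M2090, M2070/M2075) and the 3 GB M2050:
print(usable_memory_gb(6.0))  # 5.25 GB
print(usable_memory_gb(3.0))  # 2.625 GB, matching the note above
```

The same fraction applies to effective bandwidth sizing: an application budgeting memory on a 6 GB board should plan for 5.25 GB once ECC is turned on.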
NVIDIA TESLA | DATASHEET | AUG 11
Benefits
- Delivers up to 665 Gigaflops of double-precision peak performance in each GPU, enabling servers from leading OEMs to deliver more than a teraflop of double-precision performance per 1 RU of space. Single precision peak performance is over one teraflop per GPU.
- Meets a critical requirement for computing accuracy and reliability in datacenters and supercomputing centers: internal register files, L1/L2 caches, shared memory, and external DRAM are all ECC protected.
- Maximizes performance and reduces data transfers by keeping larger data sets in local memory that is attached directly to the GPU.
- Integrates the GPU subsystem with the host system's monitoring and management capabilities, such as IPMI or OEM-proprietary tools, so IT staff can manage the GPU processors in the computing system using widely used cluster/grid management solutions.
- Accelerates algorithms such as physics solvers, ray tracing, and sparse matrix multiplication, where data addresses are not known beforehand.
- Maximizes throughput with context switching that is 10X faster than the previous architecture, concurrent kernel execution, and improved thread block scheduling.
- Turbocharges system performance by transferring data over the PCIe bus while the computing cores are crunching other data.
- Choose C, C++, OpenCL, DirectCompute, or Fortran to express application parallelism and take advantage of the innovative Fermi architecture.
FLEXIBLE PROGRAMMING ENVIRONMENT WITH BROAD SUPPORT OF PROGRAMMING LANGUAGES AND APIS
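The "more than a teraflop per 1 RU" claim follows directly from the per-GPU numbers in the spec table. As a sketch, assuming a hypothetical 1U server hosting two M-class modules (the server configuration is our assumption, not part of this datasheet):

```python
# Peak double-precision Gigaflops per GPU, from the spec table above.
DP_GFLOPS = {"M2090": 665, "M2070/M2075": 515, "M2050": 515}

def rack_unit_dp_gflops(model: str, gpus_per_ru: int = 2) -> int:
    """Aggregate peak double-precision Gigaflops per rack unit,
    assuming a hypothetical server packing gpus_per_ru modules into 1 RU."""
    return DP_GFLOPS[model] * gpus_per_ru

# Two M2090 modules in 1 RU exceed one teraflop (1000 Gigaflops) double precision:
print(rack_unit_dp_gflops("M2090"))  # 1330 Gigaflops
```

These are peak figures; sustained application throughput depends on the workload and on how well it overlaps PCIe transfers with computation.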