Lei Mao's Log Book
  • Tags
  • Computer Architecture

Roofline Performance Model

 03-26-2025 03-26-2025 blog 7 minutes read (About 1078 words)
Understand the Performance Limitations and Gaps

 
Accelerated Computing, High Performance Computing, Computer Architecture, Performance

Grouped Query Attention Performance Theoretical Analysis

 02-03-2025 03-02-2025 blog 11 minutes read (About 1612 words)
Sharing Key and Value Tensors for a Group of Query Tensors to Mitigate Transformer Attention Layer Performance Bottleneck

 
Deep Learning, Transformer, Computer Architecture, Neural Network, Performance Optimization, Large Language Model

Transformer Vanilla Attention Performance Theoretical Analysis

 01-27-2025 03-02-2025 blog 9 minutes read (About 1275 words)
Performance Bottleneck for Serving Transformer Models

 
Deep Learning, Transformer, Computer Architecture, Neural Network, Performance Optimization, Large Language Model

Function Approximation Using Lookup Table and Interpolation

 09-22-2023 09-22-2023 blog 7 minutes read (About 1001 words)
Using Motorola CPU32 as an Example

 
Deep Learning, Computer Architecture, Quantization

Row-Major VS Column-Major

 05-12-2023 05-12-2023 blog 28 minutes read (About 4154 words)
Ways of Packing Matrix in Memory and Its Consequence for Matrix Multiplication

 
CPP, CUDA, Computer Architecture, Memory

Multi-Thread Single-Stream VS Single-Thread Multi-Stream CUDA

 10-18-2021 05-12-2022 blog 13 minutes read (About 1946 words)
CUDA Programming Choices for CUDA Stream

 
Deep Learning, Mathematics, CUDA, Parallel Computing, High Performance Computing, Computer Architecture

Math-Bound VS Memory-Bound Operations

 10-11-2021 09-18-2023 blog 8 minutes read (About 1188 words)
Computation Bandwidth, Memory Bandwidth, and Data Reuse

 
Deep Learning, Mathematics, Computer Architecture

Binary VS Text Mode for File I/O Operations

 12-22-2019 09-16-2022 blog 9 minutes read (About 1395 words)
Some Fundamental Concepts for Reading and Writing Files

 
Software Engineering, Computer Architecture
Lei Mao

© 2017-2026 Lei Mao  Powered by Hexo & Icarus