pengw00
🎯
Focusing

Pinned

  1. vllm-david-lab (Public)

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python

  2. vllm-dynamic-sparsity (Public)

    An optimized vLLM fork featuring Dynamic KV Cache Sparsity. Reduces HBM bandwidth bottlenecks by skipping 40% of non-essential KV cache blocks with a custom Triton-based PagedAttention kernel.

    Python

  3. vllm-local-practices (Public)

    Local tests.

  4. llm-kernel-triton-assignment2-systems (Public)

    Forked from stanford-cs336/assignment2-systems

    Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch

    Python

  5. llm-kernel-triton-assignment3-scaling (Public)

    Forked from stanford-cs336/assignment3-scaling

    Python