uv-xiao
Youwei Xiao (肖有为)
School of Integrated Circuits
Peking University
Beijing, China
I am a Ph.D. candidate at the School of Integrated Circuits, Peking University, advised by Prof. Yun Liang. My research focuses on software techniques for full-stack compiler-architecture-hardware co-optimization, with an emphasis on domain-specific languages (DSLs), compiler techniques, and hardware synthesis frameworks. I received my Bachelor of Science in EECS from Peking University in 2022.
My research centers on EDA software techniques that bridge the gap between high-level architectural specifications and low-level hardware implementations. I have contributed to several projects on multi-level intermediate representations, hardware synthesis frameworks, and domain-specific processor customization. Notable contributions include the open-source hardware description language Cement (FPGA 2024), the high-level synthesis framework Hector (ICCAD 2022), and automatic accelerator generation with control flow and data access optimization (Cayman, DAC 2025). Currently, I am working on frameworks for automatic processor instruction customization based on e-graph (equivalence graph) anti-unification, and on full-stack solutions for domain-specific processor design (ISAMORE, to appear at ASPLOS 2026).
I plan to explore composite compilation techniques that integrate formal methods with large language models for architecture-hardware co-optimization (EggMind, coming soon). I’m also interested in software techniques for architecture exploration that comprehensively model data movement patterns and microarchitecture design spaces, producing both the hardware and its evaluation in ONE CLICK. My ultimate goal is to build comprehensive frameworks that enable unified, abstraction-based architecture performance analysis, automatic optimization, and seamless generation of hardware implementations. Toward this goal, I initiated the APS project together with my labmates.
Building on my experience in compilers, DSLs, and computer architecture, I am actively exploring a new generation of AI compiler frameworks. The system software stack for training and deploying LLMs spans multiple levels, including infrastructure, graph compilation, and operator compilation, and must be retargetable across different hardware architectures. I believe the role of compilation in the AI system stack has yet to be fully realized. Currently, I am researching frameworks and optimizations for MegaKernels that fully exploit both graph-level and kernel-level compilation in multi-node settings. I hope to share my results soon.
selected publications
- [ASPLOS] Finding Reusable Instructions via E-Graph Anti-Unification. 2026 (to appear).
- [Preprint] Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR. 2025.
- [Preprint] SkyEgg: Joint Implementation Selection and Scheduling for Hardware Synthesis using E-graphs. 2025.
- [Preprint]
- [ICCAD] Invited Paper: APS: Open-Source Hardware-Software Co-Design Framework for Agile Processor Specialization. In Proceedings of the 44th IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’25), 2025.
- [ICCAD] Clay: High-level ASIP Framework for Flexible Microarchitecture-Aware Instruction Customization. In Proceedings of the 44th IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’25), 2025.
- [DAC] Cayman: Custom Accelerator Generation with Control Flow and Data Access Optimization. In Proceedings of the 62nd ACM/IEEE Design Automation Conference (DAC ’25), 2025.
- [LATTE] cmt2: Rule-Based Hardware Description in Rust with Temporal Semantics. In 5th Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE ’25), 2025.
- [FPGA] An Empirical Comparison of LLM-based Hardware Design and High-level Synthesis. In Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’25), 2025.
- [FPGA] Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and Synthesis. In Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’24), 2024.
- [ICCAD] HECTOR: A Multi-Level Intermediate Representation for Hardware Synthesis Methodologies. In Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’22), 2022.