ChengXiang Qi(齐呈祥)

kuangjux [at] outlook [dot] com


Hi, my name is Chengxiang Qi. I am currently a second-year master's student in computer science at the University of Chinese Academy of Sciences. I completed my undergraduate studies at Tianjin University. My current interests are deep learning compilers, machine learning systems, and deep learning. In the past, I have also been interested in the Rust programming language and its system-level applications (OS, hypervisor, etc.). I am currently interning with the WeChat LLM Infra Team.

In the past, I was a research intern in the MSRA System Group, where I focused on two projects:

  • microsoft/TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
  • microsoft/FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of lists of statically-shaped tensors, referred to as a FractalTensor.
During my time at MSRA, I was very fortunate to work with my mentor Ying Cao, who is a very idealistic and capable researcher! We built the above two projects together and published a paper at SOSP’24.

Before I joined the MSRA system group, I interned at the Operating System Laboratory of Tsinghua University, where I focused on Type-2 hypervisors and network performance on arceos:

  • arceos-org/arceos is an experimental modular OS written in Rust. I added a hypervisor feature, wrote an ixgbe driver for the Intel 82599 NIC, and optimized network performance.
  • hypercraft is a VMM library written in Rust. It was developed from the hypocaust-2 project, mainly to provide virtualization support for arceos, and can also be used as a standalone component.

    CV  /  GitHub  /  Zhihu

    Education

    • University of Chinese Academy of Sciences

      Master of Engineering in Computer Technology

      Sep. 2023 - Present

    • Tianjin University

      Bachelor of Engineering in Computer Science and Technology

      Sep. 2019 - June 2023

    Experiences

    Research Intern in System and Network Group at Microsoft Research Asia

    Feb 2024 - May 2025

    Mentor: Dr. Ying Cao

    • Based on the FractalTensor programming model, I used CUTLASS to optimize implementations of algorithms such as GEMM, Back-to-Back GEMMs, Stacked/Dilated LSTM, and FlashAttention-2. Evaluated on an NVIDIA A100, these implementations achieve a speedup of up to 5.45x over SOTA, and 2.14x on average.
    • As a core designer and developer, I designed and implemented TileFusion, an efficient C++ macro kernel template library that raises the abstraction level of tile processing in CUDA C. It applies various optimization techniques tailored to NVIDIA hardware and provides multiple hardware-aware algorithms. Currently, TileFusion performs comparably to CUTLASS on multiple DNN workloads.

    Research Intern in Operating System Lab at Tsinghua University

    May 2023 - July 2023

    Mentors: Prof. Yu Chen, Dr. Yuekai Jia

    • Performance improvement. Benchmarked network performance with tools such as Apache HTTP Server and iperf, and developed benchmarking tools to evaluate the network card's raw socket send and receive capabilities. Modified the network protocol stack and its interface with Arceos to increase network bandwidth.
    • NIC driver. Wrote a driver for the Intel 82599 network interface card in Rust. Referred to DPDK for performance optimization and integrated the driver as a crate into Arceos. Successfully ran real-world applications such as httpserver, iperf, and Redis on an AMD machine.
    • Type-2 hypervisor based on Arceos. Developed a type-2 hypervisor based on Arceos that is capable of booting Linux.

    Active Projects

    microsoft/TileFusion
    TileFusion is an efficient C++ macro kernel library designed to elevate the level of abstraction in CUDA C for tile processing. The library offers:
    • Higher-Level Programming Constructs: TileFusion supports tiles across the three-level GPU memory hierarchy, providing device kernels for transferring tiles between CUDA memory hierarchies and for tile computation.
    • Modularity: TileFusion enables applications to process larger tiles built out of BaseTiles in both time and space, abstracting away low-level hardware details.
    • Efficiency: The library's BaseTiles are designed to match TensorCore instruction shapes and encapsulate hardware-specific performance parameters, ensuring optimal utilization of TensorCore capabilities.
    microsoft/FractalTensor

    FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of lists of statically-shaped tensors, referred to as a FractalTensor. It supports advanced functional list operations, including array compute operators inherited from second-order array combinators (SOACs) such as map, reduce, and scan, as well as first-order array access operators. These high-level operators can be applied to process the nested structure of FractalTensor variables, explicitly revealing opportunities for exploiting nested data parallelism and access locality through automatic compiler analysis.
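The nested use of map, reduce, and scan described above can be illustrated with a minimal sketch in plain Python. This is not FractalTensor's API: the names `batch`, `add`, and `scale` are hypothetical, and a list of lists of fixed-length tuples merely stands in for a list of lists of statically-shaped tensors.

```python
from functools import reduce
from itertools import accumulate

# Stand-in for a FractalTensor-like value: an outer list of variable-length
# inner lists, whose leaves all share one static shape (vectors of length 2).
batch = [
    [(1.0, 2.0), (3.0, 4.0)],
    [(5.0, 6.0), (7.0, 8.0), (9.0, 10.0)],
]


def add(u, v):
    """Elementwise sum of two same-shaped vectors."""
    return tuple(a + b for a, b in zip(u, v))


def scale(v, k):
    """Multiply every element of a vector by the scalar k."""
    return tuple(k * a for a in v)


# map nested inside map: every leaf is independent of every other leaf.
doubled = [[scale(v, 2.0) for v in seq] for seq in batch]

# reduce nested inside map: inner lists are independent of each other.
totals = [reduce(add, seq) for seq in batch]

# scan (running sum) nested inside map: sequential within an inner list,
# independent across inner lists.
prefix = [list(accumulate(seq, add)) for seq in batch]
```

In each case the outer map has no dependencies between inner lists, which is the kind of nested data parallelism that the operators make explicit.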


    Projects

    rCore-OS/arceos: An experimental modular OS written in Rust.
    Arceos is a unikernel developed by the rCore-OS community at Tsinghua University.
    • I integrated hypercraft into arceos, enabling it to be launched as a type-2 hypervisor.
    • I added interrupt support to arceos and implemented IO interrupts based on virtio-net and virtio-blk.
    • I implemented an ixgbe NIC driver for arceos and optimized performance in both the driver layer and the network stack layer.
    xv6-rust: Reimplementation of xv6-riscv in the Rust language.

    A Unix-like operating system implemented in pure Rust. This project is a reimplementation of xv6-riscv. In addition, it optimizes parts of the original system, such as memory allocation and the file system. This project also serves as the reference implementation for OSCOMP Project 4. You can find more information about this project in its documents.

    hypercraft: a VMM crate written in Rust.

    Hypercraft is a VMM (Virtual Machine Monitor) crate written in Rust. Currently, hypercraft is integrated as a crate into rcore-os/arceos, can be launched as a type-2 hypervisor, and is capable of booting Linux.

    hypocaust-2: A hardware-assisted RISC-V hypervisor using the H extension.

    Hypocaust-2 implements SBI call processing, two-stage page table translation, interrupt emulation and forwarding, exception forwarding, and passthrough or emulation of some peripherals. Currently, hypocaust-2 can boot and run rCore-Tutorial-v3, rt-thread, and mainline Linux, with plans to expand to multi-core and multi-guest support.

    hypocaust: an S-mode trap-and-emulate type-1 hypervisor that runs on RISC-V machines.

    Hypocaust implements privileged instruction emulation (CSR-related instructions and SFENCE.VMA), construction of shadow page tables, synchronization of guest page tables and shadow page tables, forwarding of interrupts and exceptions, and emulation of clocks and virtio block devices. It can currently run minikernel.
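The shadow page table idea mentioned above can be sketched in a few lines of Python. This is a conceptual toy, not Hypocaust's code: the class `ShadowPaging` and its methods are hypothetical, page tables are modeled as flat dictionaries of page numbers, and resynchronizing when the guest's SFENCE.VMA traps is an assumption made for illustration.

```python
class ShadowPaging:
    """Toy model of shadow paging over page numbers (hypothetical names)."""

    def __init__(self, gpa_to_hpa):
        # Hypervisor-owned map: guest-physical page -> host-physical page.
        self.gpa_to_hpa = dict(gpa_to_hpa)
        # Guest-owned table: guest-virtual page -> guest-physical page.
        self.guest_pt = {}
        # Table the hardware would walk: guest-virtual page -> host-physical page.
        self.shadow_pt = {}

    def guest_map(self, gvpn, gppn):
        """The guest edits its own table; the shadow table is now stale."""
        self.guest_pt[gvpn] = gppn

    def on_sfence_vma(self):
        """Assumed trap handler: rebuild the shadow table from the guest table.

        Guest mappings that point at memory the hypervisor has not granted
        are skipped, so they never reach the hardware-visible table.
        """
        self.shadow_pt = {
            gvpn: self.gpa_to_hpa[gppn]
            for gvpn, gppn in self.guest_pt.items()
            if gppn in self.gpa_to_hpa
        }

    def translate(self, gvpn):
        """Return the host-physical page for a guest-virtual page, or None."""
        return self.shadow_pt.get(gvpn)


vm = ShadowPaging({0x10: 0x80, 0x11: 0x81})
vm.guest_map(0x1, 0x10)     # valid guest mapping
vm.guest_map(0x2, 0x99)     # points at memory the guest does not own
stale = vm.translate(0x1)   # shadow table not yet synchronized
vm.on_sfence_vma()
fresh = vm.translate(0x1)
blocked = vm.translate(0x2)
```

A real implementation works on multi-level page tables and usually updates only the affected entries; rebuilding the whole table keeps the sketch short.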


    Publications

    Selected Awards

    • 2022 NSCSCC Third Prize in Team Competition
    • 2021 The Best Quality Prize of OSPP (only 5 in China a year)
    • 2021 OSCOMP Third Prize in Team Competition

    Talks

    Hypocaust: a RISC-V type-1 hypervisor written in Rust.

    • OS2ATC 2022, Beijing, March 2023