Yanshu Wang

About Me

I am a Postdoctoral Researcher at the School of Computer Science, Peking University. I received my Ph.D. in Computer Science and Technology from Tsinghua University in 2022 (GPA: 3.93/4.0) and a double degree, a B.S. in Computer Science and a B.A. in Economics, from Xi'an Jiaotong University in 2017, where I ranked 1st in both majors.

My research interests include AI agent design, LLM inference optimization, distributed machine learning, and high-performance computing systems. Currently, I focus on developing AI agents for automated data acquisition and analysis, as well as intelligent workflow optimization.

News

[2025] Paper accepted at ICDE 2025 (CCF A) on sketch-based network measurement.
[2022] Joined Peking University as a Postdoctoral Researcher.
[2022] Paper accepted at NSDI 2022 (CCF A) on hybrid flow table management.
[2017] Began Ph.D. studies at Tsinghua University.

Education

Ph.D. in Computer Science and Technology

Tsinghua University, 2017 – 2022

GPA: 3.93/4.0

B.S. in Computer Science & B.A. in Economics

Xi'an Jiaotong University, 2013 – 2017

Ranked 1st in both majors

Research Interests

AI Agent Design

Building intelligent agents for automated data acquisition and analysis, enabling trustworthy and traceable workflow optimization across vertical domains.

LLM Optimization

Model quantization using Hessian-based methods and high-dimensional subspace compression. Post-training optimization including SFT and RLHF for domain-specific applications.

Distributed ML Systems

Designing efficient parameter synchronization topologies and gradient compression algorithms for large-scale distributed machine learning training.
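
The page does not say which gradient compression algorithms were designed. As a generic illustration only, here is top-k sparsification with error feedback, a well-known compression scheme. Each worker transmits only its k largest-magnitude gradient entries and carries the untransmitted remainder forward to the next step. The function names (`topk_compress`, `decompress`) are hypothetical, and this is not presented as the author's algorithm.

```python
from typing import List, Tuple


def topk_compress(
    grad: List[float], residual: List[float], k: int
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Select the k largest-magnitude entries of (grad + residual).

    Returns the sparse (index, value) pairs to transmit and the new residual,
    i.e. everything that was not transmitted this step (error feedback).
    """
    if len(grad) != len(residual):
        raise ValueError("grad and residual must have the same length")
    # Add back what previous steps left untransmitted.
    corrected = [g + r for g, r in zip(grad, residual)]
    # Rank indices by magnitude and keep the top k.
    ranked = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]), reverse=True)
    keep = set(ranked[:k])
    sparse = [(i, corrected[i]) for i in sorted(keep)]
    # Untransmitted entries become the residual for the next step.
    new_residual = [0.0 if i in keep else v for i, v in enumerate(corrected)]
    return sparse, new_residual


def decompress(sparse: List[Tuple[int, float]], n: int) -> List[float]:
    """Expand (index, value) pairs back into a dense vector of length n."""
    dense = [0.0] * n
    for i, v in sparse:
        dense[i] = v
    return dense


if __name__ == "__main__":
    sparse, residual = topk_compress([0.1, -2.0, 0.5, 3.0], [0.0] * 4, k=2)
    print(sparse)    # [(1, -2.0), (3, 3.0)]
    print(residual)  # [0.1, 0.0, 0.5, 0.0]
```

Because the residual is fed back, no gradient information is discarded permanently. Small entries are only delayed until they accumulate enough magnitude to be sent.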

Network Systems

Sketch-based network measurement, hybrid hardware/software flow table management, and high-performance network function virtualization.

Selected Publications

DaVinci Sketch: A Versatile Sketch for Efficient and Comprehensive Set Measurements

Yanshu Wang, Jianan Ji, Chao-Hsuan Liu, Hengyang Zhou, Tong Yang

ICDE 2025 (CCF A), IEEE International Conference on Data Engineering

Elixir: A High-performance and Low-cost Approach to Managing Hardware/Software Hybrid Flow Tables

Yanshu Wang, Dan Li, Yuanwei Lu, Jianping Wu, Hua Shao, Yutian Wang

NSDI 2022 (CCF A), USENIX Symposium on Networked Systems Design and Implementation

FastKeeper: A Fast Algorithm for Identifying Top-k Real-time Large Flows

Yanshu Wang, Dan Li, Jianping Wu

GLOBECOM 2021 (CCF C), IEEE Global Communications Conference

BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training

Songtao Wang et al. (including Yanshu Wang)

NeurIPS 2018 (CCF A), Conference on Neural Information Processing Systems

View the full publication list on Google Scholar

Selected Projects

AI Agent for Data Acquisition and Analysis

2024 – Present · Peking University

Building intelligent agents that automate data acquisition and analysis, enabling efficient workflow optimization across multiple vertical domains. Key features include trustworthy and traceable data pipelines.

AI Agent · Data Analysis · Workflow Automation

Medical Large Language Model Post-Training

2023 – 2024

Post-training optimization of medical-domain large language models using Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), improving model accuracy and safety in clinical applications.

LLM · Medical AI · RLHF · SFT

Large Model Quantization

2023

Designed a weight quantization scheme based on Hessian-based methods and high-dimensional subspace compression for efficient large-model inference, achieving 30% model compression with minimal performance degradation.

Model Compression · Quantization · Inference Optimization
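
As a rough illustration of what "Hessian-based" can mean in weight quantization, the sketch below picks a per-tensor clipping scale for symmetric uniform quantization. It chooses the scale that minimizes a reconstruction error weighted by a diagonal Hessian approximation `h_diag`, assumed here to be estimated elsewhere (for example, from calibration data). This is a simplified stand-in under those assumptions. It omits the subspace compression step and is not the project's actual method. All function names are hypothetical.

```python
from typing import List, Tuple


def quantize(w: List[float], scale: float, bits: int) -> List[float]:
    """Symmetric uniform quantization to signed `bits`-bit levels, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    # Python's round() rounds half to even; clamp to the representable range.
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in w]


def hessian_weighted_error(w: List[float], w_hat: List[float], h_diag: List[float]) -> float:
    """Diagonal second-order proxy (up to a factor of 1/2) for the loss increase:
    sum_i h_ii * (w_i - w_hat_i)^2."""
    return sum(h * (a - b) ** 2 for a, b, h in zip(w, w_hat, h_diag))


def search_scale(
    w: List[float], h_diag: List[float], bits: int = 4, grid: int = 100
) -> Tuple[float, float]:
    """Grid-search the clipping range that minimizes the Hessian-weighted error."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(x) for x in w)
    if max_abs == 0.0:
        return 1.0, 0.0  # all-zero tensor: any scale reconstructs it exactly
    best_scale, best_err = max_abs / qmax, float("inf")
    for step in range(1, grid + 1):
        # Try clipping ranges from max_abs/grid up to the full max_abs.
        scale = (max_abs * step / grid) / qmax
        err = hessian_weighted_error(w, quantize(w, scale, bits), h_diag)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err
```

Weighting by `h_diag` means errors on weights the loss is sensitive to cost more, so the search may accept clipping an insensitive outlier in exchange for finer resolution elsewhere.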

DaVinci: Unified Sketch for Set Measurements

2022 – 2024 · Published at ICDE 2025

Developed a unified sketch data structure that reduces hash collisions and supports nine types of set measurement tasks. Achieved 90% memory savings and a 40x throughput improvement over existing approaches.

Sketch Algorithms · Network Measurement · Data Structures
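
DaVinci's internal structure is not described on this page. To illustrate the general sketch idea (fixed memory, hashed counters, bounded one-sided error), here is a classic Count-Min sketch for per-flow frequency estimation. It is a well-known baseline, not DaVinci itself. The class name and the hashing choice (row-salted SHA-256) are assumptions for illustration.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Count-Min sketch: `depth` rows of `width` counters, one hash per row."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Salting the key with the row number gives a different hash per row.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def update(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def query(self, key: str) -> int:
        # With non-negative updates, collisions only ever add to a counter, so
        # the minimum across rows is the tightest estimate and never undercounts.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch(width=1024, depth=4)
    for flow, n in [("10.0.0.1->10.0.0.2", 50), ("10.0.0.3->10.0.0.4", 7)]:
        cms.update(flow, n)
    print(cms.query("10.0.0.1->10.0.0.2"))  # at least 50
```

With width w and depth d, the standard analysis bounds the overestimate by (e/w)·N with probability at least 1 − e^(−d), where N is the total count inserted.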

Elixir: Hybrid Flow Table Management

2019 – 2022 · Published at NSDI 2022

Proposed a hybrid hardware/software scheme for managing NFV gateway flow tables, reducing resource usage by 50% and latency by 97% compared to software-only approaches.

NFV · Flow Tables · Network Systems
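
The page does not detail how Elixir splits entries between hardware and software. The toy model below illustrates only the general hybrid idea: a small, capacity-limited "hardware" table caches hot flows, while a large "software" table holds every rule. The class, the packet-count promotion threshold, and the evict-the-coldest policy are all hypothetical choices, not Elixir's design.

```python
from typing import Dict, Optional, Tuple


class HybridFlowTable:
    """Toy hybrid table: a small 'hardware' cache in front of a full 'software' table."""

    def __init__(self, hw_capacity: int, promote_threshold: int) -> None:
        self.hw_capacity = hw_capacity          # models scarce hardware entries
        self.promote_threshold = promote_threshold
        self.hw: Dict[str, str] = {}            # flow key -> action (fast path)
        self.sw: Dict[str, str] = {}            # flow key -> action (all rules)
        self.hits: Dict[str, int] = {}          # packets seen per flow

    def install(self, flow: str, action: str) -> None:
        self.sw[flow] = action
        self.hits.setdefault(flow, 0)
        if flow in self.hw:                     # keep the cached copy consistent
            self.hw[flow] = action

    def lookup(self, flow: str) -> Tuple[Optional[str], str]:
        """Return (action, path), where path is 'hw', 'sw', or 'miss'."""
        if flow in self.hw:
            self.hits[flow] += 1
            return self.hw[flow], "hw"
        if flow not in self.sw:
            return None, "miss"
        self.hits[flow] += 1
        self._maybe_promote(flow)
        return self.sw[flow], "sw"              # this packet took the slow path

    def _maybe_promote(self, flow: str) -> None:
        if self.hits[flow] < self.promote_threshold or self.hw_capacity <= 0:
            return
        if len(self.hw) < self.hw_capacity:
            self.hw[flow] = self.sw[flow]
            return
        # Hardware is full: replace the coldest cached flow if this one is hotter.
        coldest = min(self.hw, key=lambda f: self.hits[f])
        if self.hits[coldest] < self.hits[flow]:
            del self.hw[coldest]
            self.hw[flow] = self.sw[flow]
```

Usage: `t = HybridFlowTable(hw_capacity=1, promote_threshold=2)`, then `t.install("a", "port1")`. The first two lookups of `"a"` are served by software, and the second one promotes it, so the third is served by hardware.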

BML: Distributed ML Gradient Synchronization

2017 – 2018 · Published at NeurIPS 2018

Designed a multidimensional ring topology for parameter synchronization in distributed machine learning, reducing synchronization time by 25–50% and accelerating overall training by 56.4%.

Distributed Systems · Machine Learning · Parameter Sync
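
For context on the kind of synchronization BML targets, here is the classic single-ring all-reduce (reduce-scatter followed by all-gather), simulated in-process. It is the standard baseline, not BML's multidimensional topology. The function name and the in-memory simulation are assumptions for illustration. In this scheme each worker sends roughly 2(n−1)/n of its buffer in total, independent of the number of workers n.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring all-reduce: every worker ends with the element-wise sum."""
    if not buffers:
        return []
    n, length = len(buffers), len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all buffers must have the same length")
    # Split each buffer into n contiguous chunks (some may be empty if length < n).
    bounds = [(c * length // n, (c + 1) * length // n) for c in range(n)]
    data = [list(b) for b in buffers]  # copy so inputs are not modified

    # Reduce-scatter: at step s, worker i sends chunk (i - s) mod n to worker i+1,
    # which adds it in. Afterwards worker i owns the full sum of chunk (i + 1) mod n.
    for s in range(n - 1):
        sends = [(i, (i - s) % n) for i in range(n)]
        payloads = [data[i][bounds[c][0]:bounds[c][1]] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            lo, _ = bounds[c]
            dst = (i + 1) % n
            for k, v in enumerate(payload):
                data[dst][lo + k] += v

    # All-gather: at step s, worker i forwards its completed chunk (i + 1 - s) mod n
    # to worker i+1, which overwrites its stale copy.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n) for i in range(n)]
        payloads = [data[i][bounds[c][0]:bounds[c][1]] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            lo, hi = bounds[c]
            data[(i + 1) % n][lo:hi] = payload
    return data


if __name__ == "__main__":
    out = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
    print(out[0])  # [111, 222, 333, 444]
```

Payloads for each step are collected before any worker applies them, which mirrors all workers sending simultaneously within a step.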