Department of Computer Science and Technology, Tsinghua University
Welcome to my homepage! I am Weihang Su (苏炜航), a second-year PhD student at the Department of Computer Science and Technology, Tsinghua University, under the supervision of Prof. Yiqun Liu.
My research focuses on leveraging AI technology to better meet users’ information needs, specifically in the following areas:
I am currently exploring an interesting direction: Parametric Retrieval-Augmented Generation (Parametric RAG). We propose a new RAG paradigm that injects external knowledge directly into the parameters of large language models (LLMs), rather than relying on traditional in-context knowledge injection, which appends retrieved documents to the LLM’s input. By parameterizing documents and integrating them into the model at inference time, Parametric RAG improves both the overall performance and the online efficiency of the RAG system while maintaining flexibility. Feel free to check out the preprint of our paper: Parametric Retrieval-Augmented Generation.
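As a rough illustration of the idea of injecting document knowledge into parameters rather than into the prompt, here is a toy sketch. It is not the implementation from the paper. It assumes, purely for illustration, that each document has been encoded offline as a pair of low-rank factors and that injection means adding their product to a base weight matrix. The names `DOC_PARAMS`, `inject`, and `forward` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Assumption: each document was parameterized offline as low-rank factors (B, A),
# so its weight update is the product B @ A. The values are made up.
DOC_PARAMS: Dict[str, Tuple[Matrix, Matrix]] = {
    "doc_a": ([[1.0], [0.0]], [[0.5, 0.0]]),
    "doc_b": ([[0.0], [1.0]], [[0.0, 0.25]]),
}

# Toy stand-in for one weight matrix of the base model.
BASE: Matrix = [[1.0, 0.0], [0.0, 1.0]]


def inject(base: Matrix, doc_ids: List[str]) -> Matrix:
    """Return a copy of the base weights with the retrieved documents merged in."""
    merged = [row[:] for row in base]  # the base weights are left untouched
    for doc_id in doc_ids:
        b, a = DOC_PARAMS[doc_id]
        merged = add(merged, matmul(b, a))
    return merged


def forward(weights: Matrix, x: List[float]) -> List[float]:
    """Toy forward pass: a single matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# At inference time: retrieve documents, merge their parameters, then run the model.
retrieved = ["doc_a", "doc_b"]
print(forward(inject(BASE, retrieved), [2.0, 4.0]))
```

Because `inject` copies the base weights instead of modifying them, a different set of retrieved documents can be merged for each query, and the prompt itself stays the same length regardless of how many documents are used.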
I am also passionate about mentoring undergraduate students in research. I have worked with undergraduates such as Changyue Wang, Yichen Tang, and Anzhe Xie, co-authoring high-quality papers published at top-tier venues such as ACL, EMNLP, TOIS, and SIGIR-AP.
If you are an undergraduate interested in my research areas and aim to publish high-quality papers, you can apply for an internship with the THUIR group through official channels or contact me directly to embark on meaningful research together!
Titles of papers on which I am the first author are shown in bold (co-first-author papers on which I am not listed first are excluded).
Parametric Retrieval Augmented Generation
Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, Yiqun Liu
(Long Paper) Paper Code
JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System
Weihang Su, Baoqing Yue, Qingyao Ai, Yiran Hu, Jiaqi Li, Changyue Wang, Kaiyuan Zhang, Yueyue Wu, Yiqun Liu
(Long Paper) Code and Dataset
Benchmarking Computer Science Survey Generation
Weihang Su, Anzhe Xie, Qingyao Ai, Jianming Long, Jiaxin Mao, Yiqun Liu
(Long Paper) Code and Dataset
RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects
Yiteng Tu, Weihang Su, Yujia Zhou, Yiqun Liu, Qingyao Ai
(Long Paper) Paper Code
Knowledge Editing through Chain-of-Thought
Changyue Wang, Weihang Su, Qingyao Ai, Yiqun Liu
(Long Paper) Paper Code
Mitigating Entity-Level Hallucinations in Large Language Models
Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, Yiqun Liu
International ACM SIGIR Conference on Information Retrieval in the Asia Pacific
SIGIR-AP 2024 (Long Paper) Paper Code
LeKUBE: A Legal Knowledge Update BEnchmark
Changyue Wang, Weihang Su, Yiran Hu, Qingyao Ai, Yueyue Wu, Cheng Luo, Yiqun Liu, Min Zhang, Shaoping Ma
International ACM SIGIR Conference on Information Retrieval in the Asia Pacific
SIGIR-AP 2024 (Long Paper) Paper Code
STARD: A Chinese Statute Retrieval Dataset with Real Queries Issued by Non-professionals
Weihang Su, Yiran Hu, Anzhe Xie, Qingyao Ai, Zibing Que, Yun Liu, Weixing Shen, Yiqun Liu
The 2024 Conference on Empirical Methods in Natural Language Processing
EMNLP 2024 Findings (Long Paper, CCF-B, THU-A) Paper Code
DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models
Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, Yiqun Liu.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics.
ACL 2024 Main Oral (Long Paper, CCF-A, THU-A)
Paper Code
Unsupervised Real-Time Hallucination Detection Based on the Internal States of Large Language Models
Weihang Su, Changyue Wang, Qingyao Ai, Yiran Hu, Zhijing Wu, Yujia Zhou, Yiqun Liu.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics.
ACL 2024 Findings (Long Paper, CCF-A, THU-A)
Paper Code
Scaling Laws For Dense Retrieval.
Yan Fang, Jingtao Zhan, Qingyao Ai, Jiaxin Mao, Weihang Su, Jia Chen and Yiqun Liu.
The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
SIGIR 2024 Best Paper Award (Long Paper, CCF-A, THU-A)
Wikiformer: Pre-training with Structured Information of Wikipedia for Ad-hoc Retrieval.
Weihang Su, Qingyao Ai, Xiangsheng Li, Jia Chen, Yiqun Liu, Xiaolong Wu and Shengluan Hou.
The 38th Annual AAAI Conference on Artificial Intelligence
AAAI 2024 (Long Paper, CCF-A, THU-A)
Paper Code
Relevance Feedback with Brain Signals.
Ziyi Ye, Xiaohui Xie, Qingyao Ai, Yiqun Liu, Zhihong Wang, Weihang Su and Min Zhang.
ACM Transactions on Information Systems
TOIS 2024 (Long Paper, CCF-A, THU-A)
THUIR2 at NTCIR-16 Session Search (SS) Task
Weihang Su, Xiangsheng Li, Yiqun Liu, Min Zhang, Shaoping Ma
NII Testbeds and Community for Information access Research Project
NTCIR 2022
Web Search via an Efficient and Effective Brain-Machine Interface.
Xuesong Chen, Ziyi Ye, Xiaohui Xie, Yiqun Liu, Xiaorong Gao, Weihang Su, Shuqi Zhu, Yike Sun, Min Zhang, and Shaoping Ma.
The 15th ACM International Conference on Web Search and Data Mining.
WSDM 2022 (Demo Paper, CCF-B, THU-A)
Trade or Trick? Detecting and Characterizing Scam Tokens on Uniswap Decentralized Exchange
Pengcheng Xia, Haoyu Wang, Bingyu Gao, Weihang Su, Zhou Yu, Xiapu Luo, Chao Zhang, Xusheng Xiao, Guoai Xu
International Conference on Measurement and Modeling of Computer Systems
SIGMETRICS 2022 (Long Paper, CCF-B, THU-A)