About me

Welcome to my homepage! I am Weihang Su (苏炜航), a second-year PhD student at the Department of Computer Science and Technology, Tsinghua University, under the supervision of Prof. Yiqun Liu.

My research focuses on leveraging AI technology to better meet users’ information needs, specifically in the following areas:

Retrieval Augmented Generation (RAG)
Complex IR Tasks
AI for Legal Applications

I am also passionate about mentoring undergraduate students in research. I’ve collaborated with undergraduates such as Changyue Wang, Yichen Tang, and Anzhe Xie, co-authoring high-quality papers at top-tier conferences, including ACL.

If you are an undergraduate interested in my research areas and aiming to publish high-quality papers, you can apply for an internship with the THUIR group through official channels or contact me directly so we can embark on meaningful research together!

News

Our paper “Scaling Laws for Dense Retrieval” received the SIGIR Best Paper Award!
My first-authored paper, “DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models,” has been selected for an Oral presentation at ACL! (Top 2.6% of submissions, top 6.8% of accepted papers)
Two of my first-authored Long Papers were accepted at ACL 2024: one in the main conference and one in Findings!
My first-authored Long Paper was accepted at AAAI 2024!
Our team participated in COLIEE 2023 and won the championship! Here is the link to our Technical Report: https://arxiv.org/abs/2304.12650
We participated in the WSDM Cup 2023 and won silver medals in two tasks! Here is the news report: https://www.cs.tsinghua.edu.cn/info/1088/5286.htm

Selected Awards

SIGIR 2024 Best Paper Award
Winner of the Language and Intelligence Challenge (LIC) Contest (one of 4 winners worldwide; ranked in the top 0.3% of all teams)
First place in the first round of evaluation and second place in the second round of evaluation in the NTCIR-16 Session Search task
Beijing Outstanding Undergraduate

Publications

Link to Google Scholar

Papers Under Submission

Caseformer: Pre-training for Legal Case Retrieval
Weihang Su, Qingyao Ai, Yueyue Wu, Yixiao Ma, Haitao Li, Yiqun Liu
(Long Paper) Paper Code
Mitigating Entity-Level Hallucinations in Large Language Models
Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, Yiqun Liu
(Long Paper) Paper Code
STARD: A Chinese Statute Retrieval Dataset with Real Queries Issued by Non-professionals
Weihang Su, Yiran Hu, Anzhe Xie, Qingyao Ai, Zibing Que, Yun Liu, Weixing Shen, Yiqun Liu
(Long Paper) Paper Code

Year 2024

DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models
Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, Yiqun Liu.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics.
ACL 2024 Main (Long Paper, CCF-A, THU-A) [Oral]
Paper Code
Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models
Weihang Su, Changyue Wang, Qingyao Ai, Yiran Hu, Zhijing Wu, Yujia Zhou, Yiqun Liu.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics.
ACL 2024 Findings (Long Paper, CCF-A, THU-A)
Paper Code
Scaling Laws for Dense Retrieval.
Yan Fang, Jingtao Zhan, Qingyao Ai, Jiaxin Mao, Weihang Su, Jia Chen and Yiqun Liu.
The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
SIGIR 2024 Best Paper Award (Long Paper, CCF-A, THU-A)
Wikiformer: Pre-training with Structured Information of Wikipedia for Ad-hoc Retrieval.
Weihang Su, Qingyao Ai, Xiangsheng Li, Jia Chen, Yiqun Liu, Xiaolong Wu and Shengluan Hou.
The 38th Annual AAAI Conference on Artificial Intelligence
AAAI 2024 (Long Paper, CCF-A, THU-A)
Paper Code
Relevance Feedback with Brain Signals.
Ziyi Ye, Xiaohui Xie, Qingyao Ai, Yiqun Liu, Zhihong Wang, Weihang Su and Min Zhang.
ACM Transactions on Information Systems
TOIS 2024 (Long Paper, CCF-A, THU-A)

Year 2023

CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding.
Yixiao Ma, Yueyue Wu, Weihang Su, Qingyao Ai, Yiqun Liu.
The 2023 Conference on Empirical Methods in Natural Language Processing
EMNLP 2023 Main (Long Paper, CCF-B, THU-A)

Year 2022

THUIR2 at NTCIR-16 Session Search (SS) Task
Weihang Su, Xiangsheng Li, Yiqun Liu, Min Zhang, Shaoping Ma
NII Testbeds and Community for Information access Research Project
NTCIR 2022
Web Search via an Efficient and Effective Brain-Machine Interface.
Xuesong Chen, Ziyi Ye, Xiaohui Xie, Yiqun Liu, Xiaorong Gao, Weihang Su, Shuqi Zhu, Yike Sun, Min Zhang, and Shaoping Ma.
The 15th ACM International Conference on Web Search and Data Mining.
WSDM 2022 (Demo Paper, CCF-B, THU-A)
Trade or Trick? Detecting and Characterizing Scam Tokens on Uniswap Decentralized Exchange
Pengcheng Xia, Haoyu Wang, Bingyu Gao, Weihang Su, Zhou Yu, Xiapu Luo, Chao Zhang, Xusheng Xiao, Guoai Xu
International Conference on Measurement and Modeling of Computer Systems
SIGMETRICS 2022 (Long Paper, CCF-B, THU-A)