Publications | HKU NLP Group

2024

OpenAgents: An Open Platform for Language Agents in the Wild

Tianbao Xie, Fan Zhou, Zhoujun Cheng, Peng Shi, Luoxuan Weng, Yitao Liu, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, and 4 more authors

In Conference on Language Modeling (COLM) 2024

arXiv Code
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration

Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, and Lingpeng Kong,

In Conference on Language Modeling (COLM) 2024

arXiv Code
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration

Jiahui Gao, Renjie Pi, Tianyang Han, Han Wu, Lanqing Hong, Lingpeng Kong, Xin Jiang, and Zhenguo Li,

In Conference on Language Modeling (COLM) 2024

PDF
Empowering Large Language Model Agents through Action Learning

Haiteng Zhao, Chang Ma, Guoyin Wang, Jing Su, Lingpeng Kong, Jingjing Xu, Zhi-Hong Deng, and Hongxia Yang,

In Conference on Language Modeling (COLM) 2024

arXiv Code
A Reparameterized Discrete Diffusion Model for Text Generation

Lin Zheng, Jianbo Yuan, Lei Yu, and Lingpeng Kong,

In Conference on Language Modeling (COLM) 2024

arXiv Code
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving

Xueliang Zhao, Xinting Huang, Wei Bi, and Lingpeng Kong,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2024

arXiv Code
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, and Lingpeng Kong,

In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings) 2024

arXiv
Red Teaming Visual Language Models

Mukai Li, Lei Li, Yuwei Yin, Masood Ahmed, Zhenguang Liu, and Qi Liu,

In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings) 2024

arXiv Code
Large Language Models are not Fair Evaluators

Peiyi Wang, Lei Li, Liang Chen, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, and Zhifang Sui,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2024

arXiv Code
A Challenging Benchmark for Low-Resource Learning

Yudong Wang, Chang Ma, Qingxiu Dong, Lingpeng Kong, and Jingjing Xu,

In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings) 2024

arXiv Code
LoRA Meets Dropout under a Unified Framework

Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, and Chuan Wu,

In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings) 2024

arXiv
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers

Qintong Li, Leyang Cui, Xueliang Zhao, Lingpeng Kong, and Wei Bi,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2024

arXiv Code
PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA

Sheng Wang, Boyang Xue, Jiacheng Ye, Jiyue Jiang, Liheng Chen, Lingpeng Kong, and Chuan Wu,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2024

arXiv
L-Eval: Instituting Standardized Evaluation for Long Context Language Models

Chenxin An, Shansan Gong, Ming Zhong, Mukai Li, Jun Zhang, Lingpeng Kong, and Xipeng Qiu,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2024

Outstanding Paper arXiv Code
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models

Lei Li, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, and Qi Liu,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2024

arXiv Code
Self-Infilling Code Generation

Lin Zheng, Jianbo Yuan, Zhi Zhang, Hongxia Yang, and Lingpeng Kong,

In Proceedings of the International Conference on Machine Learning (ICML) 2024

arXiv Code
Training-Free Long-Context Scaling of Large Language Models

Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, and Lingpeng Kong,

In Proceedings of the International Conference on Machine Learning (ICML) 2024

arXiv Code
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving

Xueliang Zhao, Wenda Li, and Lingpeng Kong,

In Proceedings of the International Conference on Machine Learning (ICML) 2024

arXiv Code
Lemur: Harmonizing Natural Language and Code for Language Agents

Yiheng Xu, Hongjin Su, Chen Xing, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, and 4 more authors

In International Conference on Learning Representations (ICLR) 2024

arXiv Code
UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model in Data Science

Yazheng Yang, Yuqi Wang, Guangyi Liu, Ledell Yu Wu, and Qi Liu,

In International Conference on Learning Representations (ICLR) 2024

arXiv

2023

Can Language Models Understand Physical Concepts?

Lei Li, Jingjing Xu, Qingxiu Dong, Ce Zheng, Qi Liu, Lingpeng Kong, and Xu Sun,

In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023

arXiv Code
DetGPT: Detect What You Need via Reasoning

Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, and Tong Zhang,

In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023

arXiv Code
Generating Data for Symbolic Language with Large Language Models

Jiacheng Ye, Chengzu Li, Lingpeng Kong, and Tao Yu,

In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023

arXiv Code
DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, and Lingpeng Kong,

In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP Findings) 2023

arXiv Code
GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning

Haiteng Zhao, Shengchao Liu, Chang Ma, Hannan Xu, Jie Fu, Zhihong Deng, Lingpeng Kong, and Qi Liu,

In Advances in Neural Information Processing Systems (NeurIPS) 2023

arXiv Code
Statistical Knowledge Assessment for Large Language Models

Qingxiu Dong, Jingjing Xu, Lingpeng Kong, Zhifang Sui, and Lei Li,

In Advances in Neural Information Processing Systems (NeurIPS) 2023

arXiv Code
Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Hanchen Wang, Jean Kaddour, Shengchao Liu, Jian Tang, Matt J. Kusner, Joan Lasenby, and Qi Liu,

In Advances in Neural Information Processing Systems (NeurIPS) 2023
Self-Adaptive In-Context Learning: An Information Compression Perspective for In-Context Example Selection and Ordering

Zhiyong Wu, Yaoxiang Wang, Jiacheng Ye, and Lingpeng Kong,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2023

arXiv Code
Explanation Regeneration via Information Bottleneck

Qintong Li, Zhiyong Wu, Lingpeng Kong, and Wei Bi,

In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings) 2023

arXiv Code
A Cognitive Stimulation Dialogue System with Multi-source Knowledge Fusion for Elders with Cognitive Impairment

Jiyue Jiang, Sheng Wang, Qintong Li, Lingpeng Kong, and Chuan Wu,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2023

arXiv
SORTIE: Dependency-Aware Symbolic Reasoning for Logical Data-to-text Generation

Xueliang Zhao, Tingchen Fu, Lemao Liu, Lingpeng Kong, Shuming Shi, and Rui Yan,

In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings) 2023

PDF
One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, and Tao Yu,

In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings) 2023

arXiv Code
Compositional Exemplars for In-context Learning

Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, and Lingpeng Kong,

In Proceedings of the International Conference on Machine Learning (ICML) 2023

arXiv Code
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling

Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, and Lingpeng Kong,

In Proceedings of the International Conference on Machine Learning (ICML) 2023

arXiv Code
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Yih, Daniel Fried, Si-yi Wang, and Tao Yu,

In Proceedings of the International Conference on Machine Learning (ICML) 2023

arXiv Code
Efficient Attention via Control Variates

Lin Zheng, Jianbo Yuan, Chong Wang, and Lingpeng Kong,

In International Conference on Learning Representations (ICLR) 2023

arXiv Code
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning

Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, and Lingpeng Kong,

In International Conference on Learning Representations (ICLR) 2023

arXiv Code
Toeplitz Neural Network for Sequence Modeling

Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, and Yiran Zhong,

In International Conference on Learning Representations (ICLR) 2023

arXiv Code
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, and Lingpeng Kong,

In International Conference on Learning Representations (ICLR) 2023

arXiv Code
Binding Language Models in Symbolic Languages

Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, and Tao Yu,

In International Conference on Learning Representations (ICLR) 2023

arXiv Code Poster
Selective Annotation Makes Language Models Better Few-Shot Learners

Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, and Tao Yu,

In International Conference on Learning Representations (ICLR) 2023

arXiv Code
Unsupervised Explanation Generation via Correct Instantiations

Sijie Chen, Zhiyong Wu, Jiangjie Chen, Zhixing Li, Yang Liu, and Lingpeng Kong,

In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2023

arXiv Code
An Empirical Study of Retrieval-Enhanced Graph Neural Networks

Dingmin Wang, Shengchao Liu, Hanchen Wang, Bernardo Cuenca Grau, Linfeng Song, Jian Tang, Le Song, and Qi Liu,

In European Conference on Artificial Intelligence (ECAI) 2023
Retrieved Sequence Augmentation for Protein Representation Learning

Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, and Lingpeng Kong,

arXiv preprint 2023

arXiv Code

2022

ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback

Jiacheng Ye, Jiahui Gao, Jiangtao Feng, Zhiyong Wu, Tao Yu, and Lingpeng Kong,

In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP Findings) 2022

arXiv Code
Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

Qi Liu, Zihuiwen Ye, Tao Yu, Phil Blunsom, and Linfeng Song,

In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP Findings) 2022

arXiv Code
ZeroGen: Efficient Zero-shot Learning via Dataset Generation

Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, and Lingpeng Kong,

In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022

arXiv Code
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida Wang, Victor Zhong, Bailin Wang, and 11 more authors

In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022

arXiv Code Poster
CoNT: Contrastive Neural Text Generation

Chenxin An, Jiangtao Feng, Kai Lv, Lingpeng Kong, Xipeng Qiu, and Xuanjing Huang,

In Advances in Neural Information Processing Systems (NeurIPS) 2022

arXiv Code
Linear Complexity Randomized Self-attention Mechanism

Lin Zheng, Chong Wang, and Lingpeng Kong,

In Proceedings of the International Conference on Machine Learning (ICML) 2022

arXiv Code
Ripple Attention for Visual Perception with Sub-quadratic Complexity

Lin Zheng, Huijie Pan, and Lingpeng Kong,

In Proceedings of the International Conference on Machine Learning (ICML) 2022

arXiv
Event Transition Planning for Open-ended Text Generation

Qintong Li, Piji Li, Wei Bi, Zhaochun Ren, Yuxuan Lai, and Lingpeng Kong,

In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings) 2022

arXiv Code
Lexical Knowledge Internalization for Neural Dialog Generation

Zhiyong Wu, Wei Bi, Xiang Li, Lingpeng Kong, and Ben Kao,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2022

arXiv Code
Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling

Jakob Prange, Nathan Schneider, and Lingpeng Kong,

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2022

arXiv

2021

Cascaded Head-colliding Attention

Lin Zheng, Zhiyong Wu, and Lingpeng Kong,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2021

arXiv Code
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation

Zhiyong Wu, Lingpeng Kong, Wei Bi, Xiang Li, and Ben Kao,

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2021

arXiv Code