Future Finance Innovation Engineering Center

Exploring new paradigms for financial development in the digital era through frontier research and innovation engineering.

Latest Working Papers

2026-03

Constructing an Industry Classification Dataset for China's A-Share Listed Companies: A Large Language Model Approach

Authors: 吴轲, 应镇焜, 钱宗鑫, 周德馨

Abstract: Industry classification is a foundational tool for empirical research in financial economics, yet the existing classification standards for China's A-share market commonly suffer from delayed updates and insufficient discriminative power. Using the "Management Discussion and Analysis" (MD&A) sections of 52,702 annual reports of A-share listed companies from 2007 to 2023, this paper combines the text-embedding capability of large language models with hierarchical agglomerative clustering to construct an industry classification dataset for China's A-share listed companies comprising 26 first-level, 102 second-level, and 271 third-level industries. Empirical results show that this classification significantly outperforms mainstream standards such as those of the China Association for Public Companies (中上协), Shenwan (申万), and Wind (万得) on both between-industry dissimilarity and within-industry similarity, indicating that it more effectively achieves the goal of "within-class similarity, between-class difference." Extended analyses show that lead-lag hedge portfolios constructed from the LLM classification generate statistically significant average monthly returns, which remain significant after adjusting for the Fama-French five-factor and China four-factor models; Fama-MacBeth regressions further confirm that the LLM classification has the strongest predictive power in capturing the within-industry momentum effect of high-priced stocks, providing asset-pricing evidence for the accuracy of the classification. This paper offers China's A-share market an accurate, data-driven, and dynamically updatable industry classification framework, and provides a new analytical tool for empirical research in corporate finance and asset pricing.
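The pipeline described in the abstract — embed each firm's MD&A text, then cut a hierarchical clustering tree at several depths to obtain a multi-tier taxonomy — can be sketched as follows. This is an illustrative toy, not the authors' code: the random vectors stand in for LLM embeddings, and the 3/6/9-cluster cuts stand in for the paper's 26/102/271 tiers.

```python
# Toy sketch of the clustering step: hierarchical agglomerative clustering
# on firm-level text embeddings, cut at three depths to form a taxonomy.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# 30 "firms" whose embeddings scatter around 3 latent industry centroids
centroids = rng.normal(size=(3, 8))
emb = np.vstack([centroids[i % 3] + 0.1 * rng.normal(size=8) for i in range(30)])

# Average-linkage agglomerative clustering on cosine distance
Z = linkage(emb, method="average", metric="cosine")

# Cut the same tree at coarse / medium / fine levels (3, 6, 9 clusters here)
tiers = {k: fcluster(Z, t=k, criterion="maxclust") for k in (3, 6, 9)}
print(tiers[3])  # first-tier industry label per firm
```

Cutting one tree at several depths guarantees that the tiers nest: every fine-level industry sits inside exactly one coarse-level industry.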

2026-04

A New Approach to Connecting the Dividend-Price Ratio and Stock Returns

Author: Wenting Liao

Abstract: There is a long-standing debate on the performance of the dividend-price ratio in predicting stock returns. Most of the literature argues that this predictability declines after the 1990s. Since the Campbell-Shiller decomposition shows that the dividend-price ratio contains information about both future returns and future dividend growth, a linear predictive regression of stock returns on the dividend-price ratio may generate biased results due to measurement error or omitted variables. This paper therefore proposes a new approach to studying the nonlinear Granger causality from the dividend-price ratio to stock returns. We construct an unobserved component model and connect stock returns and the dividend-price ratio through their innovations. We show that our model admits a reduced-form ARMAX representation and that it increases in-sample predictability across all sample periods.
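The core idea — let returns load on the *innovations* of the dividend-price ratio rather than on its level — can be illustrated with a minimal simulation. This is an assumed stylized form, not the paper's exact model: the dp ratio follows an AR(1), returns depend on last period's dp innovation, and the reduced form is estimated by two OLS steps.

```python
# Stylized sketch: recover dp innovations from an AR(1) fit, then regress
# returns on the lagged innovation (an ARX stand-in for the ARMAX form).
import numpy as np

rng = np.random.default_rng(1)
T = 2000
# Persistent log dividend-price ratio: dp_t = 0.95*dp_{t-1} + u_t
u = rng.normal(scale=0.1, size=T)
dp = np.zeros(T)
for t in range(1, T):
    dp[t] = 0.95 * dp[t - 1] + u[t]

# Returns load on last period's dp innovation (true loading beta = 0.5)
beta_true = 0.5
r = 0.02 + beta_true * np.concatenate(([0.0], u[:-1])) + rng.normal(scale=0.05, size=T)

# Step 1: estimate the AR(1) and back out the innovations
phi_hat = np.linalg.lstsq(dp[:-1, None], dp[1:], rcond=None)[0][0]
u_hat = dp[1:] - phi_hat * dp[:-1]

# Step 2: regress returns on the lagged estimated innovation
X = np.column_stack([np.ones(T - 2), u_hat[:-1]])
coef = np.linalg.lstsq(X, r[2:], rcond=None)[0]
print(coef)  # intercept near 0.02, slope near beta_true
```

A regression of returns on the *level* of dp would mix the return-forecasting and dividend-growth components; conditioning on the innovation isolates the news term, which is what the unobserved-component formulation exploits.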

2026-03

One-Shot Traversal for Low-Latency SSD-based Tree Index

作者: Yiheng Tong, Minhui Xie, Yuanhui Luo, Jing Liu, Ran Shu, Yongqiang Xiong, Yunpeng Chai

Abstract: SSD-based tree indexes (e.g., the B+tree) have been widely adopted in storage and database systems. Driven by modern high-performance SSDs, much prior work has focused on improving their overall throughput. When it comes to latency, however, such approaches still suffer from the inherent bottleneck of pointer chasing during index traversal, so query latency does not benefit directly from faster SSDs. In this paper, we propose Shortcut, a lightweight solution that transforms the conventional wisdom of inherent access dependency into one-shot traversal, thereby achieving low query latency.
Our key idea is to exploit intra-query parallelism by training per-level learned indexes as companions to the B+tree to predict the traversal path in advance, and issuing multiple concurrent I/O requests to prefetch all target nodes in a one-shot manner. The main challenge we address is the excessive memory overhead of the additional learned index, which originates not from the machine learning (ML) models but from two essential structures within it: the key array (for last-mile search) and the value array (termed the "mapping array," which maps model predictions to arbitrary node locations in the file). Shortcut eliminates both arrays through two techniques, the Keyless Learning Method and the Phantom Mapping Mechanism, and achieves nearly zero memory overhead (∼0.6%). Evaluations on YCSB and TPC-C show that Shortcut reduces end-to-end query latency by 26.2% to 64.8%.
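The shift from pointer chasing to one-shot traversal can be sketched in miniature. This is a conceptual toy, not the Shortcut implementation: a linear model plays the learned index, a Python list of sorted arrays plays the on-disk leaf nodes, and fetching a contiguous batch of candidate leaves stands in for issuing concurrent SSD reads; the error bound `max_err` is computed from the fit so the batch always contains the true leaf.

```python
# Conceptual sketch: predict the target leaf with a learned model, fetch the
# predicted leaf plus an error-bounded neighborhood in one batch ("one shot"),
# then do the last-mile search inside the fetched nodes.
import bisect
import numpy as np

NODE_CAP = 16
keys = np.sort(np.random.default_rng(2).uniform(0, 1e6, 10_000))
leaves = [keys[i:i + NODE_CAP] for i in range(0, len(keys), NODE_CAP)]

# "Learned index": a linear model from key to leaf number, with a
# worst-case prediction error bound computed over all keys
leaf_of = np.repeat(np.arange(len(leaves)), NODE_CAP)[: len(keys)]
slope, intercept = np.polyfit(keys, leaf_of, 1)
max_err = int(np.ceil(np.abs(slope * keys + intercept - leaf_of).max()))

def one_shot_lookup(k, radius=max_err):
    pred = int(round(slope * k + intercept))
    # One batch fetch of all candidate leaves (stands in for concurrent I/O)
    batch = [leaves[i] for i in range(max(0, pred - radius),
                                      min(len(leaves), pred + radius + 1))]
    for node in batch:  # last-mile search within the fetched nodes
        j = bisect.bisect_left(node, k)
        if j < len(node) and node[j] == k:
            return True
    return False

print(one_shot_lookup(float(keys[1234])))  # True
```

The latency win comes from replacing a serial chain of dependent reads (root, inner node, leaf) with one parallel batch whose size is bounded by the model's prediction error; the paper's contribution is making the companion index nearly free in memory, which this toy does not model.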