Future Finance Innovation Engineering Center

Exploring new paradigms for financial development in the digital era through frontier research and innovation engineering.

Latest Working Papers

2026-03

Constructing an Industry Classification Dataset for China's A-Share Listed Companies: A Large Language Model Approach

Authors: 吴轲, 应镇焜, 钱宗鑫, 周德馨

Abstract: Industry classification is a foundational tool for empirical research in financial economics, yet the existing classification standards for China's A-share market commonly suffer from delayed updates and insufficient discriminative power. Using the "Management Discussion and Analysis" (MD&A) sections of 52,702 annual reports of A-share listed companies from 2007 to 2023, this paper combines the text-embedding capability of large language models with hierarchical agglomerative clustering to construct an industry classification dataset for China's A-share listed companies comprising 26 first-level, 102 second-level, and 271 third-level industries. Empirical results show that this classification significantly outperforms mainstream standards such as those of the China Association for Public Companies (中上协), Shenwan (申万), and Wind (万得) on both between-industry dissimilarity and within-industry similarity, indicating that it more effectively achieves the goal of "within-class similarity, between-class difference." Extended analyses show that lead-lag hedge portfolios constructed from the LLM classification generate statistically significant average monthly returns, which remain significant after adjusting for the Fama-French five-factor and China four-factor models; Fama-MacBeth regressions further confirm that the LLM classification has the strongest predictive power in capturing the within-industry momentum effect of high-priced stocks, providing asset-pricing evidence for the accuracy of the classification. This paper offers China's A-share market an accurate, data-driven, and dynamically updatable industry classification framework, and provides a new analytical tool for empirical research in corporate finance and asset pricing.
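The pipeline described in the abstract — embed each firm's MD&A text, then cut a hierarchical clustering tree at several depths to obtain a multi-tier taxonomy — can be sketched as follows. This is an illustrative toy, not the authors' code: the random vectors stand in for LLM embeddings, and the 3/6/9-cluster cuts stand in for the paper's 26/102/271 tiers.

```python
# Toy sketch of the clustering step: hierarchical agglomerative clustering
# on firm-level text embeddings, cut at three depths to form a taxonomy.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# 30 "firms" whose embeddings scatter around 3 latent industry centroids
centroids = rng.normal(size=(3, 8))
emb = np.vstack([centroids[i % 3] + 0.1 * rng.normal(size=8) for i in range(30)])

# Average-linkage agglomerative clustering on cosine distance
Z = linkage(emb, method="average", metric="cosine")

# Cut the same tree at coarse / medium / fine levels (3, 6, 9 clusters here)
tiers = {k: fcluster(Z, t=k, criterion="maxclust") for k in (3, 6, 9)}
print(tiers[3])  # first-tier industry label per firm
```

Cutting one tree at several depths guarantees that the tiers nest: every fine-level industry sits inside exactly one coarse-level industry.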

2026-04

A New Approach to Connecting the Dividend-Price Ratio and Stock Returns

Author: Wenting Liao

Abstract: There is a long-standing debate on the performance of the dividend-price ratio in predicting stock returns. Most of the literature argues that this predictability declines after the 1990s. Since the Campbell-Shiller decomposition shows that the dividend-price ratio contains information about both future returns and future dividend growth, a linear predictive regression of stock returns on the dividend-price ratio may generate biased results due to measurement error or omitted variables. This paper therefore proposes a new approach to studying the nonlinear Granger causality from the dividend-price ratio to stock returns. We construct an unobserved component model and connect stock returns and the dividend-price ratio through their innovations. We show that our model admits a reduced-form ARMAX representation and that it increases in-sample predictability across all sample periods.
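The core idea — let returns load on the *innovations* of the dividend-price ratio rather than on its level — can be illustrated with a minimal simulation. This is an assumed stylized form, not the paper's exact model: the dp ratio follows an AR(1), returns depend on last period's dp innovation, and the reduced form is estimated by two OLS steps.

```python
# Stylized sketch: recover dp innovations from an AR(1) fit, then regress
# returns on the lagged innovation (an ARX stand-in for the ARMAX form).
import numpy as np

rng = np.random.default_rng(1)
T = 2000
# Persistent log dividend-price ratio: dp_t = 0.95*dp_{t-1} + u_t
u = rng.normal(scale=0.1, size=T)
dp = np.zeros(T)
for t in range(1, T):
    dp[t] = 0.95 * dp[t - 1] + u[t]

# Returns load on last period's dp innovation (true loading beta = 0.5)
beta_true = 0.5
r = 0.02 + beta_true * np.concatenate(([0.0], u[:-1])) + rng.normal(scale=0.05, size=T)

# Step 1: estimate the AR(1) and back out the innovations
phi_hat = np.linalg.lstsq(dp[:-1, None], dp[1:], rcond=None)[0][0]
u_hat = dp[1:] - phi_hat * dp[:-1]

# Step 2: regress returns on the lagged estimated innovation
X = np.column_stack([np.ones(T - 2), u_hat[:-1]])
coef = np.linalg.lstsq(X, r[2:], rcond=None)[0]
print(coef)  # intercept near 0.02, slope near beta_true
```

A regression of returns on the *level* of dp would mix the return-forecasting and dividend-growth components; conditioning on the innovation isolates the news term, which is what the unobserved-component formulation exploits.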

2026-03

One-Shot Traversal for Low-Latency SSD-based Tree Index

作者: Yiheng Tong, Minhui Xie, Yuanhui Luo, Jing Liu, Ran Shu, Yongqiang Xiong, Yunpeng Chai

Abstract: SSD-based tree indexes (e.g., the B+tree) have been widely adopted in storage and database systems. Driven by modern high-performance SSDs, much prior work has focused on improving their overall throughput. When it comes to latency, however, such approaches still suffer from the inherent bottleneck of pointer chasing during index traversal, so query latency does not benefit directly from faster SSDs. In this paper, we propose Shortcut, a lightweight solution that transforms the conventional wisdom of inherent access dependency into one-shot traversal, thereby achieving low query latency.
Our key idea is to exploit intra-query parallelism by training per-level learned indexes as companions to the B+tree to predict the traversal path in advance, and issuing multiple concurrent I/O requests to prefetch all target nodes in a one-shot manner. The main challenge we address is the excessive memory overhead of the additional learned index, which originates not from the machine learning (ML) models but from two essential structures within it: the key array (for last-mile search) and the value array (termed the "mapping array," which maps model predictions to arbitrary node locations in the file). Shortcut eliminates both arrays through two techniques, the Keyless Learning Method and the Phantom Mapping Mechanism, and achieves nearly zero memory overhead (∼0.6%). Evaluations on YCSB and TPC-C show that Shortcut reduces end-to-end query latency by 26.2% to 64.8%.
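The shift from pointer chasing to one-shot traversal can be sketched in miniature. This is a conceptual toy, not the Shortcut implementation: a linear model plays the learned index, a Python list of sorted arrays plays the on-disk leaf nodes, and fetching a contiguous batch of candidate leaves stands in for issuing concurrent SSD reads; the error bound `max_err` is computed from the fit so the batch always contains the true leaf.

```python
# Conceptual sketch: predict the target leaf with a learned model, fetch the
# predicted leaf plus an error-bounded neighborhood in one batch ("one shot"),
# then do the last-mile search inside the fetched nodes.
import bisect
import numpy as np

NODE_CAP = 16
keys = np.sort(np.random.default_rng(2).uniform(0, 1e6, 10_000))
leaves = [keys[i:i + NODE_CAP] for i in range(0, len(keys), NODE_CAP)]

# "Learned index": a linear model from key to leaf number, with a
# worst-case prediction error bound computed over all keys
leaf_of = np.repeat(np.arange(len(leaves)), NODE_CAP)[: len(keys)]
slope, intercept = np.polyfit(keys, leaf_of, 1)
max_err = int(np.ceil(np.abs(slope * keys + intercept - leaf_of).max()))

def one_shot_lookup(k, radius=max_err):
    pred = int(round(slope * k + intercept))
    # One batch fetch of all candidate leaves (stands in for concurrent I/O)
    batch = [leaves[i] for i in range(max(0, pred - radius),
                                      min(len(leaves), pred + radius + 1))]
    for node in batch:  # last-mile search within the fetched nodes
        j = bisect.bisect_left(node, k)
        if j < len(node) and node[j] == k:
            return True
    return False

print(one_shot_lookup(float(keys[1234])))  # True
```

The latency win comes from replacing a serial chain of dependent reads (root, inner node, leaf) with one parallel batch whose size is bounded by the model's prediction error; the paper's contribution is making the companion index nearly free in memory, which this toy does not model.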