provide fast access and retrieval support for cross-sectional data #1835

imhsz · 2024-07-28T09:26:08Z

🌟 Feature Description

Application scenario
Using the recent daily distribution and ranking of factors such as volume and price for the top 100 gainers in each sector/theme, predict the next day's premium and the gain ranking of the top 10.
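To make the scenario concrete, here is a minimal sketch of the per-day, per-sector selection in plain pandas. It is not an existing Qlib API: the function name `top_n_per_sector` and the columns `date`, `sector`, `instrument`, `gain` and `volume` are hypothetical names chosen for illustration.

```python
import pandas as pd

def top_n_per_sector(df: pd.DataFrame, n: int = 100) -> pd.DataFrame:
    """Keep the n largest gainers for every (date, sector) pair.

    Assumes a long table with hypothetical columns
    date, sector, instrument, gain (plus any factor columns).
    """
    out = df.copy()
    # Rank 1 is the largest gain inside that day's sector cross-section.
    out["rank"] = out.groupby(["date", "sector"])["gain"].rank(
        method="first", ascending=False
    )
    out = out[out["rank"] <= n]
    return out.sort_values(["date", "sector", "rank"]).reset_index(drop=True)

# Toy data: two days, two sectors, two instruments per sector.
daily = pd.DataFrame({
    "date": ["2024-07-25"] * 4 + ["2024-07-26"] * 4,
    "sector": ["tech", "tech", "bank", "bank"] * 2,
    "instrument": ["A", "B", "C", "D"] * 2,
    "gain": [0.05, 0.02, 0.01, 0.03, -0.01, 0.04, 0.02, 0.00],
    "volume": [100, 200, 300, 400, 150, 250, 350, 450],
})

top = top_n_per_sector(daily, n=1)
print(top[["date", "sector", "instrument", "gain", "rank"]])
```

With real data `n` would be 100; the toy call uses `n=1` only to keep the output small.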
Related works (Papers, Github repos etc.):
Any other relevant and important information:
Data on disk is currently organized with the stock (instrument) as the first dimension. To extract daily rankings for the scenario above, I would in principle have to load all data for the entire market within the specified time window into memory and normalize every feature involved, even though only the top 100 stocks of each day actually take part in training or inference. The longer the training time span, the more memory this requires. I suggest a more efficient data organization for this kind of scenario, along with support for automatic cross-sectional data normalization.
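As a rough illustration of what is being requested, the sketch below keeps the data date-first, normalizes each day against its full cross-section, and retains only the top N rows per day, so that only one day's full cross-section is held at a time. It is an assumption-laden sketch in plain pandas, not Qlib's storage format or API: the in-memory dict stands in for one file per trading day, and `to_day_first`, `zscore_cross_section`, `load_window` and the column names are hypothetical.

```python
import pandas as pd

def to_day_first(panel: pd.DataFrame) -> dict:
    """Split a long table into one frame per date (stand-in for one file per day)."""
    return {
        date: g.drop(columns="date").set_index("instrument")
        for date, g in panel.groupby("date")
    }

def zscore_cross_section(day: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Z-score the given columns across all instruments of a single day."""
    out = day.copy()
    std = day[cols].std(ddof=0).replace(0, 1.0)  # guard against constant columns
    out[cols] = (day[cols] - day[cols].mean()) / std
    return out

def load_window(store: dict, start: str, end: str, top_n: int, cols: list) -> pd.DataFrame:
    """Return normalized rows for the top_n gainers of each day in [start, end]."""
    frames = []
    for date in sorted(d for d in store if start <= d <= end):
        raw = store[date]                         # only this day is touched
        normed = zscore_cross_section(raw, cols)  # statistics use the full day
        keep = raw["gain"].nlargest(top_n).index  # select on the raw gain
        frames.append(normed.loc[keep].assign(date=date))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames).reset_index().set_index(["date", "instrument"])

# Toy data: two days, three instruments (ISO date strings compare correctly).
panel = pd.DataFrame({
    "date": ["2024-07-25"] * 3 + ["2024-07-26"] * 3,
    "instrument": ["A", "B", "C"] * 2,
    "gain": [0.05, 0.01, 0.03, -0.02, 0.04, 0.01],
    "volume": [100.0, 200.0, 300.0, 150.0, 250.0, 50.0],
})

store = to_day_first(panel)
result = load_window(store, "2024-07-25", "2024-07-26", top_n=2, cols=["gain", "volume"])
print(result)
```

The point of the design is that memory use scales with one day's cross-section plus the retained top-N rows, rather than with the whole market over the whole training window.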

imhsz added the enhancement (New feature or request) label on Jul 28, 2024