Support Point-in-time Data Operation #343

Merged
merged 34 commits into from
Mar 10, 2022
3e92169
add period ops class
bxdd Mar 9, 2021
fead243
black format
bxdd Mar 9, 2021
61720c2
add pit data read
bxdd Mar 10, 2021
a0959a9
fix bug in period ops
bxdd Mar 10, 2021
bd46d14
update ops runnable
bxdd Mar 10, 2021
9f1cc64
update PIT test example
bxdd Mar 10, 2021
63e4895
black format
bxdd Mar 10, 2021
88b7926
update PIT test
bxdd Mar 10, 2021
a2dae5c
update tets_PIT
bxdd Mar 12, 2021
99db80d
update code format
bxdd Mar 12, 2021
c4bbe6b
add check_feature_exist
bxdd Mar 12, 2021
20bcf25
black format
bxdd Mar 12, 2021
6e23ff7
optimize the PIT Algorithm
bxdd Mar 12, 2021
88a0d3d
fix bug
bxdd Mar 12, 2021
f52462a
update example
bxdd Mar 12, 2021
b794e65
update test_PIT name
bxdd Mar 17, 2021
255ed0b
Merge https://github.com/microsoft/qlib
bxdd Apr 7, 2021
9df1fbd
add pit collector
bxdd Apr 7, 2021
71d5640
black format
bxdd Apr 7, 2021
ebe277b
fix bugs
bxdd Apr 8, 2021
655ff51
fix try
bxdd Apr 8, 2021
f6ca4d2
fix bug & add dump_pit.py
bxdd Apr 9, 2021
566a8f9
Successfully run and understand PIT
you-n-g Mar 4, 2022
63b5ed4
Merge remote-tracking branch 'origin/main' into PIT
you-n-g Mar 4, 2022
4997389
Add some docs and remove a bug
you-n-g Mar 4, 2022
561be64
Merge remote-tracking branch 'origin/main' into PIT
you-n-g Mar 8, 2022
6811a07
Merge remote-tracking branch 'origin/main' into PIT
you-n-g Mar 8, 2022
cf77cd0
mv crypto collector
you-n-g Mar 8, 2022
79422a1
black format
you-n-g Mar 8, 2022
48ea2c5
Run succesfully after merging master
you-n-g Mar 8, 2022
9c67303
Pass test and fix code
you-n-g Mar 10, 2022
69cf2ab
remove useless PIT code
you-n-g Mar 10, 2022
de8d6cb
fix PYlint
you-n-g Mar 10, 2022
2671dc2
Rename
you-n-g Mar 10, 2022
Add some docs and remove a bug
you-n-g committed Mar 4, 2022
commit 499738938190406d71b7807915f99e3b8fe03bee
26 changes: 25 additions & 1 deletion docs/advanced/PIT.rst
@@ -6,6 +6,30 @@
.. currentmodule:: qlib


Introduction
------------
Point-in-time (PIT) data is an important consideration in any historical market analysis.

For example, suppose we are backtesting a trading strategy with the past five years of historical data as input.
The model is assumed to trade once a day, at the market close, and we are calculating the trading signal for 1 January 2020 in the backtest. At that point, we should only have access to data that was available on 1 January 2020, 31 December 2019, 30 December 2019, and so on.

In financial data (especially financial reports), the same piece of data may be amended multiple times over time. If we use only the latest version for historical backtesting, data leakage will occur.
A point-in-time database is designed to solve this problem by making sure users get the right version of the data at any historical timestamp. This keeps the behavior of online trading and historical backtesting consistent.
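
The leakage problem described above can be illustrated with a minimal sketch. It does not use the Qlib API; the revision history, the values, and the helper ``value_as_of`` are all hypothetical and only show what "the right version at a historical timestamp" means.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical revision history for one quarterly figure.
# Each entry is (publication date, reported value); the numbers are made up.
revisions = [
    (date(2019, 10, 25), 100.0),  # first release
    (date(2019, 12, 15), 98.5),   # first amendment
    (date(2020, 2, 10), 97.0),    # second amendment
]


def value_as_of(revisions, query_date):
    """Return the latest value published on or before query_date, or None."""
    pub_dates = [d for d, _ in revisions]
    idx = bisect_right(pub_dates, query_date)
    return revisions[idx - 1][1] if idx > 0 else None


# What a backtest running on 1 January 2020 is allowed to see.
pit_value = value_as_of(revisions, date(2020, 1, 1))

# What a naive "latest version" lookup would return, leaking the
# amendment published in February 2020 into the January signal.
latest_value = revisions[-1][1]
```

Here ``pit_value`` differs from ``latest_value`` because the second amendment was not yet published on the query date; using ``latest_value`` in the backtest would be the data leakage described above.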



Data Preparation
----------------

Qlib provides a crawler to help users download financial data and a converter to dump the data into Qlib format.
Please follow `scripts/data_collector/pit/README.md` to download and convert data.


File-based design for PIT data
------------------------------

Qlib provides file-based storage for PIT data.

Each feature file contains four columns: `date`, `period`, `value`, and `_next`.
Each row corresponds to a statement.

@@ -17,7 +41,7 @@ The meaning of each feature with filename like `XXX_a.data`
- `value`: the described value
- `_next`: the byte index of the next occurrence of the field.

- Besides the feature, a index `XXX_a.index`
+ Besides the feature data, an index `XXX_a.index` is included to speed up the querying performance

The statements are sorted by `date` in ascending order from the beginning of the file.
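
The record layout described above can be sketched with Python's ``struct`` module. This is an illustration under stated assumptions, not Qlib's actual reader: the binary field types (``uint32`` date, ``uint32`` period, ``float64`` value, ``uint32`` ``_next``), the sentinel ``NO_NEXT``, the sample rows, and the interpretation of ``_next`` as a link to the next statement for the same period are all hypothetical.

```python
import io
import struct

# Assumed layout (not confirmed by the text above): little-endian
# uint32 date, uint32 period, float64 value, uint32 _next.
RECORD = struct.Struct("<IIdI")
NO_NEXT = 0xFFFFFFFF  # hypothetical sentinel meaning "no later statement"

# Made-up statements, sorted by date ascending; period 201903 is restated once.
rows = [
    (20191025, 201903, 100.0),
    (20191215, 201903, 98.5),
    (20200125, 201904, 120.0),
]

# Build the _next links: byte offset of the next row with the same period.
next_offsets = [NO_NEXT] * len(rows)
last_seen = {}
for i, (_, period, _) in enumerate(rows):
    if period in last_seen:
        next_offsets[last_seen[period]] = i * RECORD.size
    last_seen[period] = i

# Serialize the rows into an in-memory stand-in for an `XXX_a.data` file.
buf = io.BytesIO()
for (d, p, v), nxt in zip(rows, next_offsets):
    buf.write(RECORD.pack(d, p, v, nxt))


def read_chain(data, offset):
    """Follow _next links starting at offset and return every record visited."""
    out = []
    while offset != NO_NEXT:
        rec = RECORD.unpack_from(data, offset)
        out.append(rec)
        offset = rec[3]
    return out


# All versions of the statement that starts at byte offset 0.
chain = read_chain(buf.getvalue(), 0)
```

Under these assumptions, following ``_next`` from the first record visits every later restatement of the same period without scanning the whole file, which is one plausible reason for storing the link alongside each row.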

1 change: 0 additions & 1 deletion qlib/data/data.py
@@ -62,7 +62,6 @@ def backend_obj(self, **kwargs):
         backend = copy.deepcopy(backend)
         backend.setdefault("kwargs", {}).update(**kwargs)
         return init_instance_by_config(backend)
->>>>>>> origin/main


class CalendarProvider(abc.ABC):