☃️ ColdRec is a comprehensive open-source toolkit and benchmark for cold-start recommendation. In ColdRec, models follow a unified pipeline, datasets follow a unified split, and tasks cover cold user/item recommendation, warm user/item recommendation, and overall user/item recommendation, aiming to provide the community with a comprehensive and fair benchmark for cold-start recommendation.
🔧 Information in 2024.11: Thanks for using ColdRec! There are still many things in this codebase that can be improved, such as the functionality gaps and unclear descriptions mentioned in the issues. Due to a recent busy schedule, I am sorry that I cannot reply carefully to each message one by one. I plan to improve this codebase based on the feedback in the issues in February or March 2025.
🥳 Update in 2024.06: Added automatic hyper-parameter tuning; install one additional base library, optuna, to use this module.
ColdRec aims to avoid a complicated and tedious packaging process, building the codebase on native PyTorch and a small number of necessary libraries:
python >= 3.8.0
torch >= 1.11.0
faiss-gpu >= 1.7.3
pandas >= 2.0.3
numba >= 0.58.1
numpy >= 1.24.4
scikit-learn >= 1.3.2
pickle (bundled with the Python standard library; no separate install needed)
optuna >= 3.6.1 (If you need automatic hyper-parameter tuning)
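For convenience, the dependencies above can be captured in a requirements.txt (a sketch; versions follow the list above; pickle ships with the Python standard library, and the faiss project generally recommends installing faiss via conda rather than pip):

```text
torch>=1.11.0
faiss-gpu>=1.7.3
pandas>=2.0.3
numba>=0.58.1
numpy>=1.24.4
scikit-learn>=1.3.2
optuna>=3.6.1  # only needed for automatic hyper-parameter tuning
```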
1️⃣ Dataset Preprocess
We provide preprocessed datasets on Google Drive. Download the dataset from the Google Drive link, unzip it, and place it in the ./data folder. Then, you can process it into the format for model training with two simple scripts; details can be found on the dataset details page.
2️⃣ Warm Embedding Pre-Training
This step is not strictly necessary, but since most of the models require it, we recommend completing it to fully evaluate all models. We provide both the widely adopted collaborative filtering model MF and the graph-based model LightGCN as warm recommenders. You can obtain the pre-trained warm user/item embeddings in either of two ways:
Option 1: Directly download the BPR-MF pre-trained embeddings from Google Drive. The embedding folder associated with each dataset should then be placed in the ./emb folder.
Option 2: You can also pre-train the warm embeddings yourself by running the following script:
python main.py --dataset [DATASET NAME] --model [WARM RECOMMENDER] --cold_object [user/item]
In the above script, replace [DATASET NAME] for --dataset with your target dataset name, such as movielens. Set [WARM RECOMMENDER] for --model to the warm recommender type (MF, LightGCN, NCL, SimGCL, or XSimGCL). Finally, set [user/item] for --cold_object to user or item for the user cold-start or item cold-start setting, respectively.
3️⃣ Cold-Start Model Training and Evaluation
At this step, you can train a cold-start model with one script:
python main.py --dataset [DATASET NAME] --model [MODEL NAME] --cold_object [user/item]
In the above script, [MODEL NAME] for --model specifies the model to train; we provide 20 representative models, listed under Supported Models. You can also flexibly register your own model with the ColdRec framework for evaluation.
4️⃣ (Optional) Automatic Hyper-parameter Tuning
ColdRec also supports automatic hyper-parameter tuning. You can tune hyper-parameters via optuna with one script:
python param_search.py --dataset [DATASET NAME] --model [MODEL NAME] --cold_object [user/item]
You can flexibly set the tuning range in param_search.py.