PTI Hack 2022

Track №3 task solution by Team GARCH with RMSE 20.12592

Task

The competition is hosted on Kaggle: https://www.kaggle.com/c/pti-hack

The task was to predict the probability that a deal closes successfully, given the history of interactions with the client.

Solution

Our approach first builds a classifier that predicts the final outcome of a deal, then trains LightGBM and CatBoost models separately for successful and unsuccessful cases and stacks them. Because the train and test data overlap, we had to define a correct time-aware prediction scheme, so that the target for each new element is predicted from past data only.

Reasons for using this method:

Avoid overfitting on the intersection of train and test
Avoid leaks when generating new features: no feature for a record may use information from that record's future
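
To illustrate what a time-aware feature looks like, here is a minimal sketch of a lag statistic that uses only strictly earlier records. The row layout `(group_key, created, target)`, the function name `past_mean_by_group` and the sample values are assumptions made for this example; they are not taken from the notebook.

```python
from collections import defaultdict
from datetime import date
from itertools import groupby


def past_mean_by_group(rows):
    """For each row, return the mean target of strictly earlier rows in the same group.

    rows: list of (group_key, created, target) tuples (hypothetical layout).
    Rows without any earlier history in their group get None.
    """
    order = sorted(range(len(rows)), key=lambda i: rows[i][1])
    sums = defaultdict(float)
    counts = defaultdict(int)
    result = [None] * len(rows)
    # Process rows in batches that share a timestamp, so that rows created
    # at the same moment never see each other's targets.
    for _, batch in groupby(order, key=lambda i: rows[i][1]):
        batch = list(batch)
        # First read the feature for every row in the batch...
        for i in batch:
            key = rows[i][0]
            if counts[key]:
                result[i] = sums[key] / counts[key]
        # ...and only then add the batch's targets to the history.
        for i in batch:
            key, _, target = rows[i]
            sums[key] += target
            counts[key] += 1
    return result


# Made-up sample data, for illustration only.
rows = [
    ("opp-1", date(2021, 1, 1), 10.0),
    ("opp-1", date(2021, 2, 1), 30.0),
    ("opp-2", date(2021, 2, 1), 50.0),
    ("opp-1", date(2021, 3, 1), 80.0),
]
features = past_mean_by_group(rows)
```

The first record of each group has no history, so its feature is `None`; a real pipeline would need to decide how to fill such gaps.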

Feature Engineering

Dates: CreatedDate, CreatedDateForInsert, ValidThroughDate - differences, quarters, years, sin-cos encoding
Lags: statistics of previous probabilities grouped by Opportunity, CreatedBy and time period
Categorical: CreatedById, AccountId, RecordTypeId, Type, LeadSource, CampaignId etc.
CountVectorizer: Needs__c - split into multiple count features.
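
The date features listed above can be sketched roughly as follows. This is an illustration only: the helper names are hypothetical, and encoding the month (rather than the day of year or the weekday) is an assumption, since the list does not say which period was encoded.

```python
import math
from datetime import date


def days_between(start, end):
    """Difference between two dates in whole days."""
    return (end - start).days


def quarter(d):
    """Calendar quarter of a date, from 1 to 4."""
    return (d.month - 1) // 3 + 1


def cyclical_month(d):
    """Map the month onto the unit circle so that December and January are neighbours."""
    angle = 2 * math.pi * (d.month - 1) / 12
    return math.sin(angle), math.cos(angle)


# Made-up dates standing in for CreatedDate and ValidThroughDate.
created = date(2021, 12, 15)
valid_through = date(2022, 1, 31)

date_features = {
    "days_to_valid_through": days_between(created, valid_through),
    "created_quarter": quarter(created),
    "created_year": created.year,
    "created_month_sin": cyclical_month(created)[0],
    "created_month_cos": cyclical_month(created)[1],
}
```

The point of the sin-cos pair is that a plain month number puts December (12) and January (1) far apart, while on the circle they are adjacent.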

Approach Details

Divide the target by 100 and build the LightGBM model with the logloss objective.
Build a classifier model whose target is derived from StageName: a forecast of how the deal will ultimately end (0 - unsuccessfully, 1 - successfully).
For each point in the dataset, predict this outcome with the classifier.
Divide the dataset into 2 parts: successful and unsuccessful cases.
On each of the parts, build a separate LGBMRegressor and CatBoost model to predict the final value of the probability.
Stack the CatBoost and LightGBM models (with coefficients 0.4 and 0.6, respectively) within each of the categories.
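
The routing and blending steps above can be sketched as follows. This is not the notebook's code: the trained models are replaced by plain callables, and the names `blend`, `predict_probability`, `classify` and `experts` are hypothetical. Only the weights 0.4 and 0.6 come from the description.

```python
def blend(catboost_pred, lightgbm_pred, w_cat=0.4, w_lgbm=0.6):
    """Weighted average of the two regressors' predictions."""
    return w_cat * catboost_pred + w_lgbm * lightgbm_pred


def predict_probability(row, classify, experts):
    """Route a row to the pair of regressors that matches its predicted outcome.

    classify: callable returning 0 (unsuccessful) or 1 (successful).
    experts:  dict mapping that label to a (catboost_fn, lightgbm_fn) pair.
    """
    label = classify(row)
    catboost_fn, lightgbm_fn = experts[label]
    return blend(catboost_fn(row), lightgbm_fn(row))


# Stand-ins for trained models; the rule and the numbers are made up.
classify = lambda row: 1 if row["score"] > 0 else 0
experts = {
    1: (lambda row: 70.0, lambda row: 80.0),  # models fitted on successful cases
    0: (lambda row: 10.0, lambda row: 20.0),  # models fitted on unsuccessful cases
}

won = predict_probability({"score": 1}, classify, experts)
lost = predict_probability({"score": -1}, classify, experts)
```

In the actual pipeline each callable would be a fitted model, and every prediction would have to follow the time-aware scheme described in the Solution section.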

Notebooks

pti_hack.ipynb (nbviewer) - Main notebook used for creating the final stacking model

Because the stacking requires exhaustive pointwise, time-respecting predictions, the notebook takes approximately 1 hour to run on 16 CPUs with n_jobs=32.

Docker

The same code as a .py script, plus the additional files needed to run it inside a Docker container.

Build

cd PTI-Hack-2022/docker
docker build -t pti_hack .
docker tag pti_hack:latest <your_username>/pti_hack:latest
docker push <your_username>/pti_hack

Environment variables

KAGGLE_USERNAME - username in Kaggle
KAGGLE_TOKEN - Kaggle API token
N_JOBS - number of jobs for parallel execution
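
A script could read these variables along the following lines. This is a sketch, not the code shipped in the container: the function name `read_settings`, the fallback of 1 for N_JOBS and the error handling are all assumptions.

```python
import os


def read_settings(env=os.environ):
    """Collect the three variables above from a mapping (the process environment by default)."""
    username = env.get("KAGGLE_USERNAME")
    token = env.get("KAGGLE_TOKEN")
    missing = [
        name
        for name, value in (("KAGGLE_USERNAME", username), ("KAGGLE_TOKEN", token))
        if not value
    ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Falling back to a single job is an assumption made for this sketch.
    n_jobs = int(env.get("N_JOBS", "1"))
    return {"username": username, "token": token, "n_jobs": n_jobs}


# Example with an explicit mapping instead of the real environment.
settings = read_settings(
    {"KAGGLE_USERNAME": "alice", "KAGGLE_TOKEN": "secret", "N_JOBS": "8"}
)
```

Failing early on a missing credential avoids spending a long training run before discovering that the submission cannot be uploaded.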

Running on zod.tv (sponsor platform)

curl -H "Content-Type: application/json" \
-H "Authorization: Zod58 {{your_api_key}}" \
-X POST https://offchain.zod.tv/job_new -d @- << EOF
{
    "type": "docker",
    "path": "docker.io/<your_username>/pti_hack",
    "cpu": 32,
    "ram": 64,
    "disk": 30
}
EOF

The prediction will be submitted automatically after execution ends.

Troubleshooting

For unknown reasons, the container sometimes fails with joblib.externals.loky.process_executor.TerminatedWorkerError when running on zod.tv.

Set N_JOBS low enough to prevent this (note that this can significantly slow down training).
