Explainit is a modern, enterprise-ready business intelligence web application that builds on existing frameworks to manage and serve dashboard features for the machine learning project lifecycle.
Explainit allows ML platform teams to:
- Analyze drift in the existing data stack (features and targets).
- Prepare a concise summary of productionized data.
- Perform quality checks on the data to provide a feature overview.
- Analyze in depth the relationships between features and the target.
Explainit helps ML platform teams with DevOps experience monitor productionized batch data. Explainit can also help these teams build toward an explainability and monitoring platform that improves collaboration between engineers and data scientists.
Explainit is likely not the right tool if you:
- Are in an organization that’s just getting started with ML and is not yet sure what the business impact of ML is.
- Rely primarily on unstructured data.
Model drift (also known as model decay) refers to the degradation of a model's predictive power due to changes in the environment or in feature distributions, and hence in the relationships between variables.
There are three main types of model drift:
- Concept drift
- Data drift
- Upstream data changes
Concept drift is a type of model drift where the relationship between the input and the target changes over time. It usually occurs when real-world conditions change and diverge from the training data the model learned from. For example, customer behaviour can change over time, lowering the accuracy of a model trained on historic customer datasets.
Data drift is a type of model drift where the statistical properties of the independent variables (features) change. Examples include changes in the data due to seasonality, shifts in consumer preferences, or the addition of new products.
Upstream data changes are operational changes in the data pipeline. One example is a feature that is no longer being generated, resulting in missing values. Another is a change in the unit of measurement (e.g., miles to kilometers).
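To make concept drift concrete, here is a minimal, self-contained sketch (not part of Explainit) in which the input distribution stays exactly the same but the input-target relationship shifts. The `decision_rule` "model" and both thresholds are illustrative assumptions.

```python
import random

def decision_rule(x: float) -> int:
    # Stand-in for a model learned on historic data: predict 1 when x exceeds 0.5.
    return int(x > 0.5)

def accuracy(xs, ys) -> float:
    # Share of samples where the fixed model's prediction matches the label.
    correct = sum(decision_rule(x) == y for x, y in zip(xs, ys))
    return correct / len(xs)

rng = random.Random(0)
xs = [rng.random() for _ in range(1000)]  # identical inputs in both periods

# Training period: labels follow the rule the model learned.
ys_before = [int(x > 0.5) for x in xs]
# Later period: the relationship changed (the true threshold moved to 0.8).
ys_after = [int(x > 0.8) for x in xs]

print(f"accuracy before drift: {accuracy(xs, ys_before):.2f}")  # 1.00
print(f"accuracy after drift:  {accuracy(xs, ys_after):.2f}")
```

Because the inputs are unchanged, a check on feature distributions alone would not catch this; it only shows up when predictions are compared against the target.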
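One common way to quantify data drift in a single numerical feature is the two-sample Kolmogorov-Smirnov (KS) statistic: the largest gap between the empirical CDFs of the reference and production samples. The pure-Python sketch below is illustrative and not necessarily how Explainit computes drift; `scipy.stats.ks_2samp` computes the same statistic along with a p-value. The sample ages are made-up data.

```python
from bisect import bisect_right

def ks_statistic(reference, production) -> float:
    """Two-sample KS statistic: max |ECDF_ref(v) - ECDF_prod(v)| over all v."""
    ref = sorted(reference)
    prod = sorted(production)
    n, m = len(ref), len(prod)
    max_gap = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so evaluating at every observed value finds the maximum gap.
    for v in ref + prod:
        cdf_ref = bisect_right(ref, v) / n
        cdf_prod = bisect_right(prod, v) / m
        max_gap = max(max_gap, abs(cdf_ref - cdf_prod))
    return max_gap

reference_ages = [25, 32, 41, 38, 29, 45, 51, 36]
production_ages = [age + 15 for age in reference_ages]  # simulated shift

print(ks_statistic(reference_ages, reference_ages))   # 0.0
print(ks_statistic(reference_ages, production_ages))  # 0.625
```

The statistic ranges from 0 (identical distributions) to 1 (no overlap); what counts as "drifted" is a threshold you would have to choose for your own data.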
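Upstream data changes often surface as a sudden rise in missing values or a shift in scale. The pandas sketch below flags both. The helper names, the 5% tolerance, and the toy data are assumptions for illustration, not Explainit functionality.

```python
import pandas as pd

def missing_rate_jumps(reference: pd.DataFrame, production: pd.DataFrame,
                       tolerance: float = 0.05) -> pd.Series:
    """Hypothetical helper: columns whose share of missing values grew by more than `tolerance`."""
    ref_rate = reference.isna().mean()
    # reindex turns a feature that vanished from production into an all-NaN
    # column, so a dropped feature is flagged with a missing rate of 1.0.
    prod_rate = production.reindex(columns=reference.columns).isna().mean()
    jump = prod_rate - ref_rate
    return jump[jump > tolerance].sort_values(ascending=False)

def median_ratios(reference: pd.DataFrame, production: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: production/reference median per numerical column (1.0 = no shift)."""
    numeric = reference.select_dtypes(include="number").columns
    return production.reindex(columns=numeric).median() / reference[numeric].median()

ref = pd.DataFrame({"distance": [10.0, 20.0, 30.0, 40.0], "income": [50.0, 60.0, None, 70.0]})
prod = pd.DataFrame({"distance": [16.09, 32.18, 48.27, 64.36], "income": [None, None, None, 65.0]})

print(missing_rate_jumps(ref, prod))  # flags "income" (missing rate rose from 0.25 to 0.75)
print(median_ratios(ref, prod))       # "distance" ratio is about 1.609
```

A median ratio close to 1.609 is the kind of signal a miles-to-kilometers switch would leave behind.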
Install the Explainit package:

```shell
pip install explainit
```
To generate the dashboards inside the application, first import the `build` function:

```python
from explainit.app import build
```
Next, load the data that will be passed to the application to generate the dashboards. This example uses the Default Loan dataset.
```python
import pandas as pd

# Reference (training) data: the baseline distributions.
ref_data = pd.read_csv("https://raw.githubusercontent.com/katonic-dev/explainit/master/examples/data/reference_data.csv", index_col=None)
# Production (inference) data: compared against the reference data.
prod_data = pd.read_csv("https://raw.githubusercontent.com/katonic-dev/explainit/master/examples/data/production_data.csv", index_col=None)
```
Once you have both the reference and production datasets, pass them to `build` along with the target column name and the target column type (`cat` for a categorical column, `num` for a numerical column).
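```python
build(
    reference_data=ref_data,
    production_data=prod_data,
    target_col_name="bad_loan",
    target_col_type="cat",  # "cat" for a categorical target, "num" for a numerical one
    host="127.0.0.1",       # address the dashboard server binds to
    port=8050,              # port the dashboard server listens on
)
```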
To run the application on a separate server rather than on localhost, set the `host` and `port` arguments to that server's address and port.
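If you are unsure whether a target should be passed as `cat` or `num`, a simple dtype-based heuristic can make the first guess. The `infer_target_col_type` helper and its `max_classes=10` cutoff below are hypothetical, not part of Explainit; review the result for your own data.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

def infer_target_col_type(target: pd.Series, max_classes: int = 10) -> str:
    """Hypothetical helper: guess the target_col_type value ("cat" or "num") for build()."""
    # Booleans and non-numerical values (strings, categories) are treated as classes.
    if is_bool_dtype(target) or not is_numeric_dtype(target):
        return "cat"
    # Numerical columns with only a few distinct values (e.g. 0/1 labels)
    # are most likely encoded classes rather than measured quantities.
    if target.nunique(dropna=True) <= max_classes:
        return "cat"
    return "num"
```

For example, `target_col_type=infer_target_col_type(ref_data["bad_loan"])` could replace the hard-coded value in the `build` call.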
Below is a snapshot of the landing page of Explainit Dashboard.
Interested in contributing? See our CONTRIBUTING.md for contribution resources and a detailed guide to setting up a development environment.
A. With this app, users can calculate dataset drift, target drift, and data quality metrics to better understand production (real-world) data relative to training (reference) data, and make decisions based on that comparison.
A. The input data is your reference (training) data and your production (inference) data. The reference data serves as the baseline against which the production data's distributions are compared. Both must be passed as pandas DataFrames.
A. The app shows statistical information about the complete data (features and target) for drift analysis, distribution plots for each feature, the contribution of each feature to the target, and correlation metrics.
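Because both inputs are plain pandas DataFrames, a quick pre-flight check can catch mismatches before the dashboards are built. The `check_inputs` helper below is hypothetical, and it assumes the target column must be present in both datasets (as target drift analysis suggests); adjust it if your setup differs.

```python
import pandas as pd

def check_inputs(reference: pd.DataFrame, production: pd.DataFrame,
                 target_col_name: str) -> None:
    """Hypothetical pre-flight check to run before calling build()."""
    for name, frame in (("reference", reference), ("production", production)):
        if not isinstance(frame, pd.DataFrame):
            raise TypeError(f"{name} data must be a pandas DataFrame, got {type(frame).__name__}")
        if target_col_name not in frame.columns:
            raise ValueError(f"{name} data has no target column {target_col_name!r}")
    # Every reference feature needs a production counterpart to be compared.
    missing = sorted(set(reference.columns) - set(production.columns))
    if missing:
        raise ValueError(f"production data is missing columns: {missing}")
```

With the example datasets, `check_inputs(ref_data, prod_data, "bad_loan")` returns silently when the inputs line up and raises otherwise.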
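As a rough illustration of a feature-target correlation metric, the sketch below ranks numerical features by their Pearson correlation with the target using pandas `DataFrame.corrwith`. This is an assumption about one reasonable metric, not necessarily what the app displays, and it requires a numerically encoded target (e.g., 0/1). The `target_correlations` helper and the toy data are hypothetical.

```python
import pandas as pd

def target_correlations(data: pd.DataFrame, target_col_name: str) -> pd.Series:
    """Hypothetical helper: Pearson correlation of each numerical feature with the target,
    ordered from strongest to weakest absolute correlation."""
    numeric = data.select_dtypes(include="number")
    features = numeric.drop(columns=[target_col_name])
    return features.corrwith(numeric[target_col_name]).sort_values(key=abs, ascending=False)

toy = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],   # rises with the target
    "x2": [5, 4, 3, 2, 1],   # falls with the target
    "x3": [2, 1, 2, 1, 2],   # mostly unrelated
    "y":  [0, 0, 1, 1, 1],
})
print(target_correlations(toy, "y"))  # x3 is ranked last
```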
A. With the drift information from the app, users can decide to:
- Look for higher-quality data for the use case.
- Change existing models or train new ones for production.
- Update domain-specific concepts to better understand the real world when building new models.

For more FAQs, see faq.md.