KGist: Knowledge Graph Summarization for Anomaly Detection & Completion

Caleb Belth, Xinyi Zheng, Jilles Vreeken, and Danai Koutra. What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization. ACM The Web Conference (WWW), April 2020.

If you use KGist, please cite:

@inproceedings{belth2020normal,
  title={What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization},
  author={Belth, Caleb and Zheng, Xinyi and Vreeken, Jilles and Koutra, Danai},
  booktitle={Proceedings of The Web Conference 2020},
  pages={1115--1126},
  year={2020}
}

Presentation: https://youtu.be/Ql7VEfliPXo

Setup

git clone git@github.com:GemsLab/KGist.git
cd KGist/data/
unzip nell.zip
unzip dbpedia.zip
cd ../src/
cd test/
python tester.py

Requirements

Python 3
numpy
scipy
networkx

Data

NELL and DBpedia are zipped in the data/ directory. YAGO is too large to distribute via GitHub.

{KG_name}.txt format: space separated, one triple per line.

s1 p1 o1
s2 p2 o2
...
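
As a minimal sketch of reading this format (not part of the KGist code base; `read_triples` is a hypothetical helper name), each non-blank line can be split on whitespace into exactly three fields:

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def read_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse space-separated 's p o' lines into (subject, predicate, object) tuples."""
    triples: List[Triple] = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        triples.append((fields[0], fields[1], fields[2]))
    return triples


# Hypothetical usage from the src/ directory, assuming data/nell.txt was unzipped:
# with open("../data/nell.txt") as f:
#     triples = read_triples(f)
```

Raising on malformed lines is a design choice of this sketch: because the format is space separated, an entity or predicate name that itself contains a space would silently shift fields otherwise.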

{KG_name}_labels.txt format: space separated, one entity per line followed by a variable number of labels, also space separated.

e1 l1 l2 ...
e2 l1 l2 l3 ...
...
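
The labels file can be read in a similar way. The following is an illustrative sketch only (`read_labels` is a hypothetical helper, not a KGist function): the first field on each line is taken as the entity and the remaining fields as its labels.

```python
from typing import Dict, Iterable, List


def read_labels(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each entity to the list of labels that follow it on its line."""
    labels: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        entity, entity_labels = fields[0], fields[1:]
        # If an entity appears on several lines, accumulate its labels.
        labels.setdefault(entity, []).extend(entity_labels)
    return labels
```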

Example usage (from `src/` dir)

Command Line

python main.py --graph nell

Interface

from graph import Graph
from searcher import Searcher
from model import Model

# load graph
graph = Graph('nell', idify=True)
# create a Searcher object to search for a model (set of rules)
searcher = Searcher(graph)
# build initial model
model = searcher.build_model()
model.print_stats()
# perform rule merging refinement
model = model.merge_rules()
model.print_stats()
# perform rule nesting refinement
model = model.nest_rules()
model.print_stats()

To compute anomaly scores for triples as in Section 4.3:

from anomaly_detector import AnomalyDetector

# construct an anomaly detector with the KGist model
anomaly_detector = AnomalyDetector(model)
# an edge/triple to score
edge = ('concept:company:limited_brands', 'concept:companyceo', 'concept:ceo:leslie_wexner')
anomaly_detector.score_edge(edge)
>>> 26.5164

Larger scores mean the triple is more anomalous. Note that the experiments in Section 5.2 used KGist m, i.e., the model obtained without running model.nest_rules().
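
Since larger scores mean more anomalous, a natural follow-up is to rank a batch of triples by score. The sketch below assumes only that a scoring callable such as `anomaly_detector.score_edge` returns a number for a triple; `rank_edges` is a hypothetical helper and not part of KGist.

```python
from typing import Callable, Iterable, List, Tuple

Edge = Tuple[str, str, str]


def rank_edges(
    score_edge: Callable[[Edge], float], edges: Iterable[Edge]
) -> List[Tuple[Edge, float]]:
    """Score every edge and return (edge, score) pairs, most anomalous first."""
    scored = [(edge, score_edge(edge)) for edge in edges]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical usage with the detector constructed above:
# ranked = rank_edges(anomaly_detector.score_edge, edges_to_check)
```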

Arguments

--graph {KG_name} Expects {KG_name}.txt and {KG_name}_labels.txt to be in the data/ directory, in the format described above for NELL and DBpedia.

--rule_merging / -Rm True/False (Optional; Default = False) Use rule merging refinement (Section 4.2.2)

--rule_nesting / -Rn True/False (Optional; Default = False) Use rule nesting refinement (Section 4.2.2)

--idify / -i True/False (Optional; Default = True) Convert entities and predicates to integer ids internally for faster processing

--verbosity / -v [0, infinity) (Optional; Default = 1,000,000) How frequently to log progress (use integers)

--output_path / -o (Optional; Default = 'output/') What directory to write the output to (log will still be printed to stdout)
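
To make the flags and defaults above concrete, here is a sketch of how they could be declared with `argparse`. This is not the actual contents of main.py, which may parse its arguments differently; `str2bool` and `build_parser` are hypothetical names, and treating `--graph` as required is an assumption based on it not being marked optional.

```python
import argparse


def str2bool(value: str) -> bool:
    """Interpret the literal strings True/False as booleans (hypothetical helper)."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(description="Sketch of the documented KGist options")
    parser.add_argument("--graph", required=True, help="KG name, e.g. nell")
    parser.add_argument("--rule_merging", "-Rm", type=str2bool, default=False)
    parser.add_argument("--rule_nesting", "-Rn", type=str2bool, default=False)
    parser.add_argument("--idify", "-i", type=str2bool, default=True)
    parser.add_argument("--verbosity", "-v", type=int, default=1_000_000)
    parser.add_argument("--output_path", "-o", default="output/")
    return parser


# Equivalent to: python main.py --graph nell -Rm True
args = build_parser().parse_args(["--graph", "nell", "-Rm", "True"])
```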

Output

output/{KG_name}_model.pickle saves a Model object.
output/{KG_name}_model.rules saves the rules, which are recursively defined, in parenthetical form.
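
A saved model can presumably be loaded back with Python's standard `pickle` module. The sketch below assumes the file was written with `pickle` and that it is run from `src/` so the `Model` class can be imported during unpickling; `load_model` is a hypothetical helper name. Only unpickle files you trust.

```python
import pickle
from pathlib import Path


def load_model(kg_name: str, output_dir: str = "output/"):
    """Load the pickled object stored at {output_dir}/{kg_name}_model.pickle."""
    path = Path(output_dir) / f"{kg_name}_model.pickle"
    with path.open("rb") as f:
        return pickle.load(f)


# Hypothetical usage after running: python main.py --graph nell
# model = load_model("nell")
# model.print_stats()
```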

Frequently Asked Questions (FAQ)

I want to run KGist on my own dataset. How did you construct the labels file?

We constructed the labels file by moving the rdf:type triples out of the KG and into the labels file. For example, if the KG contains the triples (LaRose, rdf:type, book) and (LaRose, rdf:type, novel), then LaRose book novel is a row in the labels file.
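
The procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the script used for the paper: `split_type_triples` and `format_label_rows` are hypothetical helpers, and the type predicate is assumed to appear literally as `rdf:type` in the input triples.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def split_type_triples(
    triples: Iterable[Triple], type_predicate: str = "rdf:type"
) -> Tuple[List[Triple], Dict[str, List[str]]]:
    """Separate type triples (which become labels) from the remaining edges."""
    edges: List[Triple] = []
    labels: Dict[str, List[str]] = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == type_predicate:
            if obj not in labels[subject]:  # avoid duplicate labels
                labels[subject].append(obj)
        else:
            edges.append((subject, predicate, obj))
    return edges, dict(labels)


def format_label_rows(labels: Dict[str, List[str]]) -> List[str]:
    """Render one 'entity label1 label2 ...' row per entity."""
    return [" ".join([entity] + entity_labels) for entity, entity_labels in labels.items()]
```

The edges would then be written to {KG_name}.txt and the rows to {KG_name}_labels.txt, one per line.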

Comments or Questions

Contact Caleb Belth with comments or questions: [email protected]
