toy-graph
is a purely c CPU based multithreading graph computing system
aiming to support billions vertexes. It's originally developed for computing
pagerank for infomall web pages dataset. Thanks to
Dr. Ma's guidance.
use pagerank as an example, for api usage, see pagerank example.
- install bazel.
toy-graph
is built using bazel,
follow the installation guide to install it.
- compile pagerank program.
bazel build //example/pagerank:main
- prepare graph input file
raw input file is test/pagerank_test_graph
, each line is an edge, format is
src dst src_degree
(pagerank task specific). we need to convert this file to
binary first, we use 4 bytes to represent each variable(12 bytes each edge). run
bazel build //tool:convert_to_binary
to compile convert tool. In test
directory, run
../bazel-bin/tool/convert_to_binary --input pagerank_test_graph
--output pagerank_test_graph.bin
then we get .bin
binary graph input file.
- run pagerank example
in toy-graph
project directory, run
./bazel-bin/example/pagerank/main --input
test/pagerank_test_graph.bin --output test/pagerank_test_graph.pagerank
--vertexNum 5 --edgeNum 5 --iterations 1
finally we get
test/pagerank_test_graph.pagerank
result file, each line is
vertex_index pagerank_value
format.
Currently toy-graph
supports single machine & vertex in memory & pull
computation, one thread to read, multi threads to compute, memory cost is
each_vertex_cost * num_of_vertex. There is no disk space needed except result
file.
-
we use valgrind to do memory check, such as memory leak, we use gtest to do unit test).
-
In order to reduce memory cost, all vertexes are stored on a continuous memory buffer instead of a vector of objects(aligned memory can increase memory cost). We leave each vertex memory management to user themselves.
-
to be continued(such as comparative experiments).
- speed optimization, unit tests and experiment results.
- support push computation model.
- add code comments, add docs.
- support multi platforms.
- support distributed computation.