Software developer, researcher, and consultant with a PhD in Computer Science. Web Data Engineer at Internet Archive, working on better access to web archives.
- Internet Archive
- Hannover, Germany
- http://www.HelgeHolzmann.de
Popular repositories
- ArchiveSpark (Public): An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at Internet Archive.
- internetarchive-transfer-scripts (Public): Scripts to transfer archive.org collections, using https://github.com/jjjake/internetarchive
- HadoopConcatGz (Public): A splittable Hadoop InputFormat for concatenated GZIP files and *.(w)arc.gz
- HadoopWebGraph (Public): A Hadoop input format to use graphs in WebGraph's BV format with Hadoop and Spark.