Wayback is a tool that runs as a command-line tool or a Docker container and snapshots webpages to time capsules.
Supported Go versions: see .github/workflows/testing.yml
- Cross platform
- Batch wayback URLs
- Builtin CLI (wayback)
- Serve as Tor Hidden Service or local web entry
- Easily wayback webpages to Internet Archive, archive.today, IPFS and Telegraph
- Interact via IRC, Matrix, Telegram bot, Discord bot, Mastodon and Twitter as a daemon service
- Publish wayback results to a Telegram channel, Mastodon and GitHub Issues
- Store archived files to disk
- Download stream media (requires FFmpeg)
The simplest, cross-platform way is to download from GitHub Releases and place the executable file in your PATH.
From source:
go install github.com/wabarc/wayback/cmd/wayback@latest
From GoBinaries:
curl -sf https://gobinaries.com/wabarc/wayback/cmd/wayback | sh
Using Snapcraft (on GNU/Linux):
sudo snap install wayback
Via APT:
curl -s https://apt.wabarc.eu.org/KEY.gpg | sudo apt-key add -
echo "deb https://apt.wabarc.eu.org/ /" | sudo tee /etc/apt/sources.list.d/wayback.list > /dev/null
sudo apt update
sudo apt install wayback
Via RPM:
sudo tee /etc/yum.repos.d/wayback.repo > /dev/null << EOF
[wayback]
name=Wayback Repository
baseurl=https://rpm.wabarc.eu.org/x86_64/
enabled=1
gpgcheck=0
EOF
sudo yum install -y wayback
Via Homebrew:
brew tap wabarc/wayback
brew install wayback
$ wayback -h
A command-line tool and daemon service for archiving webpages.
Usage:
wayback [flags]
Examples:
wayback https://www.wikipedia.org
wayback https://www.fsf.org https://www.eff.org
wayback --ia https://www.fsf.org
wayback --ia --is -d telegram -t your-telegram-bot-token
WAYBACK_SLOT=pinata WAYBACK_APIKEY=YOUR-PINATA-APIKEY \
WAYBACK_SECRET=YOUR-PINATA-SECRET wayback --ip https://www.fsf.org
Flags:
--chatid string Telegram channel id
-c, --config string Configuration file path, defaults: ./wayback.conf, ~/wayback.conf, /etc/wayback.conf
-d, --daemon strings Run as daemon service, supported services are telegram, web, mastodon, twitter, discord, slack, irc
--debug Enable debug mode (default mode is false)
-h, --help help for wayback
--ia Wayback webpages to Internet Archive
--info Show application information
--ip Wayback webpages to IPFS
--ipfs-host string IPFS daemon host, do not require, unless enable ipfs (default "127.0.0.1")
-m, --ipfs-mode string IPFS mode (default "pinner")
-p, --ipfs-port uint IPFS daemon port (default 5001)
--is Wayback webpages to Archive Today
--ph Wayback webpages to Telegraph
--print Show application configurations
-t, --token string Telegram Bot API Token
--tor Snapshot webpage via Tor anonymity network
--tor-key string The private key for Tor Hidden Service
-v, --version version for wayback
Wayback one or more URLs to Internet Archive and archive.today:
wayback https://www.wikipedia.org
wayback https://www.fsf.org https://www.eff.org
Wayback a URL to Internet Archive, archive.today or IPFS:
# Internet Archive
$ wayback --ia https://www.fsf.org
# archive.today
$ wayback --is https://www.fsf.org
# IPFS
$ wayback --ip https://www.fsf.org
When using IPFS, you can also specify a pinning service:
$ export WAYBACK_SLOT=pinata
$ export WAYBACK_APIKEY=YOUR-PINATA-APIKEY
$ export WAYBACK_SECRET=YOUR-PINATA-SECRET
$ wayback --ip https://www.fsf.org
# or
$ WAYBACK_SLOT=pinata WAYBACK_APIKEY=YOUR-PINATA-APIKEY \
WAYBACK_SECRET=YOUR-PINATA-SECRET wayback --ip https://www.fsf.org
More details about pinning service.
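If you drive wayback from a Go program rather than a shell, the same pinning variables can be set on the child process with os/exec. A minimal sketch, assuming Pinata as in the example above; pinnerCommand and the PINATA_APIKEY / PINATA_SECRET variables read in main are hypothetical names chosen for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// pinnerCommand prepares `wayback --ip <target>` with the pinning-service
// variables added to the current environment. os/exec uses the last value
// for a duplicated key, so these override any values already exported.
func pinnerCommand(slot, apikey, secret, target string) *exec.Cmd {
	cmd := exec.Command("wayback", "--ip", target)
	cmd.Env = append(os.Environ(),
		"WAYBACK_SLOT="+slot,
		"WAYBACK_APIKEY="+apikey,
		"WAYBACK_SECRET="+secret,
	)
	return cmd
}

func main() {
	// PINATA_APIKEY and PINATA_SECRET are hypothetical variable names used
	// only by this sketch to keep credentials off the command line.
	apikey, secret := os.Getenv("PINATA_APIKEY"), os.Getenv("PINATA_SECRET")
	if apikey == "" || secret == "" {
		fmt.Fprintln(os.Stderr, "set PINATA_APIKEY and PINATA_SECRET first")
		return
	}
	cmd := pinnerCommand("pinata", apikey, secret, "https://www.fsf.org")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "wayback failed:", err)
		os.Exit(1)
	}
}
```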
With a Telegram bot:
wayback --ia --is --ip -d telegram -t your-telegram-bot-token
Publish messages to your Telegram channel at the same time:
wayback --ia --is --ip -d telegram -t your-telegram-bot-token --chatid your-telegram-channel-name
You can also run in debug mode:
wayback -d telegram -t YOUR-BOT-TOKEN --debug
Serve both on Telegram and as a Tor hidden service:
wayback -d telegram -t YOUR-BOT-TOKEN -d web
By default, wayback looks for configuration options in the following files:
- ./wayback.conf
- ~/wayback.conf
- /etc/wayback.conf
Use the -c / --config option to specify which configuration file to use.
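The default lookup can be sketched in Go as follows. This assumes the first file that exists, in the order listed, is the one loaded; the README does not say so explicitly, and wayback's own loader may behave differently. candidateConfigs and findConfig are hypothetical helpers, and the existence check is injected so the logic can be tested without touching the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidateConfigs lists the default locations in the order the README gives.
func candidateConfigs(home string) []string {
	return []string{
		"./wayback.conf",
		filepath.Join(home, "wayback.conf"),
		"/etc/wayback.conf",
	}
}

// findConfig returns the first candidate for which exists reports true.
// Treating the first hit as the loaded file is an assumption of this sketch.
func findConfig(candidates []string, exists func(string) bool) (string, bool) {
	for _, p := range candidates {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists reports whether p names an existing regular file or symlink target.
func fileExists(p string) bool {
	info, err := os.Stat(p)
	return err == nil && !info.IsDir()
}

func main() {
	home, _ := os.UserHomeDir()
	if p, ok := findConfig(candidateConfigs(home), fileExists); ok {
		fmt.Println("would load", p)
	} else {
		fmt.Println("no default config file found; use -c/--config")
	}
}
```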
You can also specify configuration options via command-line flags or environment variables. All options are listed below.
| Flags | Environment Variable | Default | Description |
|---|---|---|---|
| --debug | DEBUG | false | Enable debug mode, overrides LOG_LEVEL |
| -c, --config | - | - | Configuration file path, defaults: ./wayback.conf, ~/wayback.conf, /etc/wayback.conf |
| - | LOG_TIME | true | Display the date and time in log messages |
| - | LOG_LEVEL | info | Log level; supported levels are debug, info, warn, error, fatal. Defaults to info |
| - | ENABLE_METRICS | false | Enable metrics collector |
| - | HTTP_LISTEN_ADDR | 127.0.0.1:8964 | Listen address for the HTTP server |
| - | CHROME_REMOTE_ADDR | - | Chrome/Chromium remote debugging address, used for screenshots |
| - | WAYBACK_POOLING_SIZE | 3 | Size of the worker pool, i.e. the number of wayback requests processed at once |
| - | WAYBACK_BOLT_PATH | ./wayback.db | File path of the bolt database |
| - | WAYBACK_STORAGE_DIR | - | Directory to store binary files, e.g. PDF and HTML files |
| - | WAYBACK_MAX_MEDIA_SIZE | 512MB | Maximum size of downloaded stream media |
| - | WAYBACK_TIMEOUT | 300 | Timeout for a single wayback request, in seconds |
| - | WAYBACK_USERAGENT | WaybackArchiver/1.0 | User-Agent for wayback requests |
| - | WAYBACK_FALLBACK | off | Use Google cache as a fallback if the original webpage is unavailable |
| -d, --daemon | - | - | Run as daemon service, e.g. telegram, web, mastodon, twitter, discord |
| --ia | WAYBACK_ENABLE_IA | true | Wayback webpages to Internet Archive |
| --is | WAYBACK_ENABLE_IS | true | Wayback webpages to Archive Today |
| --ip | WAYBACK_ENABLE_IP | false | Wayback webpages to IPFS |
| --ph | WAYBACK_ENABLE_PH | false | Wayback webpages to Telegra.ph, requires Chrome/Chromium |
| --ipfs-host | WAYBACK_IPFS_HOST | 127.0.0.1 | IPFS daemon service host |
| -p, --ipfs-port | WAYBACK_IPFS_PORT | 5001 | IPFS daemon service port |
| -m, --ipfs-mode | WAYBACK_IPFS_MODE | pinner | IPFS mode for preserving webpages, e.g. daemon, pinner |
| - | WAYBACK_GITHUB_TOKEN | - | GitHub Personal Access Token, requires the repo scope |
| - | WAYBACK_GITHUB_OWNER | - | GitHub account name |
| - | WAYBACK_GITHUB_REPO | - | GitHub repository to publish results to |
| -t, --token | WAYBACK_TELEGRAM_TOKEN | - | Telegram Bot API Token |
| --chatid | WAYBACK_TELEGRAM_CHANNEL | - | The Telegram public/private channel id to publish archive results to |
| - | WAYBACK_TELEGRAM_HELPTEXT | - | The help text for the Telegram command |
| - | WAYBACK_MASTODON_SERVER | - | Domain of the Mastodon instance |
| - | WAYBACK_MASTODON_KEY | - | The client key of your Mastodon application |
| - | WAYBACK_MASTODON_SECRET | - | The client secret of your Mastodon application |
| - | WAYBACK_MASTODON_TOKEN | - | The access token of your Mastodon application |
| - | WAYBACK_TWITTER_CONSUMER_KEY | - | The consumer key of your Twitter application |
| - | WAYBACK_TWITTER_CONSUMER_SECRET | - | The consumer secret of your Twitter application |
| - | WAYBACK_TWITTER_ACCESS_TOKEN | - | The access token of your Twitter application |
| - | WAYBACK_TWITTER_ACCESS_SECRET | - | The access secret of your Twitter application |
| - | WAYBACK_IRC_NICK | - | IRC nick |
| - | WAYBACK_IRC_PASSWORD | - | IRC password |
| - | WAYBACK_IRC_CHANNEL | - | IRC channel |
| - | WAYBACK_IRC_SERVER | irc.libera.chat:6697 | IRC server, requires TLS |
| - | WAYBACK_MATRIX_HOMESERVER | https://matrix.org | Matrix homeserver |
| - | WAYBACK_MATRIX_USERID | - | Matrix unique user ID, format: @foo:example.com |
| - | WAYBACK_MATRIX_ROOMID | - | Matrix internal room ID, format: !bar:example.com |
| - | WAYBACK_MATRIX_PASSWORD | - | Matrix password |
| - | WAYBACK_DISCORD_BOT_TOKEN | - | Discord bot authorization token |
| - | WAYBACK_DISCORD_CHANNEL | - | Discord channel ID (see how to find a channel ID) |
| - | WAYBACK_DISCORD_HELPTEXT | - | The help text for the Discord command |
| - | WAYBACK_SLACK_APP_TOKEN | - | App-Level Token of the Slack app |
| - | WAYBACK_SLACK_BOT_TOKEN | - | Bot User OAuth Token for the Slack workspace; use a User OAuth Token if creating external links is required |
| - | WAYBACK_SLACK_CHANNEL | - | Channel ID of the Slack channel |
| - | WAYBACK_SLACK_HELPTEXT | - | The help text for the Slack slash command |
| --tor | WAYBACK_USE_TOR | false | Snapshot webpages via the Tor anonymity network |
| --tor-key | WAYBACK_TOR_PRIVKEY | - | The private key for Tor Hidden Service |
| - | WAYBACK_TOR_LOCAL_PORT | 8964 | Local port for Tor Hidden Service, also usable behind a reverse proxy |
| - | WAYBACK_TOR_REMOTE_PORTS | 80 | Remote ports for Tor Hidden Service, e.g. WAYBACK_TOR_REMOTE_PORTS=80,81 |
| - | WAYBACK_TORRC | /etc/tor/torrc | Path of the torrc file used for Tor Hidden Service |
| - | WAYBACK_SLOT | - | Pinning service for the pinner IPFS mode, see ipfs-pinner |
| - | WAYBACK_APIKEY | - | API key for the pinning service |
| - | WAYBACK_SECRET | - | API secret for the pinning service |
If both the configuration file and environment variables are specified, both are read and applied; for the same option, the environment variable takes precedence.
Use --print to show the resulting options, printed as a Go struct with types, without running wayback.
docker pull wabarc/wayback
docker run -d wabarc/wayback wayback -d telegram -t YOUR-BOT-TOKEN # without telegram channel
docker run -d wabarc/wayback wayback -d telegram -t YOUR-BOT-TOKEN --chatid YOUR-CHANNEL-USERNAME # with telegram channel
Archive.org and Archive.today are currently supported; support for the following platforms may come next:
Bot-friendly instances:
Q: How do I keep the same Tor hidden service hostname?
A: The first time you run the wayback service, save the key from the output message (the key is the part after "private key:" in the line below). The next time you run the wayback service, pass the key via the --tor-key option or the WAYBACK_TOR_PRIVKEY environment variable.
[INFO] Web: important to keep the private key: d005473a611d2b23e54d6446dfe209cb6c52ddd698818d1233b1d750f790445fcfb5ece556fe5ee3b4724ac6bea7431898ee788c6011febba7f779c85845ae87
We encourage all contributions to this repository! Open an issue! Or open a Pull Request!
If you're interested in contributing to wayback itself, read our contributing guide to get started.
Note: All interaction here should conform to the Code of Conduct.
This software is released under the terms of the GNU General Public License v3.0. See the LICENSE file for details.