More efficient update checking #5

Open
periish opened this issue Jan 3, 2022 · 13 comments

periish commented Jan 3, 2022

Linux (and macOS) both support an event based API for file watching - perhaps this'd be more efficient than manually checking for updates?
https://crates.io/crates/inotify
https://crates.io/crates/kqueue
are existing bindings.

bjorn3 commented Jan 3, 2022

notify is a platform agnostic file watcher crate.

Owner

tkellogg commented Jan 3, 2022

Oh wow the notify crate is perfect. Any idea how many files it can efficiently watch? I could see it getting up into the tens of thousands pretty easily.

bjorn3 commented Jan 3, 2022

The default inotify limit on Linux seems to be 8192 (the number of watched directories, I think). This limit is shared among all programs (including your editor). You can increase it by writing to /proc/sys/fs/inotify/max_user_watches as root, but each watched directory uses 1080 bytes of kernel memory, so there is a hard limit on how many directories you can watch, depending on how much RAM you have.
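
A rough sketch of the numbers in this comment, assuming a Linux system with procfs and taking the 1080-bytes-per-watch figure at face value. The helper names are made up for illustration and are not part of dura:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Print the current per-user inotify watch limit (Linux only).
current_watch_limit() {
    cat /proc/sys/fs/inotify/max_user_watches
}

# Estimate worst-case kernel memory in KiB for N watches,
# assuming 1080 bytes per watch as quoted in the comment above.
watch_memory_kib() {
    local watches=$1
    echo $(( watches * 1080 / 1024 ))
}

# Raising the limit needs root, for example:
#   echo 65536 | sudo tee /proc/sys/fs/inotify/max_user_watches
```

Under that assumption, `watch_memory_kib 8192` prints 8640, so the default limit costs under 9 MiB even when fully used.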

An alternative would be to implement a language server using the Language Server Protocol (LSP) that only captures the files the editor reports as changed. This would also allow saving each individual keystroke, even when the user doesn't explicitly save. In addition, the editor extension could then be responsible for ensuring the daemon is running.

jauntywunderkind commented Jan 3, 2022

Remarkably overkill solution, but for jauntywunderkind/git-auto-commit, I use facebook/watchman. It's extremely well tuned, and lets me add filters.

You can either just exec the watchman command line (which either runs standalone or spawns a server, ideal if there are a lot of different watchers!), or you can talk to a server via its socket interface (a socket with JSON or BSER encoding). There's also watchman_client to facilitate using that socket interface.

Owner

tkellogg commented Jan 3, 2022

I'd rather not add external dependencies, if possible. I don't mind crates.

Another idea — the "tens of thousands" is naive. You could probably narrow it down to ~100 files with 95% confidence. Some ideas for heuristics:

  • A bash/zsh prompt function could inform which directories to look at (e.g. only look in repos that have been navigated to in the last 24 hours). It could work, but only for terminal users (a la How to ensure the daemon starts, stays running, and watches the right Git repos #3)
  • A scanner thread that looks at timestamps. If it sees a changed file, it watches all files in that repo. Maintain the watch list LRU style. The scanner thread could run a lot less frequently; even every 20 minutes could be okay.
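
The timestamp scanner idea could be sketched in shell roughly as follows. This is only an illustration, not how dura works: the function names are hypothetical, it assumes a `find` that supports `-mmin` and `-quit` (GNU and BSD find do), and it leaves out the LRU bookkeeping:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the timestamp scanner heuristic.

# Succeed if any file outside .git in repo $1 was modified
# within the last $2 minutes. Stops at the first match.
repo_recently_changed() {
    local repo=$1 minutes=$2
    [ -n "$(find "$repo" -name .git -prune -o -type f -mmin "-$minutes" -print -quit 2>/dev/null)" ]
}

# Print the repos (remaining arguments) that look active
# within the last $1 minutes, one per line.
active_repos() {
    local minutes=$1; shift
    local repo
    for repo in "$@"; do
        if repo_recently_changed "$repo" "$minutes"; then
            printf '%s\n' "$repo"
        fi
    done
}
```

A slow background loop could call `active_repos 20 ...` with the repo list and hand only the printed repos to a file watcher.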

neinseg commented Jan 3, 2022

Another heuristic would be to inotify-watch files that are currently open (as determined through /proc/$pid/fd), as well as directories that are the working directory of a currently running process (as given by /proc/$pid/cwd IIRC). That, plus a regular (every few minutes) scan. That full scan could be done slowly in the background instead of in batches to avoid causing load spikes.
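
A minimal sketch of the working-directory half of this heuristic, assuming Linux procfs. The function names are hypothetical, and it only sees processes whose /proc entries the current user is allowed to read:

```shell
#!/usr/bin/env bash
# Hypothetical sketch; Linux-only.

# Print the unique working directories of running processes.
list_process_cwds() {
    local link
    for link in /proc/[0-9]*/cwd; do
        # Unreadable or vanished entries are skipped silently.
        readlink "$link" 2>/dev/null || true
    done | sort -u
}

# Print the unique Git repo roots containing one of those directories.
list_active_repo_roots() {
    local dir
    list_process_cwds | while IFS= read -r dir; do
        git -C "$dir" rev-parse --show-toplevel 2>/dev/null || true
    done | sort -u
}
```

The same loop over /proc/[0-9]*/fd/* would cover open files, though as noted in the reply below, editors may not keep files open for long.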

Owner

tkellogg commented Jan 4, 2022

@neinseg I like that, but how long do files stay open? Does Vim or VSCode actually hold the file open? Seems like "opened files" is too ephemeral to work well, but I don't know. If you could watch all file descriptors under /proc/*/fd, then this would be an amazing solution. That or process an event log.

Contributor

alin23 commented Jan 5, 2022

I'm using a shell implementation of this feature using fswatch (cross-platform file monitor) and dura capture.

Note: this replaces the need for dura serve & as fswatch will be the daemon instead

Fish shell implementation

set repos (cat ~/.config/dura/config.json | jq -rc '.repos | keys | join("§")' 2>/dev/null)
set pollingSeconds 10

fswatch -e .git -0 -l $pollingSeconds -r (string split '§' -- $repos) | while read -l -z path
    cd $path 2>/dev/null || cd (dirname $path) && cd (git rev-parse --show-toplevel) && dura capture
end

Bash/Zsh shell implementation

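# Build a single-quoted, space-separated list of watched repo paths from the dura config.
repos=$(cat ~/.config/dura/config.json | jq -rc $'.repos | keys | map("\'\(.)\'") | join(" ")' 2>/dev/null)
pollingSeconds=10

# eval splits the quoted paths in $repos back into separate arguments.
eval "fswatch -e .git -0 -l $pollingSeconds -r $repos" | while IFS= read -r -d '' path
do
    # cd into the changed path (or its parent if it is a file), then into the repo root.
    cd "$path" 2>/dev/null || cd "$(dirname "$path")" && cd "$(git rev-parse --show-toplevel)" && dura capture
done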

How it works

  1. Get the repos list from the dura config.json file (this means you can still dura watch repos as usual)
  2. Join the list of repo paths using a rarely used character §
  3. Watch for changes in all repos: fswatch -r
    1. -e .git: excluding changes to the .git folder
    2. -0: outputs changed paths delimited by the NUL character (or \0)
    3. -l $pollingSeconds: just like a debounce function, calls dura capture $pollingSeconds seconds after the last event occurred on a file, to avoid too many commits when making lots of consecutive changes
  4. cd into the changed repo and call dura capture

Owner

tkellogg commented Jan 5, 2022

@alin23 can you send a PR to update the README? this is amazing and i don't want to lose it in the issues

Owner

tkellogg commented Jan 5, 2022

thinking about this... @alin23 maybe we should start adding script files into the core repo for stuff like this.

Contributor

alin23 commented Jan 5, 2022

Yes, script files would be better. That way you could have a command like dura install --fish to copy the scripts and make them run at startup or something like that

Owner

tkellogg commented Jan 5, 2022

I love it! Let's do it

Contributor

alin23 commented Jan 5, 2022

#36

6 participants