
A few optimizations #280

Merged 7 commits into abrignoni:master on Aug 16, 2022

Conversation

@bconstanzo (Contributor) commented Aug 16, 2022

I've been profiling and testing a few changes that have a major impact on the Windows performance of ALEAPP.

The reported time goes down by about 3x on my benchmarks. As far as I could test, the changes don't affect behavior: the results are consistent between runs; it just goes faster.

Mainly this is achieved by adding a few caches (functools.lru_cache with maxsize set to None) and by avoiding duplicated work (fnmatch.fnmatch usage is replaced with a "deconstructed" version of it). I tried to keep the code as simple and readable as possible while netting some nice speedups.
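
As an illustration of the caching pattern (the helper name here is mine, not the PR's actual code), it boils down to putting an unbounded lru_cache in front of a hot, pure function:

import os
from functools import lru_cache

# Hypothetical helper: os.path.normcase is pure, so with an unbounded
# cache each distinct path is normcased once instead of once per lookup.
@lru_cache(maxsize=None)
def cached_normcase(path: str) -> str:
    return os.path.normcase(path)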

There's also a small change specific to snappy decompression, where I managed to improve the algorithm just enough to get a 4-5% performance increase without resorting to overly complicated code.

Methodology:

  • Ran ALEAPP against Josh Hickman's Android 12 test image.
  • Checked reported run time.
  • Profiled everything and analyzed the call graphs and functions (with cProfile, gprof2dot, and the amazing line_profiler); a typical gprof2dot invocation is shown after this list.
  • Got millions of files and hundreds of gigabytes of storage space used up on my disk.
  • Found out a few spots where things could be improved.
  • Rinse and repeat.
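
For reference, a typical way (my example, not quoted from the PR) to turn a cProfile dump into a call graph with gprof2dot, assuming Graphviz's dot is installed:

gprof2dot -f pstats profile_file.pstats | dot -Tpng -o callgraph.png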

The individual commit messages:

This substantially speeds up the program under Windows (about twice as fast) without any changes to the behavior or results.

…epeatedly

This brings a ~40% speedup* on top of the previous commit.

* Note: I'm basing these timings on cProfile runs, but they hold up quite nicely on "normal" runs.
This changes how things are processed and speeds things up a bit; however, shutil.copy2() now takes a significant share of this function's time, probably due to some of the newer artifacts.
Helps another ~10% or so with running times (because we were normcasing every filepath over and over again for every artifact).
Makes this function about twice as fast.

Replaced a for loop that wrote one byte at a time with a single bulk write, plus a condition for the (rare) case where the copy reaches past the current end of the output and part of the already-written output has to be repeated.
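
A minimal sketch of that idea, assuming a snappy-style copy operation that reads length bytes starting offset bytes back in the already-decoded output (my illustration, not the PR's exact code):

def copy_op(out: bytearray, offset: int, length: int) -> None:
    start = len(out) - offset
    if length <= offset:
        # Common case: the source bytes already exist, one bulk slice copy.
        out += out[start:start + length]
    else:
        # Rare overlapping case: the copy reaches past the current end,
        # so the available chunk has to be repeated to fill the length.
        chunk = out[start:]
        reps, rem = divmod(length, offset)
        out += chunk * reps + chunk[:rem]

buf = bytearray(b"abc")
copy_op(buf, offset=3, length=7)
assert buf == bytearray(b"abcabcabca")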
I checked the fnmatch.filter() code in the standard library, and then went with the same style that is used in the other seekers. fnmatch.filter() basically does the same thing the other seekers were already doing, so the result is the same (I haven't tested it, though).
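
For illustration, a sketch of that style (again my own, not the PR's code): the win is translating the glob pattern to a regex once and reusing the compiled matcher, which is essentially what fnmatch.filter() does internally:

import fnmatch
import os
import re

def match_names(names, pattern):
    # Translate the pattern to a regex once and reuse the compiled
    # matcher, instead of calling fnmatch.fnmatch() for every name.
    regex = re.compile(fnmatch.translate(os.path.normcase(pattern)))
    return [name for name in names if regex.match(os.path.normcase(name))]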
@jijames (Contributor) commented Aug 16, 2022

@bconstanzo do you have the tests you used for this? Just for reference.
@abrignoni looks great.

Real results without usagestats on Linux:
Before
real 0m24.984s
user 0m9.424s
sys 0m0.541s

After
real 0m22.285s
user 0m9.364s
sys 0m0.469s

@abrignoni merged commit f7541e8 into abrignoni:master on Aug 16, 2022
@bconstanzo (Contributor, Author)

Under Windows 10 Home 64-bit, I set up a virtual env with Python 3.10.3 and all the dependencies from the requirements.txt file.

Then it was just a matter of cloning ALEAPP on Sunday night and running:

python aleapp.py -t tar -i path_to_image -o path_for_reports

I have a Ryzen 4600H with 24 GB RAM and ran against a SATA HDD to cover the worst-case scenario, though I also tested against an NVMe drive and it ran just as fast.

For profiling I'd run it as python -m cProfile -o profile_file.pstats aleapp.py ... or add the @profile decorator and go with kernprof -lv aleapp.py ....
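
(As a side note, not something from the PR: a dump produced that way can then be inspected with the standard pstats module.)

import pstats

# Load the cProfile dump written by the command above and print the
# 20 entries with the highest cumulative time.
stats = pstats.Stats("profile_file.pstats")
stats.sort_stats("cumulative").print_stats(20)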

The timings and speedups I mentioned in the commits are based on what I saw during the profile runs (under cProfile the tool ran about half as fast as it would normally) and on the time reported by the tool.

Right now I just benchmarked with a very simple script:

import time
import subprocess

# Wall-clock the whole ALEAPP run as a subprocess.
t0 = time.perf_counter()
p = subprocess.run(
    r'python aleapp.py -t tar -i "D:\Test\xLEAPP\test_data\Magnet Acquire\Android 12 - Data.tar" -o "D:\Test\xLEAPP\output\aleapp-magnet"',
    shell=True,
)
t1 = time.perf_counter()

# Blank lines separate the timing from ALEAPP's own output.
print()
print()
print(f"Time taken: {t1 - t0}")

And it gave me 477 seconds for a clone of ALEAPP (which the tool reported as 6 minutes and 34 seconds) and 202 seconds for the patched version (which the tool reported as 2 minutes sharp). That is just shy of 2.4x faster, and there clearly is something off with the reported time.
