pyiwa

Read iWork's internal iwa format using Python.

pyiwa was written as a recovery tool for ugly disk crashes or accidental deletions. It is not a .pages file parser.

Actually, there's no such thing as an iWork '09 Pages file. What looks like a file is a directory containing another directory and a zip file, and the zip file holds the actual, useful payload.

This payload consists of iwa files. The format was reverse-engineered by @obriensp (see: https://github.com/obriensp/iWorkFileFormat).

pyiwa reads iwa files extracted from iWork zips.
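Getting the iwa files out of the zip is not part of pyiwa itself. A minimal sketch of that step, assuming the archive is a regular zip that Python's `zipfile` module can open, might look like this. The function name `extract_iwa_members` is made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_iwa_members(zip_path, dest_dir):
    """Extract every member whose name ends in .iwa into dest_dir.

    Hypothetical helper, not part of pyiwa. Returns the extracted paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".iwa"):
                # ZipFile.extract() sanitises member names, so absolute
                # paths or ".." components cannot escape dest_dir.
                extracted.append(Path(archive.extract(name, dest)))
    return extracted
```

The returned paths can then be handed to pyiwa one at a time.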

First it decompresses the iwa file, which is really a stream compressed with a hacked-upon variant of Snappy.
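To give an idea of what that decompression involves, here is a rough pure-Python sketch. It is not pyiwa's code. It assumes the layout described in @obriensp's notes: a sequence of chunks, each with a 4-byte header (a `0x00` byte followed by a 3-byte little-endian length) and a raw Snappy block with no stream identifier and no checksum. The function names are invented for this example.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def snappy_decompress(data):
    """Decompress one raw Snappy block (no framing, no checksum)."""
    expected_len, pos = read_varint(data, 0)
    out = bytearray()
    while pos < len(data):
        tag = data[pos]
        pos += 1
        kind = tag & 0x03
        if kind == 0:  # literal run
            length = tag >> 2
            if length >= 60:  # length-1 is stored in the next 1 to 4 bytes
                extra = length - 59
                length = int.from_bytes(data[pos:pos + extra], "little")
                pos += extra
            length += 1
            out += data[pos:pos + length]
            pos += length
            continue
        if kind == 1:  # copy with an 11-bit offset
            length = ((tag >> 2) & 0x07) + 4
            offset = ((tag >> 5) << 8) | data[pos]
            pos += 1
        else:  # copy with a 2-byte or 4-byte offset
            width = 2 if kind == 2 else 4
            length = (tag >> 2) + 1
            offset = int.from_bytes(data[pos:pos + width], "little")
            pos += width
        if offset == 0 or offset > len(out):
            raise ValueError("invalid copy offset")
        for _ in range(length):  # byte by byte, since copies may overlap
            out.append(out[-offset])
    if len(out) != expected_len:
        raise ValueError("decompressed length mismatch")
    return bytes(out)


def decompress_iwa(raw):
    """Concatenate the decompressed chunks (assumed chunk layout, see above)."""
    out = bytearray()
    pos = 0
    while pos < len(raw):
        if raw[pos] != 0x00:
            raise ValueError("unexpected chunk type")
        length = int.from_bytes(raw[pos + 1:pos + 4], "little")
        pos += 4
        out += snappy_decompress(raw[pos:pos + length])
        pos += length
    return bytes(out)
```

A hand-built chunk holding the literal `abc` followed by an overlapping 6-byte copy decompresses to `abcabcabc`, which is a quick way to check the copy logic.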

Then it uses introspection to parse the various concatenated fields and stores them as an array of protocol buffers. The reverse engineering is not complete, so some fields cannot be matched with a known protocol buffer; they are not understood and are simply skipped.
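For readers unfamiliar with the protocol buffer wire format, the sketch below walks the fields of a single serialized message without any schema. This is only an illustration of the wire format, not the introspection-based matching pyiwa performs, and `iter_fields` is a name chosen for this example.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for each field in a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, left as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (strings, bytes, submessages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, left as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


# Field 1 holds the varint 150, field 2 holds the bytes b"testing".
print(list(iter_fields(b"\x08\x96\x01\x12\x07testing")))
```

Without a matching message definition, a length-delimited value could be a string, raw bytes, or a nested message, which is why fields that cannot be matched end up skipped.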

Once all the fields have been listed, another loop searches for fields containing a text item and prints the text items.
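A crude stand-in for that search, assuming the fields are available as `(field_number, wire_type, value)` tuples, is to keep every length-delimited value that decodes as printable UTF-8. This is a heuristic for illustration only; pyiwa looks for actual text items in the parsed messages, and the tuple layout and function name here are assumptions.

```python
def iter_text_items(fields):
    """Yield length-delimited values that look like human-readable text.

    Heuristic sketch: `fields` is assumed to be an iterable of
    (field_number, wire_type, value) tuples.
    """
    for _field_number, wire_type, value in fields:
        if wire_type != 2:  # only length-delimited fields can hold strings
            continue
        try:
            text = bytes(value).decode("utf-8")
        except UnicodeDecodeError:
            continue  # binary payload, not text
        if text and all(ch.isprintable() or ch in "\n\t" for ch in text):
            yield text


sample = [(1, 0, 150), (2, 2, b"Hello"), (3, 2, b"\xff\xfe"), (4, 2, b"\x00\x01")]
for item in iter_text_items(sample):
    print(item)
```

Only `Hello` survives here: the varint is skipped, `\xff\xfe` is not valid UTF-8, and `\x00\x01` decodes to control characters.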

As it stands, pyiwa can therefore be thought of as an iwacat.

However, there's more to it: one could use it to parse the formatting, fonts, etc. and turn it into an iwa2*, though I doubt it's worth it.

It's also a good programming exercise in Python with binary streams.
