Add performance benchmark #88

Merged
merged 6 commits into from
Feb 28, 2020

Conversation

chrisjsewell
Member

No description provided.

@chrisjsewell
Member Author

Block quotes don't seem to be getting rendered correctly: https://181-240151150-gh.circle-artifacts.com/0/html/using/benchmark.html#blockquote. However, it seems to be parsing as expected:

> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.

<document source="notset">
    <block_quote>
        <paragraph>
            This is the first level of quoting.
        <block_quote>
            <paragraph>
                This is nested blockquote.
        <paragraph>
            Back to the first level.

@chrisjsewell
Member Author

@choldgraf the issue above looks to be another fix for the pandas_sphinx_theme, since when I remove it, I get the correct block quote indentations: https://183-240151150-gh.circle-artifacts.com/0/html/using/benchmark.html#blockquote

This reverts commit 5761fdb.
@choldgraf
Member

good catch, opened up: pydata/pydata-sphinx-theme#103

btw, I'm curious how this maps on to, say, the amount of content that the QuantEcon book has. It seems like our Sphinx parser will take relatively more time, but what about the absolute time for an amount of content like what the QE lectures have?

@chrisjsewell
Member Author

> btw, I'm curious how this maps on to, say, the amount of content that the QuantEcon book has. It seems like our Sphinx parser will take relatively more time, but what about the absolute time for an amount of content like what the QE lectures have?

How would you envisage benchmarking this? As we saw before in your profiling, the real bottleneck will be in calling certain roles/directives that do a lot of processing (perhaps we could add a profiler for that, or upstream to Sphinx). The raw parsing speed would mainly be a factor if you are doing 'real-time' parsing (for linting, previews, etc.); there you probably wouldn't actually call all the directives/roles (maybe just a small 'whitelist').
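A minimal sketch of what such a role/directive profiler could look like, assuming directives are classes with a `run()` method (as docutils directives are). The names `profile_run`, `run_times` and `run_counts` are hypothetical, and how the wrapped classes would be registered with Sphinx is not shown here.

```python
import functools
import time
from collections import defaultdict

# Hypothetical accumulators: total seconds and call counts per directive class name.
run_times = defaultdict(float)
run_counts = defaultdict(int)


def profile_run(cls):
    """Class decorator that wraps ``cls.run`` to record its wall-clock time."""
    original_run = cls.run

    @functools.wraps(original_run)
    def timed_run(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original_run(self, *args, **kwargs)
        finally:
            # Recorded even if run() raises, so failures are still counted.
            run_times[cls.__name__] += time.perf_counter() - start
            run_counts[cls.__name__] += 1

    cls.run = timed_run
    return cls
```

Note that a directive which parses nested content would have the time of its nested directives included in its own total, so the per-class totals should not simply be summed.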

@chrisjsewell chrisjsewell merged commit cf3352c into develop Feb 28, 2020
@choldgraf
Member

@chrisjsewell I don't care so much about benchmarking this, but about having a number that we can use to convince people that performance won't be an issue here. Your test doc was 1000 lines (which is probably longer than most), and you ran the iteration 1000 times.

E.g., does this mean that if this process took 70 seconds, then we have 73 / 1000 = .073 seconds per page? Or about 1 second processing per 10 pages? I'm just trying to tie these benchmarking numbers to people's expected subjective experience.
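The arithmetic in question, written out as a small sketch. The 73 seconds and 1000 iterations are the figures used in this comment; the function name is hypothetical, and the result only covers whatever the benchmark itself timed.

```python
def seconds_per_document(total_seconds, iterations):
    """Average wall-clock time for one pass over the benchmark document."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return total_seconds / iterations


# Illustrative figures from the comment above: 73 s for 1000 iterations.
per_doc = seconds_per_document(73.0, 1000)
docs_per_second = 1.0 / per_doc
print(f"{per_doc:.3f} s per document, about {docs_per_second:.1f} documents per second")
```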

@chrisjsewell
Member Author

chrisjsewell commented Feb 29, 2020

> E.g., does this mean that if this process took 70 seconds, then we have 73 / 1000 = .073 seconds per page? Or about 1 second processing per 10 pages? I'm just trying to tie these benchmarking numbers to people's expected subjective experience.

Going back to my patented sphinx summary:

1. event.config-inited(app,config)
2. event.builder-inited(app)
3. event.env-get-outdated(app, env, added, changed, removed)
4. event.env-before-read-docs(app, env, docnames)

for docname in docnames:
    5.  event.env-purge-doc(app, env, docname)
    if doc changed and not removed:
        6. source-read(app, docname, source)
        7. run parser: text -> docutils.document
        8. apply transforms (by priority): docutils.document -> docutils.document
        9. event.doctree-read(app, doctree)

10. event.env-updated(app, env)
11. event.env-check-consistency(app, env)

for docname in docnames:
    12. apply post-transforms (by priority): docutils.document -> docutils.document
    13. event.doctree-resolved(app, doctree, docname)

14. call builder

15. event.build-finished(app, exception)

It means that stage (7) will take x/1000 seconds per page, with x being the value for the DocutilsRenderer (with no Sphinx initialisation), IF you are parsing a page with no roles/directives. But then, for a subjective experience, you need to factor in (a) which roles/directives you are using and how much processing time they take, and (b) all the other stages of the Sphinx build.

You could write a function to run/measure this, given a certain set of source files (in the same way I run contained Sphinx builds in tests/test_sphinx). But obviously you couldn't actually benchmark this against the other Markdown parsers, except recommonmark, because they don't have Sphinx parsers.
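As a rough sketch of such a function (not how tests/test_sphinx does it), a whole build could be timed from the outside by invoking Sphinx as a subprocess. The helper names are hypothetical; `-q` silences output and `-E` ignores the saved environment so that every source file is re-read on each run.

```python
import subprocess
import sys
import time


def sphinx_build_command(srcdir, outdir, builder="html"):
    """Command line for a full (non-incremental) Sphinx build."""
    return [sys.executable, "-m", "sphinx", "-q", "-E", "-b", builder,
            str(srcdir), str(outdir)]


def time_command(command, repeats=3):
    """Run ``command`` ``repeats`` times and return each wall-clock duration."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the build fails
        timings.append(time.perf_counter() - start)
    return timings
```

For a set of source files in a directory, something like `min(time_command(sphinx_build_command("docs", "_build/html")))` would give a best-case build time, where both paths are placeholders. The figure includes interpreter start-up and Sphinx imports, so it is an upper bound on the build stages themselves.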

@chrisjsewell chrisjsewell deleted the benchmark branch March 4, 2020 09:13