Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Hashset to GetAllReferencedProjects to only load project once #2409

Merged
merged 13 commits into from
Jun 11, 2017

Conversation

gerardtoconnor
Copy link
Contributor

This PR is to prevent the same project paths being loaded again if already loaded using a hashset of the project paths to check against before loading.

@forki
Copy link
Member

forki commented Jun 10, 2017

I think we also need to cache multiple calls to it

need brackets on not function
@gerardtoconnor
Copy link
Contributor Author

I'm not sure how much you want to optimize this but, as well as caching, I believe we be able to run the fetch in parallel if we use Async XML node readers in the path loads ... I've only started looking at Paket today so not sure how much you want me to investigate performance tweaks?

@forki
Copy link
Member

forki commented Jun 10, 2017

I think it's good for now. We will release it and ask for feedback. Thank you

@gerardtoconnor
Copy link
Contributor Author

I added a DependencyCache field to ProjectFile, not ideal but given the dependencies can change between pack runs, and ProjectFile obj only created once, caching to obj is better than IO/Persisted etc that will load stale data on subsequent runs.

@matthid
Copy link
Member

matthid commented Jun 10, 2017

To be honest I'd prefer the first solution (sorry @gerardtoconnor). Reason is that I don't think that additional field yields a lot of benefit. Of course I'd be easily convinced by numbers :)

@matthid
Copy link
Member

matthid commented Jun 10, 2017

Either way lets see what the CI says

@gerardtoconnor
Copy link
Contributor Author

@matthid I ironically agree also, the only other option is to cache the result appropriately in each method where the dependencies are called, as high up as possible such that all child functions use the result, not the function, may require a change in some function signatures to include dependencies ... I'll look into it now.

@gerardtoconnor
Copy link
Contributor Author

I have now updated with a cache path dictionary that is created on the Pack start and propagates through to all callers for GetAllReferences()

@gerardtoconnor
Copy link
Contributor Author

I have added caching to dependency reference also, to reduce the memory footprint, I used hash of reference paths rather than large lists and dictionaries of string. Would be great if we could test this on the big project that was having the issue before merged.

@matthid
Copy link
Member

matthid commented Jun 10, 2017

Travis says there are some problems. To be completly honest with you I'd rather go with the most simple implementation you started with and maybe use the new "tracing" capabilities of paket5 to see if performance is actually still a problem afterwards.

Note that I don't have problems with the breaking changes in particular (we already have a lot for paket5) but it might be a problem if methods get an additional parameter where it is not clear (on first sight) why they are needed.

Sorry for the back and forth and clarifying so late. It's only my personal opinion, @forki might have another one and I'm completely fine with merging this - once it's green ;)

@forki
Copy link
Member

forki commented Jun 11, 2017

thanks!

@forki forki merged commit 8c8ca42 into fsprojects:master Jun 11, 2017
@gerardtoconnor
Copy link
Contributor Author

@matthid I fully understand on changing function signatures, at the end of the day, if I wanted to deploy a centralised cache, I needed to ensure only one was created (start of pack method) and propagated down to all calls of GetAllReferences, I appreciate my solution is not pretty or Ideal but on the bright side, the ugliness has come at a significant performance boost, @konste tested and it brings pack time down from an 'Eternatiy' to 11 seconds. There is more performance tweaking I can do to improve further but as @forki said, we're probably good for now, 11 secs tolerable. If there is any commenting/ docs you want me to update to reflect this cache being passed around, let me know (it's a memoise more than a cache really).

@matthid
Copy link
Member

matthid commented Jun 11, 2017

If there is any commenting/ docs you want me to update to reflect this cache being passed around, let me know (it's a memoise more than a cache really).

@gerardtoconnor No everything is good, thanks for taking care of this :)

@gerardtoconnor
Copy link
Contributor Author

@forki @matthid apologies for my ignorance, but if I want to run the integration tests (on my local windows 10 machine) is there a way? there were just a few things I wanted to tidy and simplify in my caching algo and if I could run tests locally would save annoying Travis etc?

For big edge case projects that have hundred(s) references, is it worth creating an integration test for big projects that can ensure we are not introducing some recursion bugs?

@matthid
Copy link
Member

matthid commented Jun 11, 2017

Usually I open the solution in VS build all and run tests from the test runner (including integration tests) or just build.cmd in cmd

note that you might need the nunit test runner for this to work in vs (build.cmd) should work always

@forki
Copy link
Member

forki commented Jun 11, 2017

also just calling build.cmd RunIntegrationTests from cmd line should do it.

@gerardtoconnor
Copy link
Contributor Author

Much appreciated gentleman, build cmd working ( I forgot .cmd treating it like exe ) Helpful to know on VS, I had been working in vscode

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants