What's the best way to get involved? #45

Open
johnnyman727 opened this issue Feb 19, 2016 · 9 comments

Comments

@johnnyman727

Hi leaf folks,

I'm wondering what's the best way for someone new to the project to get involved and start contributing. Are you looking for new contributors? Is there some low-hanging fruit I can try to tackle? For context, I am somewhere between beginner and intermediate with regard to both Rust and ML.

@hobofan
Member

hobofan commented Feb 26, 2016

Sorry for the long wait, I totally forgot to reply here 😓 .

Generally, for any project, I would say the best way to get involved depends heavily on your personal reasons for getting involved and on your strengths (or the weaknesses you want to work on).

A few things where we'd really appreciate new contributors getting involved:

@johnnyman727
Author

Thanks @hobofan. I'm a novice at this point so I'll start with your first point. Once I have a solid understanding of how the existing examples work, I'll try to build some that are more interesting. If that goes well, I'll look into documentation.

As a meta-note, I think it would be really useful if you had a GitHub label like "beginner-friendly" for your issues so I could easily check which unassigned issues are ripe for the taking.

@DavidYKay

@johnnyman727 thanks for starting this discussion! I'm in a very similar situation to you and curious to start helping out and digging in.

My background: I'm primarily a mobile app developer, focused on the medical industry. Outside of native apps, I have experience in Clojure and Rust. I'm also a contributor to the React Native documentation. My interest in Leaf is that I feel like I've been spending my career on throwaway projects that aren't maximizing the use of Moore's Law. I think that convolutional neural nets / deep learning are a much better way of giving back. And why use C when you can use Rust? :)

Of the assignments, I'm excited to start implementing new layers, but I'm not sure if I'm ready for that. Thus, I think it'd be most useful for me to work on the documentation, at least initially, as this will help my understanding of what each layer is doing, and then progress to working on layers once my understanding is greater.

@hobofan, does this sound like a reasonable approach? Let me know if you have any time to chat about this. I'd love your input. Would love to see if you think I'm ready to take on a simple layer project or if I should cut my teeth on docs.

Thanks!

@hobofan
Member

hobofan commented Mar 7, 2016

@DavidYKay Yes, that sounds great!

I think any of the Layer issues should be approachable by a newcomer to the project (most of them have a similar Layer to draw inspiration from). Feel free to hit me up on our Gitter anytime. :)

@byronyi

byronyi commented Mar 9, 2016

Hi @hobofan, I am interested in implementing a distributed runtime for Leaf. Does the development team currently have any plans for multi-node implementations?

@hobofan
Member

hobofan commented Mar 9, 2016

@byronyi Nothing concrete yet, but we would like to handle it one abstraction-layer higher than Leaf, with only minimal changes required in Leaf itself. What did you have in mind for multi-node? Parameter server workers?

Since any kind of distribution will require sending serialized parts of the networks over the network, that would probably depend on serialization being implemented, which I am currently working on (#14/#15). See my next comment.

@byronyi

byronyi commented Mar 9, 2016

Yes, a parameter server should be a good candidate. I have just finished work on a project in which we implemented a parameter server on Hadoop using Java, so I know a little bit about the architecture.

I am not sure though, because the problem we solved was for regular machine learning algorithms (GLM, matrix factorization, or LDA) rather than deep learning, e.g. ConvNets, which should require more frequent global synchronization. A classic message-passing pattern like MPI with AllReduce might still be a reasonable choice.

What do you think?
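
For readers unfamiliar with the AllReduce pattern mentioned above, here is a minimal single-process sketch in Rust. It only simulates the semantics (every worker ends up with the same reduced gradient); it does not use MPI or any networking, and the function name `all_reduce_mean` is hypothetical, not part of Leaf.

```rust
/// Simulated all-reduce (mean) over per-worker gradient vectors.
/// Assumption: all workers live in one process. In a real MPI-style setup
/// each worker would hold only its own vector and exchange data over the wire.
fn all_reduce_mean(worker_grads: &mut [Vec<f32>]) {
    if worker_grads.is_empty() {
        return;
    }
    let len = worker_grads[0].len();
    assert!(
        worker_grads.iter().all(|g| g.len() == len),
        "all workers must hold gradients of the same length"
    );

    // Reduce step: element-wise sum across all workers.
    let mut reduced = vec![0.0f32; len];
    for grads in worker_grads.iter() {
        for (acc, g) in reduced.iter_mut().zip(grads.iter()) {
            *acc += *g;
        }
    }

    // Turn the sum into a mean so the result does not grow with worker count.
    let n = worker_grads.len() as f32;
    for acc in reduced.iter_mut() {
        *acc /= n;
    }

    // Broadcast step: every worker receives the identical reduced result.
    for grads in worker_grads.iter_mut() {
        grads.copy_from_slice(&reduced);
    }
}
```

Unlike a parameter server, there is no central node here: each worker contributes its gradients and all of them end up holding the same averaged values.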

@hobofan
Member

hobofan commented Mar 9, 2016

The serialization will probably end up being based on capnproto, so capnproto-rpc might be a natural choice. I personally have only a little (unpleasant) experience with MPI, so I might be biased against it, but from what I gather it is mainly used in scientific fields, and I am not sure it fits in that well with Leaf.

One of the main problems with DNN parameter servers is that the weight updates are usually quite large, so synchronization can become quite slow even with only a few nodes. There are a few ways to reduce the load, such as introducing a threshold that weight updates must exceed before they become relevant for synchronization, and transferring weight updates as f16 instead of f32. With that in mind, I take back my previous statement regarding serialization, since the data you want to send for weight updates is likely very different from the data you want to serialize.
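
The two load-reduction ideas above (a threshold on updates, and f16 instead of f32 on the wire) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Leaf code: the names `compress_update` and `f32_to_f16_bits` are hypothetical, the local residual buffer that carries unsent updates forward is an added assumption, and the f16 conversion is deliberately simplified (it truncates instead of rounding and does not produce subnormals or NaN).

```rust
/// Convert an f32 to IEEE 754 half-precision bits.
/// Simplified on purpose: the mantissa is truncated, values too small for a
/// normal f16 become signed zero, and overflow, infinity and NaN all become
/// signed infinity.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    // Re-bias the exponent from f32 (bias 127) to f16 (bias 15).
    let exp = ((bits >> 23) & 0xff) as i32 - 127 + 15;
    // Keep only the top 10 of the 23 mantissa bits.
    let mantissa = ((bits >> 13) & 0x03ff) as u16;
    if exp <= 0 {
        sign
    } else if exp >= 31 {
        sign | 0x7c00
    } else {
        sign | ((exp as u16) << 10) | mantissa
    }
}

/// Build a sparse update of (weight index, f16 bits) pairs.
/// Assumption: updates below `threshold` are not dropped but accumulated in
/// `residual` until they grow large enough to be worth sending.
fn compress_update(
    delta: &[f32],
    residual: &mut [f32],
    threshold: f32,
) -> Vec<(usize, u16)> {
    assert_eq!(delta.len(), residual.len());
    let mut sparse = Vec::new();
    for (i, (d, r)) in delta.iter().zip(residual.iter_mut()).enumerate() {
        *r += *d; // fold the new update into what was held back earlier
        if (*r).abs() >= threshold {
            sparse.push((i, f32_to_f16_bits(*r)));
            *r = 0.0; // sent, so nothing is held back for this weight
        }
    }
    sparse
}
```

Each entry sent this way takes an index plus 2 bytes instead of 4 bytes for every weight, at the cost of some precision, which is exactly the kind of trade-off that may affect convergence.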

@byronyi

byronyi commented Mar 9, 2016

I don't really have much of an opinion on the serialization part; I have experience with Protocol Buffers (with a home-brewed RPC, before gRPC was released), and I think capnproto(-rpc) would work just fine.

I do agree that special care is needed when sending weight updates, and it would be better if we could make it flexible enough that people can experiment with different compression/filtering techniques, as these might slow down or even screw up model convergence.

Regarding your thoughts on MPI, I am a little curious about the goal of Leaf. Maybe I am wrong, but I think the other projects on your benchmark page (Caffe, Torch, TensorFlow) still have most of their users working in research areas related to deep learning. Yahoo announced a hybrid project where they initialize Caffe inside a Spark executor and synchronize the model using MPI-style communication with RDMA. What specific aspect of this communication style do you think might not fit well with Leaf?

It might make a big difference if Leaf is not designed to share some of the fundamental characteristics of MPI, e.g. its lack of fault tolerance.


4 participants