NeMo Vision Collection

The NeMo Vision Collection supports the multimodal collection, in particular models such as LLaVA that require a vision encoder. It currently supports ViT, implemented as a customized version of the transformer model from Megatron Core.
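For readers new to ViT, here is a minimal sketch of the patch-splitting step that sits at the front of a ViT-style encoder. It is plain Python written for illustration only, not code from NeMo or Megatron Core; the `patchify` helper and the image and patch sizes are assumptions chosen for the example.

```python
from typing import List

# An image as nested lists: height x width x channels.
Image = List[List[List[float]]]


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into flattened, non-overlapping square patches.

    Hypothetical helper for illustration; not part of the NeMo API.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch: rows, then pixels, then channel values.
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# Illustrative sizes: a 224 x 224 RGB image cut into 14 x 14 patches.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
patches = patchify(image, patch_size=14)
print(len(patches), len(patches[0]))  # 256 patches, 588 values each
```

In a ViT, each flattened patch is then linearly projected to an embedding and the resulting sequence is processed by transformer layers. A multimodal model such as LLaVA consumes the encoder's output features alongside its text tokens.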

The documentation describes each supported model in detail and explains how to integrate it into your projects.