Alibaba DAMO Academy - Hangzhou
https://ericzw.github.io/
Stars
[NeurIPS 2024] PointMamba: A Simple State Space Model for Point Cloud Analysis
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Official repo for CellPLM: Pre-training of Cell Language Model Beyond Single Cells.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
A comprehensive and in-depth review of healthcare foundation models (HFMs): challenges, opportunities, and future directions. Paper: https://arxiv.org/abs/2404.03264
Prov-GigaPath: A whole-slide foundation model for digital pathology from real-world data
[CVPR 2024] Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis
[Official Repo] Visual Mamba: A Survey and New Outlooks
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
Official implementation of the paper "Cross-Modal Translation and Alignment for Survival Analysis"
Collection of papers on state-space models
A simple and efficient Mamba implementation in pure PyTorch and MLX.
ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs).
PyTorch code and models for the DINOv2 self-supervised learning method (a minimal loading sketch appears after this list).
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
Build high-performance AI models with modular building blocks
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
[Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897)
Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
Code and model checkpoints for the paper "Scalable Pre-training of Large Autoregressive Image Models"
Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction - CVPR 2024
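As a small illustration of how one of the starred projects is typically used, the DINOv2 repository exposes its pretrained backbones through torch.hub. The sketch below is a minimal example, not the repo's only interface: it assumes network access to download the `dinov2_vits14` checkpoint, uses the standard ImageNet normalization statistics, and `example.jpg` is a hypothetical input path.

```python
# Minimal sketch: extract a global image embedding with a pretrained DINOv2
# backbone loaded via torch.hub (entry points documented in the
# facebookresearch/dinov2 README). Assumes network access for the checkpoint.
import torch
from PIL import Image
from torchvision import transforms

# Load the smallest ViT-S/14 variant; other entry points include
# dinov2_vitb14, dinov2_vitl14, and dinov2_vitg14.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# DINOv2 uses a 14-pixel patch size, so spatial dims must be multiples of 14.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 = 16 x 14 patches per side
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),   # ImageNet stats
                         std=(0.229, 0.224, 0.225)),
])

# "example.jpg" is a placeholder path for illustration.
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feats = model(img)  # CLS-token embedding, shape (1, 384) for ViT-S/14
print(feats.shape)
```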