Insights: huggingface/transformers
Overview
1 Release published by 1 person
84 Pull requests merged by 51 people
-
Update codeowners with individual model owners
#35595 merged
Jan 10, 2025 -
Skip MobileNetV1ModelTest::test_batching_equivalence for now #35614 merged
Jan 10, 2025 -
Fix flaky test_beam_search_low_memory #35611 merged
Jan 10, 2025 -
Let EarlyStoppingCallback not require load_best_model_at_end
#35101 merged
Jan 10, 2025 -
Added error when sequence length is bigger than max_position_embeddings
#32156 merged
Jan 10, 2025 -
Use inherited tempdir makers for tests; fix failing DS tests
#35600 merged
Jan 10, 2025 -
Fix flaky test_custom_4d_attention_mask
#35606 merged
Jan 10, 2025 -
[WIP] Emu3: add model
#33770 merged
Jan 10, 2025 -
Fix flex_attention in training mode
#35605 merged
Jan 10, 2025 -
Chat template: return vectorized output in processors
#34275 merged
Jan 10, 2025 -
Add Moonshine
#34784 merged
Jan 10, 2025 -
Skip torchscript tests if we see a cache object produced in a test
#35596 merged
Jan 10, 2025 -
ModernBert: reuse GemmaRotaryEmbedding via modular; integration tests
#35459 merged
Jan 10, 2025 -
Add flex_attn to diffllama
#35601 merged
Jan 9, 2025 -
ModernBERT bug fixes
#35404 merged
Jan 9, 2025 -
Add _supports_flex_attn = True for models that do support it #35598 merged
Jan 9, 2025 -
[doc] deepspeed universal checkpoint
#35015 merged
Jan 9, 2025 -
Refactor/fix Cohere2
#35863 merged
Jan 9, 2025 -
[tokenizers] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer #35593 merged
Jan 9, 2025 -
Fix modular edge case and modular sorting order
#35562 merged
Jan 9, 2025 -
PR for Issue #22694: Fixed Training Evaluation table display for VSCode
#35557 merged
Jan 9, 2025 -
Small fix rope kwargs
#35589 merged
Jan 9, 2025 -
Fix flaky SwitchTransformersModelTest::test_training_gradient
#35587 merged
Jan 9, 2025 -
Tokenizer train from iterator without pre_tokenizers #35396 merged
Jan 9, 2025 -
feat: add TP plan for granite
#35573 merged
Jan 9, 2025 -
[Idefics3] Move image features to same device as input embeds
#35100 merged
Jan 9, 2025 -
update modular_modernbert -- add inputs_embeds param to ModernBertModel
#35373 merged
Jan 9, 2025 -
Fix flaky test_batching_equivalence
#35564 merged
Jan 9, 2025 -
Setup loss_type in config at model init time
#34616 merged
Jan 9, 2025 -
Re-add missing __all__ for Cohere and Phi3
#35578 merged
Jan 9, 2025 -
Minor fix in video text 2 text docs
#35546 merged
Jan 9, 2025 -
More model refactoring!
#35359 merged
Jan 9, 2025 -
Don't show warning for inv_freq buffers #35255 merged
Jan 9, 2025 -
Fix multi-gpu loss
#35395 merged
Jan 9, 2025 -
update code owners
#35576 merged
Jan 9, 2025 -
[i18n-ar] Translated file docs/source/ar/tasks/multiple_choice.md into Arabic #35199 merged
Jan 8, 2025 -
Fix all output_dir in test_trainer.py to use tmp_dir
#35266 merged
Jan 8, 2025 -
Pipeline: simple API for assisted generation
#34504 merged
Jan 8, 2025 -
[PixtralLarge] Update Pixtral conversion script to support large format! #34801 merged
Jan 8, 2025 -
[docs] Remove Hiera from AUDIO MODELS in docs
#35544 merged
Jan 8, 2025 -
Overwrite top_k when creating audio classification pipeline
#35541 merged
Jan 8, 2025 -
add code owners
#35528 merged
Jan 8, 2025 -
Add ViTPose
#30530 merged
Jan 8, 2025 -
fix: Qwen2-VL generate with inputs_embeds
#35466 merged
Jan 8, 2025 -
Update doc for metric_for_best_model when save_strategy="best" #35389 merged
Jan 8, 2025 -
Add: num_additional_image_tokens to models
#35052 merged
Jan 8, 2025 -
Enable auto task for timm models in pipeline
#35531 merged
Jan 8, 2025 -
Bump torch requirement to 2
#35479 merged
Jan 8, 2025 -
Timm wrapper label names
#35553 merged
Jan 8, 2025 -
Update missing model error message
#35370 merged
Jan 8, 2025 -
Update doc and default value of TextNetImageProcessor
#35563 merged
Jan 8, 2025 -
Add support for modular with fast image processors
#35379 merged
Jan 8, 2025 -
[Docs] Links to logits-processor-zoo #35552 merged
Jan 8, 2025 -
Fix Qwen2VL processor to handle odd number of frames
#35431 merged
Jan 8, 2025 -
support chat generator as input of TextGenerationPipeline
#35551 merged
Jan 8, 2025 -
Pass correct num_items_in_batch value into the training_step function #35438 merged
Jan 8, 2025 -
MODERNBERT_INPUTS_DOCSTRING: past_key_values are ignored
#35513 merged
Jan 8, 2025 -
VLMs: major clean up 🧼
#34502 merged
Jan 8, 2025 -
Add TextNet
#34979 merged
Jan 8, 2025 -
[docs] Remove sortish_sampler
#35539 merged
Jan 7, 2025 -
Correctly list the chat template file in the Tokenizer saved files list
#34974 merged
Jan 7, 2025 -
[Whisper] fix docstrings typo
#35338 merged
Jan 7, 2025 -
[Qwen2Audio] handle input ids expansion during processing
#35534 merged
Jan 7, 2025 -
Release GPU memory after Optuna trial
#35440 merged
Jan 7, 2025 -
Check whether rescale is requested before checking is_scaled_image
#35439 merged
Jan 7, 2025 -
Fix bug when requesting input normalization with EnCodec
#34756 merged
Jan 7, 2025 -
Add diffllama
#34083 merged
Jan 7, 2025 -
NPU support SDPA
#35165 merged
Jan 7, 2025 -
Replace tokenizer with processing_class in Seq2SeqTrainer
#35452 merged
Jan 7, 2025 -
ci: mark model_parallel tests as cuda specific
#35269 merged
Jan 7, 2025 -
Zamba new attention standard
#35375 merged
Jan 7, 2025 -
[Dinov2 with Registers] Some fixes
#35411 merged
Jan 6, 2025 -
added logic for deleting adapters once loaded
#34650 merged
Jan 6, 2025 -
Fixed typo in Llama configuration docstring
#35520 merged
Jan 6, 2025 -
🌐 [i18n-KO] Remove duplicates in toctree
#35496 merged
Jan 6, 2025 -
[GGUF] Refactor and decouple gguf checkpoint loading logic
#34385 merged
Jan 6, 2025 -
Bump jinja2 from 3.1.4 to 3.1.5 in /examples/research_projects/decision_transformer
#35408 merged
Jan 6, 2025 -
Update llm_optims docs for sdpa_kernel #35481 merged
Jan 6, 2025 -
🌐 [i18n-KO] Translated altclip.md to Korean #34863 merged
Jan 6, 2025 -
Add check for if num_items_in_batch is not None
#35102 merged
Jan 6, 2025 -
Add position_ids in XLMRobertaXLForCausalLM.prepare_inputs_for_generation #35044 merged
Jan 6, 2025 -
Add French translation of task_summary and tasks_explained
#33407 merged
Jan 6, 2025 -
Idefics: fix docstring
#35079 merged
Jan 6, 2025 -
Fix Llava conversion for models that use safetensors to store weights
#35406 merged
Jan 6, 2025
33 Pull requests opened by 27 people
-
Add support for 4D custom attention masks in GPT-2
#35517 opened
Jan 5, 2025 -
Remove batch size argument warning when unjustified
#35519 opened
Jan 6, 2025 -
Validate the num imgs and vids tokens
#35521 opened
Jan 6, 2025 -
Add proper jinja2 error
#35533 opened
Jan 6, 2025 -
change README FILE
#35535 opened
Jan 6, 2025 -
Security fix for `self-comment-ci.yml`
#35548 opened
Jan 7, 2025 -
Add AIMv2 to Transformers
#35550 opened
Jan 7, 2025 -
Add support for nested images to LLava and VipLLava
#35558 opened
Jan 7, 2025 -
BLIPs clean-up
#35560 opened
Jan 8, 2025 -
Support QuestionAnswering Module for ModernBert based models.
#35566 opened
Jan 8, 2025 -
Trainer Refactor: Part 1
#35567 opened
Jan 8, 2025 -
add qwen2.5vl
#35569 opened
Jan 8, 2025 -
Exploit symmetry of covariance matrices for faster & more stable diag…
#35571 opened
Jan 8, 2025 -
Multimodal Granite Support
#35579 opened
Jan 9, 2025 -
Get latest complete checkpoint directory when auto-resume from checkpoint
#35580 opened
Jan 9, 2025 -
apply_chat_template: consistent behaviour for return_assistant_tokens_mask=True return_tensors=True
#35582 opened
Jan 9, 2025 -
A new Traditional Chinese version of the README.md for run_on_remote.py
#35585 opened
Jan 9, 2025 -
[generation] Support cache-cropping methods
#35591 opened
Jan 9, 2025 -
Fix the config class comparison for remote code models
#35592 opened
Jan 9, 2025 -
[fix] cannot import name 'Pop2PianoFeatureExtractor' from 'transformers'
#35604 opened
Jan 10, 2025 -
[tests] make cuda-only tests device-agnostic
#35607 opened
Jan 10, 2025 -
Fix device in rope module when using dynamic updates
#35608 opened
Jan 10, 2025 -
Still more model refactors!
#35610 opened
Jan 10, 2025 -
Uniformize LlavaNextVideoProcessor kwargs
#35613 opened
Jan 10, 2025 -
🚨🚨🚨 An attempt to fix #29554. Include 'LayerNorm.' in gamma/beta rename scope, optimize string search.
#35615 opened
Jan 10, 2025 -
Process inputs directly in apply_chat_template in image-text-to-text pipeline
#35616 opened
Jan 10, 2025 -
xpu: fix benchmarking scripts for xpu devices
#35620 opened
Jan 11, 2025 -
Fix Batch Size Mismatch When Using `crops_n_layers` in `mask-generation` Pipeline #35530
#35627 opened
Jan 11, 2025 -
Guard against unset resolved_archive_file
#35628 opened
Jan 11, 2025 -
[docs] add return_timestamps=True for Whisper long-form transcription
#35633 opened
Jan 12, 2025 -
Removed some duplicated code
#35637 opened
Jan 12, 2025 -
[ViTPose] Convert more checkpoints
#35638 opened
Jan 12, 2025 -
modular_model_converter bugfix on assignments
#35642 opened
Jan 12, 2025
58 Issues closed by 19 people
-
Add keypoint-detection task
#24044 closed
Jan 12, 2025 -
tokenizer.json modified after tokenizer.save_pretrained of OLMO models
#34744 closed
Jan 12, 2025 -
Bug of eval loss when enabling average_tokens_across_devices
#35078 closed
Jan 12, 2025 -
Documentation for SWAG contradicts itself when constructing the first sentence.
#35095 closed
Jan 12, 2025 -
`dataloader_persistent_workers=True` causes fork-bomb due to repeated creation of `eval_dataloader`
#28469 closed
Jan 11, 2025 -
bus error on version 4.43.0 with pretrained community CLIP model - MacOS
#33357 closed
Jan 11, 2025 -
long-standing Bug in Adafactor optimizer if beta1 > 0
#34506 closed
Jan 11, 2025 -
VLlama3ForCausalLM in SmolVLM
#35039 closed
Jan 11, 2025 -
ImportError: cannot import name 'HfApiEngine' from 'transformers'
#35051 closed
Jan 11, 2025 -
Get "NotImplementedError: Cannot copy out of meta tensor; no data!" error while deploying model
#35057 closed
Jan 11, 2025 -
issues when i change the lm_head to a 32 node layer
#35071 closed
Jan 11, 2025 -
Multiple training runs not working with deepspeed
#35073 closed
Jan 11, 2025 -
OverflowError: out of range integral type conversion attempted
#35540 closed
Jan 10, 2025 -
`T5ForSequenceClassification`
#14097 closed
Jan 10, 2025 -
Incorrect average calculation in `Perplexity of fixed-length models`
#34138 closed
Jan 10, 2025 -
`Nan` logits when performing inference using ModernBERT
#35574 closed
Jan 10, 2025 -
PaliGemma2 Processor returns wrong labels array when <image> token is present in `text`
#35200 closed
Jan 10, 2025 -
Issues counting passing rates on tests which use subTest()
#34755 closed
Jan 10, 2025 -
FA2 broken for Cohere2 if Optional `Mask` is not passed in `forward`
#35547 closed
Jan 9, 2025 -
Training Evaluation Display on VSCode
#22694 closed
Jan 9, 2025 -
`train_new_from_iterator()` does not work when pre_tokenizer is null
#35315 closed
Jan 9, 2025 -
torch.compile DataCollatorWithFlattening flash_attention_2.7 causes crash when training
#35590 closed
Jan 9, 2025 -
<spam>
#35577 closed
Jan 9, 2025 -
qwen2 rope device matching bug
#35505 closed
Jan 9, 2025 -
is_causal arg appears twice in FAttention call from GPT2Attention.forward()
#35380 closed
Jan 9, 2025 -
VisualBert: Why isn't the pooler_output used to calculate the logits in VisualBertForQuestionAnswering?
#35025 closed
Jan 9, 2025 -
LlamaTokenizer being recognized as a bool
#35037 closed
Jan 9, 2025 -
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
#24915 closed
Jan 8, 2025 -
iframe
#35559 closed
Jan 8, 2025 -
Qwen2-VL used to work with `inputs_embeds` instead of `input_ids`, but no more
#35463 closed
Jan 8, 2025 -
ModernBERT does not have `inputs_embeds` input
#35555 closed
Jan 8, 2025 -
Transformers cannot load ModernBERT for sequence classification
#35362 closed
Jan 8, 2025 -
Any plans for a Test-Time Compute Scaling Package or Module?
#35561 closed
Jan 8, 2025 -
Qwen2VLProcessor cannot handle odd number of video frames
#35412 closed
Jan 8, 2025 -
LLaVa 1.5 and 1.6 not working with text-only inputs
#35424 closed
Jan 8, 2025 -
LlavaForConditionalGeneration._merge_input_ids_with_image_features throws error
#35169 closed
Jan 8, 2025 -
Assistant decoding w. Llava-Next does not work
#35450 closed
Jan 8, 2025 -
Flash attention 2 broke when batch inference
#34824 closed
Jan 8, 2025 -
Data collator class type integrity is not intact throughout the runtime
#34830 closed
Jan 8, 2025 -
Data prefetching does not occur for iterable datasets
#34867 closed
Jan 8, 2025 -
Deepspeed integration: support batch sizes that are less than the number of gpus/ranks
#27299 closed
Jan 8, 2025 -
Remove sortish_sampler from Seq2SeqTrainingArgument and the docs
#34986 closed
Jan 7, 2025 -
Add support for Allegro
#34347 closed
Jan 7, 2025 -
How to run the model on another machine and send the answer to another machine.
#35485 closed
Jan 7, 2025 -
xpu: parallelize() not supported for PyTorch XPU backend
#35252 closed
Jan 7, 2025 -
Inference with FSDP during training affects checkpoints
#34530 closed
Jan 7, 2025 -
SAMProcessor padding for rectangular aspect input images isn't symmetric
#35017 closed
Jan 7, 2025 -
Unable to export GLM models to ONNX
#35021 closed
Jan 7, 2025 -
Enabling Access to Currently Training Model in Callback Handler
#35542 closed
Jan 7, 2025 -
tokenizers.apply_chat_template with `continue_final_message=True` with trailing spaces in input
#35433 closed
Jan 6, 2025 -
Bug of self.accelerator.gather(num_items_in_batch) with enabling average_tokens_across_devices
#35076 closed
Jan 6, 2025 -
IdeficsImageProcessor raises unexpected ValueError
#35391 closed
Jan 6, 2025 -
Redundant Operations.
#34958 closed
Jan 6, 2025 -
Memory Leak When Using padding="max_length" in T5 Text Encoder On CPU
#34988 closed
Jan 6, 2025
45 Issues opened by 42 people
-
tokenizer.decode() and tokenizer.convert_ids_to_tokens() return different results
#35641 opened
Jan 12, 2025 -
Breaking change in v4.48.0 and Python 3.9
#35639 opened
Jan 12, 2025 -
FSDP OOM error
#35636 opened
Jan 12, 2025 -
set_initialized_submodules too slow when loading big model like DeepSeekV3
#35635 opened
Jan 12, 2025 -
ValueError: MllamaForConditionalGeneration does not support Flash Attention 2.0 yet
#35634 opened
Jan 12, 2025 -
Trying To Convert Paligemma model in npz to hf model format
#35632 opened
Jan 12, 2025 -
[i18n-<languageCode>] Translating docs to <languageName>
#35630 opened
Jan 11, 2025 -
Does `num_logits_to_keep` in `model.generate()` really work?
#35629 opened
Jan 11, 2025 -
static cache with mixtral will cause CUDA error: device-side assert triggered
#35626 opened
Jan 11, 2025 -
The Phi model does not have lm_head bias after upgraded to v4.48.0
#35625 opened
Jan 11, 2025 -
Segmentation fault: address not mapped to object at address 0x100000007
#35624 opened
Jan 11, 2025 -
Unsupported: hasattr SkipFunctionVariable when i compile the mixtral model with muti-gpus
#35623 opened
Jan 11, 2025 -
The argument "dim" is gone from LlamaRotaryEmbedding initializer. Intentional?
#35621 opened
Jan 11, 2025 -
from_pretrained fails to save weights.py and layers.py into cache, therefore fails to find them in cache
#35619 opened
Jan 11, 2025 -
Help Understanding Beam Search Scores in Hugging Face (LLaMA LoRA)
#35618 opened
Jan 10, 2025 -
Better handling of hardcoded component in PretrainedModel.from_pretrained
#35617 opened
Jan 10, 2025 -
Trainer: TensorBoardCallback not working for "on_save" and "on_save_end" events
#35612 opened
Jan 10, 2025 -
Trainer sets `state.best_model_checkpoint` even when it doesn't save there; leads to training crash
#35609 opened
Jan 10, 2025 -
Prompt_ids feature causing repetitions and hallucinations
#35603 opened
Jan 10, 2025 -
weird criterion to decide if needed to adjust the padding size
#35599 opened
Jan 9, 2025 -
Inconsistent saving of tokenizer with custom code from HF hub vs. local directory
#35597 opened
Jan 9, 2025 -
flash_attention_2 2.7.2.post1 seems to crash when using `torch.compile` and `DataCollatorWithFlattening`
#35588 opened
Jan 9, 2025 -
Malformed config when saving & loading locally custom models
#35584 opened
Jan 9, 2025 -
Tokenizer outputs same offsets for different tokens.
#35575 opened
Jan 9, 2025 -
Error occurs when using model.generate with Gemma2 in ZeRO3 environment
#35572 opened
Jan 9, 2025 -
Transformers can create unconventional python module names when loading certain repositories
#35570 opened
Jan 8, 2025 -
Any plans to integrate GTE model natively into transformers
#35568 opened
Jan 8, 2025 -
Add cosmos from Nvidia
#35565 opened
Jan 8, 2025 -
4.47.1 Hugging Face Trainer loss accumulated by sum instead of mean
#35556 opened
Jan 7, 2025 -
ModernBERT export to onnx error
#35545 opened
Jan 7, 2025 -
AttributeError: 'Config' object has no attribute '_get_non_default_generation_parameters'
#35543 opened
Jan 7, 2025 -
Is it possible to convert transformers tokenizers into SentencePiece .model format?
#35538 opened
Jan 6, 2025 -
Mask2FormerImageProcessor support overlapping features
#35536 opened
Jan 6, 2025 -
RagTokenizer Missing patch_token_id, patch_token, and encode Functionality
#35532 opened
Jan 6, 2025 -
SAM mask-generation - crops_n_layers
#35530 opened
Jan 6, 2025 -
Trainer: update `state.num_input_tokens_seen` to use `num_items_in_batch`
#35529 opened
Jan 6, 2025 -
Trainer: Use second last checkpoint if last checkpoint loading fails
#35525 opened
Jan 6, 2025 -
Warning 'The attention mask is not set'
#35524 opened
Jan 6, 2025 -
How about adding a combined step and epoch feature to save_strategy?
#35523 opened
Jan 6, 2025 -
Very slow to load deep seekv3 int4 model and device_map="auto" "sequential" bug
#35522 opened
Jan 6, 2025 -
Batch size deprecation warning issued even when it is not used
#35518 opened
Jan 6, 2025
136 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Prompt Depth Anything Model
#35401 commented on
Jan 9, 2025 • 77 new comments -
Samhq model addition
#35147 commented on
Jan 10, 2025 • 48 new comments -
Add Relation DETR
#34900 commented on
Jan 10, 2025 • 32 new comments -
Enhanced Installation Section in README.md
#35094 commented on
Jan 11, 2025 • 26 new comments -
Add TimesFM Time Series Forecasting Model
#34082 commented on
Jan 10, 2025 • 25 new comments -
Implement SuperGlue model
#29886 commented on
Jan 8, 2025 • 24 new comments -
Add GOT-OCR 2.0 to Transformers
#34721 commented on
Jan 7, 2025 • 24 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Jan 10, 2025 • 20 new comments -
Support constant lr with cooldown
#35453 commented on
Jan 11, 2025 • 17 new comments -
Add common test for `torch.export` and fix some vision models
#35124 commented on
Jan 10, 2025 • 10 new comments -
Enable gptqmodel
#35012 commented on
Jan 10, 2025 • 7 new comments -
Use AMD CI workflow defined in hf-workflows
#35058 commented on
Jan 9, 2025 • 7 new comments -
Add autoquant support for torchao quantizer
#35503 commented on
Jan 8, 2025 • 7 new comments -
Improve support for image generation with Chameleon & Anole
#32013 commented on
Jan 7, 2025 • 6 new comments -
docs: Clarify descriptions for mask_labels in Mask2Former
#35514 commented on
Jan 11, 2025 • 5 new comments -
Grounding DINO Processor standardization
#34853 commented on
Jan 7, 2025 • 4 new comments -
support telechat2
#35415 commented on
Jan 12, 2025 • 4 new comments -
[WIP] Possible bug - adding option to save/reload scaler
#34932 commented on
Jan 8, 2025 • 3 new comments -
ModernBERT FlexAttention
#35423 commented on
Jan 12, 2025 • 3 new comments -
Universal Speculative Decoding `CandidateGenerator`
#35029 commented on
Jan 12, 2025 • 2 new comments -
Add dithering to the `Speech2TextFeatureExtractor` API.
#34638 commented on
Jan 9, 2025 • 2 new comments -
OmDet Turbo processor standardization
#34937 commented on
Jan 7, 2025 • 2 new comments -
uniformize kwargs for OneFormer
#34547 commented on
Jan 6, 2025 • 2 new comments -
enable StaticCache for assisted generation
#34797 commented on
Jan 7, 2025 • 2 new comments -
Fix Qwen2RotaryEmbedding Device Matching Bug
#35506 commented on
Jan 7, 2025 • 1 new comment -
Efficient Inference Kernel for SpQR
#34976 commented on
Jan 9, 2025 • 1 new comment -
Bart: new cache format
#35314 commented on
Jan 10, 2025 • 1 new comment -
OwlViT/Owlv2 post processing standardization
#34929 commented on
Jan 7, 2025 • 1 new comment -
Integrate xlstm cleanly.
#35377 commented on
Jan 11, 2025 • 1 new comment -
Pass callbacks kwarg to study.optimize() in run_hp_search_optuna()
#34732 commented on
Jan 9, 2025 • 0 new comments -
Add helper for torch dynamo
#35478 commented on
Jan 9, 2025 • 0 new comments -
fix: Handle BLIP-2 model output format
#34705 commented on
Jan 6, 2025 • 0 new comments -
Update config validation
#34726 commented on
Jan 8, 2025 • 0 new comments -
Clean-up composite configs
#34603 commented on
Jan 6, 2025 • 0 new comments -
Add support for Apple's Depth-Pro
#34583 commented on
Jan 7, 2025 • 0 new comments -
Add Zamba2
#34517 commented on
Jan 7, 2025 • 0 new comments -
Added `segmentation maps` support for DPT image processor
#34345 commented on
Jan 10, 2025 • 0 new comments -
LLaVA-NeXT: add new model checkpoints
#34195 commented on
Jan 6, 2025 • 0 new comments -
Added resource class configuration option for `check_circleci_user` job
#32866 commented on
Jan 12, 2025 • 0 new comments -
fix: Updated BridgeTower Image processor
#32384 commented on
Jan 12, 2025 • 0 new comments -
[GroundingDino] Fix grounding dino loss 🚨
#31828 commented on
Jan 10, 2025 • 0 new comments -
[docs] Redesign
#31757 commented on
Jan 11, 2025 • 0 new comments -
Add LightGlue model
#31718 commented on
Jan 8, 2025 • 0 new comments -
Support Kosmos-2.5
#31711 commented on
Jan 10, 2025 • 0 new comments -
[WIP] Add implementation of `_extract_fbank_features_batch`
#31579 commented on
Jan 7, 2025 • 0 new comments -
Option to Disable Model Caching When Using "pipeline"
#35337 commented on
Jan 6, 2025 • 0 new comments -
Add FAST
#35476 commented on
Jan 11, 2025 • 0 new comments -
Fix #35447 Tokenizer does not split text according to newly added input tokens
#35455 commented on
Jan 8, 2025 • 0 new comments -
T5 static cache
#35445 commented on
Jan 7, 2025 • 0 new comments -
Add D-FINE Model into Transformers
#35400 commented on
Jan 6, 2025 • 0 new comments -
Add support for H2O cache eviction with LLaMA
#35381 commented on
Jan 6, 2025 • 0 new comments -
Several fixes related to rotary position embeddings
#35376 commented on
Jan 11, 2025 • 0 new comments -
Add support for post-processing kwargs in image-text-to-text pipeline
#35374 commented on
Jan 10, 2025 • 0 new comments -
FineTuning AutoModelForSequenceClassification.from_pretrained(meta-llama/Llama-3.2-1B) Bug:RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument target in method wrapper_CUDA_nll_loss_forward) and awq importing
#35365 commented on
Jan 7, 2025 • 0 new comments -
Add JinaBERT model
#35320 commented on
Jan 10, 2025 • 0 new comments -
Add support for DeepSpeed sequence parallelism (Ulysses)
#35301 commented on
Jan 7, 2025 • 0 new comments -
speed up PrefixConstrainedLogitsProcessor
#35275 commented on
Jan 9, 2025 • 0 new comments -
Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities
#35251 commented on
Jan 9, 2025 • 0 new comments -
Fix Gemma2 synced multi-GPU generation
#35232 commented on
Jan 7, 2025 • 0 new comments -
[i18n-ar] Translated file : docs/source/ar/tasks/token_classification.md into Arabic
#35193 commented on
Jan 12, 2025 • 0 new comments -
Pixtral: vectorize patch embeddings and enable tests
#35122 commented on
Jan 10, 2025 • 0 new comments -
Output dicts support in text generation pipeline
#35092 commented on
Jan 10, 2025 • 0 new comments -
Refactoring of ImageProcessorFast
#35069 commented on
Jan 6, 2025 • 0 new comments -
Switch from `training_args.bin` to `training_args.json`
#35010 commented on
Jan 7, 2025 • 0 new comments -
Fix max size deprecated warning
#34998 commented on
Jan 12, 2025 • 0 new comments -
Remove _supports_static_cache = True for some model classes
#34975 commented on
Jan 9, 2025 • 0 new comments -
Enable different torch dtype in sub models
#34873 commented on
Jan 9, 2025 • 0 new comments -
add a new flax example for Bert model inference
#34794 commented on
Jan 10, 2025 • 0 new comments -
Adding RTDETRv2
#34773 commented on
Jan 11, 2025 • 0 new comments -
Request for a Vision Transformer Model for Digital Image Segmentation
#35477 commented on
Jan 7, 2025 • 0 new comments -
Instructions to raise PR for addition of shared library files(.so) and .cpp files
#35492 commented on
Jan 7, 2025 • 0 new comments -
RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408
#35146 commented on
Jan 7, 2025 • 0 new comments -
PeftModel is not an instance of PreTrainedModel. `No liger kernels will be applied.`
#34016 commented on
Jan 7, 2025 • 0 new comments -
Stop requiring CacheConfig in GenerationConfig with StaticCache
#35026 commented on
Jan 7, 2025 • 0 new comments -
`transformers.image_transforms.resize` does not work for negative values
#34920 commented on
Jan 7, 2025 • 0 new comments -
Deprecation Warning for `max_size` in `DetrImageProcessor.preprocess`
#34977 commented on
Jan 7, 2025 • 0 new comments -
Deepseek v2
#35317 commented on
Jan 7, 2025 • 0 new comments -
Add AudioQuestionAnswering pipeline
#33782 commented on
Jan 7, 2025 • 0 new comments -
Any plans to add AIMv2 in the model?
#35351 commented on
Jan 7, 2025 • 0 new comments -
Impossible to change attention implementation
#35153 commented on
Jan 8, 2025 • 0 new comments -
Qwen2vl float16 inference bug in naive attention
#35151 commented on
Jan 8, 2025 • 0 new comments -
Cuda OOM
#35150 commented on
Jan 8, 2025 • 0 new comments -
Discrepancy in Training Loss Behavior with Gradient Accumulation using DeepSpeed
#34694 commented on
Jan 8, 2025 • 0 new comments -
Tokenizer does not split text according to newly added input tokens
#35447 commented on
Jan 8, 2025 • 0 new comments -
Multi-GPU training crashes with IterableDataset and different length input (e.g. Next token prediction)
#35308 commented on
Jan 8, 2025 • 0 new comments -
Custom 4D tensor caused shape mismatch error
#35290 commented on
Jan 5, 2025 • 0 new comments -
Special token ids are not longer typed properly in 4.47.0
#35126 commented on
Jan 6, 2025 • 0 new comments -
The dot in the model name when using auto_map will cause a path parsing error.
#35082 commented on
Jan 6, 2025 • 0 new comments -
CPU processing is extremely slow for models loaded with `torch_dtype = torch.float16`
#34692 commented on
Jan 6, 2025 • 0 new comments -
FlaxWhisperForConditionalGeneration Out Of Memory Error
#34668 commented on
Jan 6, 2025 • 0 new comments -
SinkCache (StreamLLM) implemented over Post-RoPE Key cache might result in confused position for inference
#35350 commented on
Jan 6, 2025 • 0 new comments -
'do_sample' model default cannot be overridden
#35372 commented on
Jan 6, 2025 • 0 new comments -
rework `test_multi_gpu_data_parallel_forward`
#31087 commented on
Jan 6, 2025 • 0 new comments -
Set output_attentions=True for model.geneate
#35393 commented on
Jan 6, 2025 • 0 new comments -
Allow static cache to be larger than sequence length / batch size for encoder-decoder models
#35444 commented on
Jan 6, 2025 • 0 new comments -
Training config that worked with transformers v4.4.6.3 results in OOM error with v4.47.0 (using SFTTrainer)
#35108 commented on
Jan 6, 2025 • 0 new comments -
DeepSeek V3 Support
#35425 commented on
Jan 6, 2025 • 0 new comments -
`trainer.evaluate` always creates a new MLFlow run, separate from the one used during `train()`
#35074 commented on
Jan 6, 2025 • 0 new comments -
HfArgumentParser error when using LoraConfig dataclass
#34834 commented on
Jan 6, 2025 • 0 new comments -
FileNotFoundError when using SentenceTransformerTrainingArguments(load_best_model_at_end=True) and Peft
#34747 commented on
Jan 6, 2025 • 0 new comments -
`GPT2Attention()` class with `_attn()` method when `add_cross_attention=True` and therefore `is_cross_attention=True`.
#35430 commented on
Jan 6, 2025 • 0 new comments -
Training issues latest version
#35407 commented on
Jan 6, 2025 • 0 new comments -
Log multiple losses used along with the combined losses when a model returns a dictionary of losses.
#31081 commented on
Jan 10, 2025 • 0 new comments -
Uniform kwargs for processors
#31911 commented on
Jan 10, 2025 • 0 new comments -
A warning message showing that `MultiScaleDeformableAttention.so` is not found in `/root/.cache/torch_extensions` if `ninja` is installed with `transformers`
#35349 commented on
Jan 10, 2025 • 0 new comments -
run_mlm_flax on tpu v5-pods
#35205 commented on
Jan 11, 2025 • 0 new comments -
logged loss is not correct with gradient accumulation
#35204 commented on
Jan 11, 2025 • 0 new comments -
gradient calculation is not correct with gradient accumulation in LM training
#35203 commented on
Jan 11, 2025 • 0 new comments -
Accelerate x Trainer issue tracker:
#33345 commented on
Jan 11, 2025 • 0 new comments -
Initializing via AutoImageProcessor before AutoProcessor is imported causes `AttributeError`
#34307 commented on
Jan 11, 2025 • 0 new comments -
StopStringCriteria relies on `len(tokenizer)==model.config.vocab_size`, leading to index errors
#35244 commented on
Jan 12, 2025 • 0 new comments -
Offline mode doesn't work with models that require `trust_remote_code=True`
#34855 commented on
Jan 12, 2025 • 0 new comments -
Trying to train a model using automatic1111. Error - Exception training model: 'module 'transformers.integrations' has no attribute 'deepspeed''.
#34427 commented on
Jan 12, 2025 • 0 new comments -
[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float'
#33552 commented on
Jan 12, 2025 • 0 new comments -
Wav2Vec2BertForSequenceClassification. return_attention_mask work wrong
#35495 commented on
Jan 12, 2025 • 0 new comments -
How can I disable legacy processing in llava-next
#35457 commented on
Jan 12, 2025 • 0 new comments -
apply class transformers.SequenceBiasLogitsProcessor on Qwen model
#35432 commented on
Jan 12, 2025 • 0 new comments -
Beit image classification have different results compared from versions prior to 4.43.0
#34446 commented on
Jan 12, 2025 • 0 new comments -
Potential fix for #30819. Check for best metric only on gpu 0
#31268 commented on
Jan 6, 2025 • 0 new comments -
Mimi model gives different outputs when using batch encode vs single encode
#35166 commented on
Jan 9, 2025 • 0 new comments -
[tests] run one test but got 2 test results
#35159 commented on
Jan 9, 2025 • 0 new comments -
When extending embeddings, multivariate distribution isn't correctly estimated even when the calculated sigma matrix is symmetric and positive definite
#35075 commented on
Jan 9, 2025 • 0 new comments -
Plain-DETR
#27496 commented on
Jan 9, 2025 • 0 new comments -
Calling Trainer.create_model_card() with an empty dataset list causes an IndexError
#35163 commented on
Jan 9, 2025 • 0 new comments -
modernbert logits do not have gradient
#35386 commented on
Jan 9, 2025 • 0 new comments -
tracker: `generate` compatibility with `torch.compile`
#28981 commented on
Jan 9, 2025 • 0 new comments -
VLMs Processors are not fully consistent in the inputs formats they accept
#34545 commented on
Jan 9, 2025 • 0 new comments -
Unclear what happens when using torchrun, multi-gpu and trainer arguments.
#35311 commented on
Jan 9, 2025 • 0 new comments -
DynamicCache does not support variable lengths, except for FA2
#35168 commented on
Jan 9, 2025 • 0 new comments -
Export to ExecuTorch
#32253 commented on
Jan 9, 2025 • 0 new comments -
Can't load models with a gamma or beta parameter
#29554 commented on
Jan 10, 2025 • 0 new comments -
Qwen2vl support for GGUF
#35282 commented on
Jan 10, 2025 • 0 new comments -
How to convert my Mask2Former model (ResNet-50 backbone) to Hugging Face transformer
#35186 commented on
Jan 10, 2025 • 0 new comments -
QuantizedCache first token processing is counterintuitive / worse than in papers
#35185 commented on
Jan 10, 2025 • 0 new comments -
Saving model with shared tensors fails on cpu but succeeds on gpu
#33688 commented on
Jan 10, 2025 • 0 new comments -
Unknown quantization type, got fp8
#35471 commented on
Jan 10, 2025 • 0 new comments