Insights: huggingface/transformers
Overview
1 Release published by 1 person
84 Pull requests merged by 51 people
-
Update codeowners with individual model owners
#35595 merged
Jan 10, 2025 -
Skip MobileNetV1ModelTest::test_batching_equivalence for now #35614 merged
Jan 10, 2025 -
Fix flaky test_beam_search_low_memory #35611 merged
Jan 10, 2025 -
Let EarlyStoppingCallback not require load_best_model_at_end
#35101 merged
Jan 10, 2025 -
Added error when sequence length is bigger than max_position_embeddings
#32156 merged
Jan 10, 2025 -
Use inherited tempdir makers for tests; fix failing DS tests
#35600 merged
Jan 10, 2025 -
Fix flaky test_custom_4d_attention_mask
#35606 merged
Jan 10, 2025 -
[WIP] Emu3: add model
#33770 merged
Jan 10, 2025 -
Fix flex_attention in training mode
#35605 merged
Jan 10, 2025 -
Chat template: return vectorized output in processors
#34275 merged
Jan 10, 2025 -
Add Moonshine
#34784 merged
Jan 10, 2025 -
Skip torchscript tests if we see a cache object produced in a test
#35596 merged
Jan 10, 2025 -
ModernBert: reuse GemmaRotaryEmbedding via modular; integration tests
#35459 merged
Jan 10, 2025 -
Add flex_attn to diffllama
#35601 merged
Jan 9, 2025 -
ModernBERT bug fixes
#35404 merged
Jan 9, 2025 -
Add _supports_flex_attn = True for models that do support it #35598 merged
Jan 9, 2025 -
[doc] deepspeed universal checkpoint
#35015 merged
Jan 9, 2025 -
Refactor/fix Cohere2
#35863 merged
Jan 9, 2025 -
[tokenizers] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer #35593 merged
Jan 9, 2025 -
Fix modular edge case and modular sorting order
#35562 merged
Jan 9, 2025 -
PR for Issue #22694: Fixed Training Evaluation table display for VSCode
#35557 merged
Jan 9, 2025 -
Small fix rope kwargs
#35589 merged
Jan 9, 2025 -
Fix flaky SwitchTransformersModelTest::test_training_gradient
#35587 merged
Jan 9, 2025 -
Tokenizer train from iterator without pre_tokenizers #35396 merged
Jan 9, 2025 -
feat: add TP plan for granite
#35573 merged
Jan 9, 2025 -
[Idefics3] Move image features to same device as input embeds
#35100 merged
Jan 9, 2025 -
update modular_modernbert -- add inputs_embeds param to ModernBertModel
#35373 merged
Jan 9, 2025 -
Fix flaky test_batching_equivalence
#35564 merged
Jan 9, 2025 -
Setup loss_type in config at model init time
#34616 merged
Jan 9, 2025 -
Re-add missing __all__ for Cohere and Phi3
#35578 merged
Jan 9, 2025 -
Minor fix in video text 2 text docs
#35546 merged
Jan 9, 2025 -
More model refactoring!
#35359 merged
Jan 9, 2025 -
Don't show warning for inv_freq buffers #35255 merged
Jan 9, 2025 -
Fix multi-gpu loss
#35395 merged
Jan 9, 2025 -
update code owners
#35576 merged
Jan 9, 2025 -
[i18n-ar] Translated file docs/source/ar/tasks/multiple_choice.md into Arabic #35199 merged
Jan 8, 2025 -
Fix all output_dir in test_trainer.py to use tmp_dir
#35266 merged
Jan 8, 2025 -
Pipeline: simple API for assisted generation
#34504 merged
Jan 8, 2025 -
[PixtralLarge] Update Pixtral conversion script to support large format! #34801 merged
Jan 8, 2025 -
[docs] Remove Hiera from AUDIO MODELS in docs
#35544 merged
Jan 8, 2025 -
Overwrite top_k when creating audio classification pipeline
#35541 merged
Jan 8, 2025 -
add code owners
#35528 merged
Jan 8, 2025 -
Add ViTPose
#30530 merged
Jan 8, 2025 -
fix: Qwen2-VL generate with inputs_embeds
#35466 merged
Jan 8, 2025 -
Update doc for metric_for_best_model when save_strategy="best" #35389 merged
Jan 8, 2025 -
Add: num_additional_image_tokens to models
#35052 merged
Jan 8, 2025 -
Enable auto task for timm models in pipeline
#35531 merged
Jan 8, 2025 -
Bump torch requirement to 2
#35479 merged
Jan 8, 2025 -
Timm wrapper label names
#35553 merged
Jan 8, 2025 -
Update missing model error message
#35370 merged
Jan 8, 2025 -
Update doc and default value of TextNetImageProcessor
#35563 merged
Jan 8, 2025 -
Add support for modular with fast image processors
#35379 merged
Jan 8, 2025 -
[Docs] Links to logits-processor-zoo #35552 merged
Jan 8, 2025 -
Fix Qwen2VL processor to handle odd number of frames
#35431 merged
Jan 8, 2025 -
support chat generator as input of TextGenerationPipeline
#35551 merged
Jan 8, 2025 -
Pass correct num_items_in_batch value into the training_step function #35438 merged
Jan 8, 2025 -
MODERNBERT_INPUTS_DOCSTRING: past_key_values are ignored
#35513 merged
Jan 8, 2025 -
VLMs: major clean up 🧼
#34502 merged
Jan 8, 2025 -
Add TextNet
#34979 merged
Jan 8, 2025 -
[docs] Remove sortish_sampler
#35539 merged
Jan 7, 2025 -
Correctly list the chat template file in the Tokenizer saved files list
#34974 merged
Jan 7, 2025 -
[Whisper] fix docstrings typo
#35338 merged
Jan 7, 2025 -
[Qwen2Audio] handle input ids expansion during processing
#35534 merged
Jan 7, 2025 -
Release GPU memory after Optuna trial
#35440 merged
Jan 7, 2025 -
Check whether rescale is requested before checking is_scaled_image
#35439 merged
Jan 7, 2025 -
Fix bug when requesting input normalization with EnCodec
#34756 merged
Jan 7, 2025 -
Add diffllama
#34083 merged
Jan 7, 2025 -
NPU support SDPA
#35165 merged
Jan 7, 2025 -
Replace tokenizer with processing_class in Seq2SeqTrainer
#35452 merged
Jan 7, 2025 -
ci: mark model_parallel tests as cuda specific
#35269 merged
Jan 7, 2025 -
Zamba new attention standard
#35375 merged
Jan 7, 2025 -
[Dinov2 with Registers] Some fixes
#35411 merged
Jan 6, 2025 -
added logic for deleting adapters once loaded
#34650 merged
Jan 6, 2025 -
Fixed typo in Llama configuration docstring
#35520 merged
Jan 6, 2025 -
🌐 [i18n-KO] Remove duplicates in toctree
#35496 merged
Jan 6, 2025 -
[GGUF] Refactor and decouple gguf checkpoint loading logic
#34385 merged
Jan 6, 2025 -
Bump jinja2 from 3.1.4 to 3.1.5 in /examples/research_projects/decision_transformer
#35408 merged
Jan 6, 2025 -
Update llm_optims docs for sdpa_kernel #35481 merged
Jan 6, 2025 -
🌐 [i18n-KO] Translated altclip.md to Korean #34863 merged
Jan 6, 2025 -
Add check for if num_items_in_batch is not None
#35102 merged
Jan 6, 2025 -
Add position_ids in XLMRobertaXLForCausalLM.prepare_inputs_for_generation #35044 merged
Jan 6, 2025 -
Add French translation of task_summary and tasks_explained
#33407 merged
Jan 6, 2025 -
Idefics: fix docstring
#35079 merged
Jan 6, 2025 -
Fix Llava conversion for models that use safetensors to store weights
#35406 merged
Jan 6, 2025
33 Pull requests opened by 27 people
-
Add support for 4D custom attention masks in GPT-2
#35517 opened
Jan 5, 2025 -
Remove batch size argument warning when unjustified
#35519 opened
Jan 6, 2025 -
Validate the num imgs and vids tokens
#35521 opened
Jan 6, 2025 -
Add proper jinja2 error
#35533 opened
Jan 6, 2025 -
change README FILE
#35535 opened
Jan 6, 2025 -
Security fix for `self-comment-ci.yml`
#35548 opened
Jan 7, 2025 -
Add AIMv2 to Transformers
#35550 opened
Jan 7, 2025 -
Add support for nested images to LLava and VipLLava
#35558 opened
Jan 7, 2025 -
BLIPs clean-up
#35560 opened
Jan 8, 2025 -
Support QuestionAnswering Module for ModernBert based models.
#35566 opened
Jan 8, 2025 -
Trainer Refactor: Part 1
#35567 opened
Jan 8, 2025 -
add qwen2.5vl
#35569 opened
Jan 8, 2025 -
Exploit symmetry of covariance matrices for faster & more stable diag…
#35571 opened
Jan 8, 2025 -
Multimodal Granite Support
#35579 opened
Jan 9, 2025 -
Get latest complete checkpoint directory when auto-resume from checkpoint
#35580 opened
Jan 9, 2025 -
apply_chat_template: consistent behaviour for return_assistant_tokens_mask=True return_tensors=True
#35582 opened
Jan 9, 2025 -
A new Traditional Chinese version of the README.md for run_on_remote.py
#35585 opened
Jan 9, 2025 -
[generation] Support cache-cropping methods
#35591 opened
Jan 9, 2025 -
Fix the config class comparison for remote code models
#35592 opened
Jan 9, 2025 -
[fix] cannot import name 'Pop2PianoFeatureExtractor' from 'transformers'
#35604 opened
Jan 10, 2025 -
[tests] make cuda-only tests device-agnostic
#35607 opened
Jan 10, 2025 -
Fix device in rope module when using dynamic updates
#35608 opened
Jan 10, 2025 -
Still more model refactors!
#35610 opened
Jan 10, 2025 -
Uniformize LlavaNextVideoProcessor kwargs
#35613 opened
Jan 10, 2025 -
🚨🚨🚨 An attempt to fix #29554. Include 'LayerNorm.' in gamma/beta rename scope, optimize string search.
#35615 opened
Jan 10, 2025 -
Process inputs directly in apply_chat_template in image-text-to-text pipeline
#35616 opened
Jan 10, 2025 -
xpu: fix benchmarking scripts for xpu devices
#35620 opened
Jan 11, 2025 -
Fix Batch Size Mismatch When Using `crops_n_layers` in `mask-generation` Pipeline #35530
#35627 opened
Jan 11, 2025 -
Guard against unset resolved_archive_file
#35628 opened
Jan 11, 2025 -
[docs] add return_timestamps=True for Whisper long-form transcription
#35633 opened
Jan 12, 2025 -
Removed some duplicated code
#35637 opened
Jan 12, 2025 -
[ViTPose] Convert more checkpoints
#35638 opened
Jan 12, 2025 -
modular_model_converter bugfix on assignments
#35642 opened
Jan 12, 2025
58 Issues closed by 19 people
-
Add keypoint-detection task
#24044 closed
Jan 12, 2025 -
tokenizer.json modified after tokenizer.save_pretrained of OLMO models
#34744 closed
Jan 12, 2025 -
Bug of eval loss when enabling average_tokens_across_devices
#35078 closed
Jan 12, 2025 -
Documentation for SWAG contradicts itself when constructing the first sentence.
#35095 closed
Jan 12, 2025 -
`dataloader_persistent_workers=True` causes fork-bomb due to repeated creation of `eval_dataloader`
#28469 closed
Jan 11, 2025 -
bus error on version 4.43.0 with pretrained community CLIP model - MacOS
#33357 closed
Jan 11, 2025 -
long-standing Bug in Adafactor optimizer if beta1 > 0
#34506 closed
Jan 11, 2025 -
VLlama3ForCausalLM in SmolVLM
#35039 closed
Jan 11, 2025 -
ImportError: cannot import name 'HfApiEngine' from 'transformers'
#35051 closed
Jan 11, 2025 -
Get "NotImplementedError: Cannot copy out of meta tensor; no data!" error while deploying model
#35057 closed
Jan 11, 2025 -
issues when i change the lm_head to a 32 node layer
#35071 closed
Jan 11, 2025 -
Multiple training runs not working with deepspeed
#35073 closed
Jan 11, 2025 -
OverflowError: out of range integral type conversion attempted
#35540 closed
Jan 10, 2025 -
`T5ForSequenceClassification`
#14097 closed
Jan 10, 2025 -
Incorrect average calculation in `Perplexity of fixed-length models`
#34138 closed
Jan 10, 2025 -
`Nan` logits when performing inference using ModernBERT
#35574 closed
Jan 10, 2025 -
PaliGemma2 Processor returns wrong labels array when <image> token is present in `text`
#35200 closed
Jan 10, 2025 -
Issues counting passing rates on tests which use subTest()
#34755 closed
Jan 10, 2025 -
FA2 broken for Cohere2 if Optional `Mask` is not passed in `forward`
#35547 closed
Jan 9, 2025 -
Training Evaluation Display on VSCode
#22694 closed
Jan 9, 2025 -
`train_new_from_iterator()` does not work when pre_tokenizer is null
#35315 closed
Jan 9, 2025 -
torch.compile DataCollatorWithFlattening flash_attention_2.7 causes crash when training
#35590 closed
Jan 9, 2025 -
<spam>
#35577 closed
Jan 9, 2025 -
qwen2 rope device matching bug
#35505 closed
Jan 9, 2025 -
is_causal arg appears twice in FAttention call from GPT2Attention.forward()
#35380 closed
Jan 9, 2025 -
VisualBert: Why isn't the pooler_output used to calculate the logits in VisualBertForQuestionAnswering?
#35025 closed
Jan 9, 2025 -
LlamaTokenizer being recognized as a bool
#35037 closed
Jan 9, 2025 -
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
#24915 closed
Jan 8, 2025 -
iframe
#35559 closed
Jan 8, 2025 -
Qwen2-VL used to work with `inputs_embeds` instead of `input_ids`, but no more
#35463 closed
Jan 8, 2025 -
ModernBERT does not have `inputs_embeds` input
#35555 closed
Jan 8, 2025 -
Transformers cannot load ModernBERT for sequence classification
#35362 closed
Jan 8, 2025 -
Any plans for a Test-Time Compute Scaling Package or Module?
#35561 closed
Jan 8, 2025 -
Qwen2VLProcessor cannot handle odd number of video frames
#35412 closed
Jan 8, 2025 -
LLaVa 1.5 and 1.6 not working with text-only inputs
#35424 closed
Jan 8, 2025 -
LlavaForConditionalGeneration._merge_input_ids_with_image_features throws error
#35169 closed
Jan 8, 2025 -
Assistant decoding w. Llava-Next does not work
#35450 closed
Jan 8, 2025 -
Flash attention 2 broke when batch inference
#34824 closed
Jan 8, 2025 -
Data collator class type integrity is not intact throughout the runtime
#34830 closed
Jan 8, 2025 -
Data prefetching does not occur for iterable datasets
#34867 closed
Jan 8, 2025 -
Deepspeed integration: support batch sizes that are less than the number of gpus/ranks
#27299 closed
Jan 8, 2025 -
Remove sortish_sampler from Seq2SeqTrainingArgument and the docs
#34986 closed
Jan 7, 2025 -
Add support for Allegro
#34347 closed
Jan 7, 2025 -
How to run the model on another machine and send the answer to another machine.
#35485 closed
Jan 7, 2025 -
xpu: parallelize() not supported for PyTorch XPU backend
#35252 closed
Jan 7, 2025 -
Inference with FSDP during training affects checkpoints
#34530 closed
Jan 7, 2025 -
SAMProcessor padding for rectangular aspect input images isn't symmetric
#35017 closed
Jan 7, 2025 -
Unable to export GLM models to ONNX
#35021 closed
Jan 7, 2025 -
Enabling Access to Currently Training Model in Callback Handler
#35542 closed
Jan 7, 2025 -
tokenizers.apply_chat_template with `continue_final_message=True` with trailing spaces in input
#35433 closed
Jan 6, 2025 -
Bug of self.accelerator.gather(num_items_in_batch) with enabling average_tokens_across_devices
#35076 closed
Jan 6, 2025 -
IdeficsImageProcessor raises unexpected ValueError
#35391 closed
Jan 6, 2025 -
Redundant Operations.
#34958 closed
Jan 6, 2025 -
Memory Leak When Using padding="max_length" in T5 Text Encoder On CPU
#34988 closed
Jan 6, 2025
45 Issues opened by 42 people
-
tokenizer.decode() and tokenizer.convert_ids_to_tokens() return different results
#35641 opened
Jan 12, 2025 -
Breaking change in v4.48.0 and Python 3.9
#35639 opened
Jan 12, 2025 -
FSDP OOM error
#35636 opened
Jan 12, 2025 -
set_initialized_submodules too slow when loading big model like DeepSeekV3
#35635 opened
Jan 12, 2025 -
ValueError: MllamaForConditionalGeneration does not support Flash Attention 2.0 yet
#35634 opened
Jan 12, 2025 -
Trying To Convert Paligemma model in npz to hf model format
#35632 opened
Jan 12, 2025 -
[i18n-<languageCode>] Translating docs to <languageName>
#35630 opened
Jan 11, 2025 -
Does `num_logits_to_keep` in `model.generate()` really work?
#35629 opened
Jan 11, 2025 -
static cache with mixtral will cause CUDA error: device-side assert triggered
#35626 opened
Jan 11, 2025 -
The Phi model does not have lm_head bias after upgraded to v4.48.0
#35625 opened
Jan 11, 2025 -
Segmentation fault: address not mapped to object at address 0x100000007
#35624 opened
Jan 11, 2025 -
Unsupported: hasattr SkipFunctionVariable when i compile the mixtral model with muti-gpus
#35623 opened
Jan 11, 2025 -
The argument "dim" is gone from LlamaRotaryEmbedding initializer. Intentional?
#35621 opened
Jan 11, 2025 -
from_pretrained fails to save weights.py and layers.py into cache, therefore fails to find them in cache
#35619 opened
Jan 11, 2025 -
Help Understanding Beam Search Scores in Hugging Face (LLaMA LoRA)
#35618 opened
Jan 10, 2025 -
Better handling of hardcoded component in PretrainedModel.from_pretrained
#35617 opened
Jan 10, 2025 -
Trainer: TensorBoardCallback not working for "on_save" and "on_save_end" events
#35612 opened
Jan 10, 2025 -
Trainer sets `state.best_model_checkpoint` even when it doesn't save there; leads to training crash
#35609 opened
Jan 10, 2025 -
Prompt_ids feature causing repetitions and hallucinations
#35603 opened
Jan 10, 2025 -
weird criterion to decide if needed to adjust the padding size
#35599 opened
Jan 9, 2025 -
Inconsistent saving of tokenizer with custom code from HF hub vs. local directory
#35597 opened
Jan 9, 2025 -
flash_attention_2 2.7.2.post1 seems to crash when using `torch.compile` and `DataCollatorWithFlattening`
#35588 opened
Jan 9, 2025 -
Malformed config when saving & loading locally custom models
#35584 opened
Jan 9, 2025 -
Tokenizer outputs same offsets for different tokens.
#35575 opened
Jan 9, 2025 -
Error occurs when using model.generate with Gemma2 in ZeRO3 environment
#35572 opened
Jan 9, 2025 -
Transformers can create unconventional python module names when loading certain repositories
#35570 opened
Jan 8, 2025 -
Any plans to integrate GTE model natively into transformers
#35568 opened
Jan 8, 2025 -
Add cosmos from Nvidia
#35565 opened
Jan 8, 2025 -
4.47.1 Hugging Face Trainer loss accumulated by sum instead of mean
#35556 opened
Jan 7, 2025 -
ModernBERT export to onnx error
#35545 opened
Jan 7, 2025 -
AttributeError: 'Config' object has no attribute '_get_non_default_generation_parameters'
#35543 opened
Jan 7, 2025 -
Is it possible to convert transformers tokenizers into SentencePiece .model format?
#35538 opened
Jan 6, 2025 -
Mask2FormerImageProcessor support overlapping features
#35536 opened
Jan 6, 2025 -
RagTokenizer Missing patch_token_id, patch_token, and encode Functionality
#35532 opened
Jan 6, 2025 -
SAM mask-generation - crops_n_layers
#35530 opened
Jan 6, 2025 -
Trainer: update `state.num_input_tokens_seen` to use `num_items_in_batch`
#35529 opened
Jan 6, 2025 -
Trainer: Use second last checkpoint if last checkpoint loading fails
#35525 opened
Jan 6, 2025 -
Warning 'The attention mask is not set'
#35524 opened
Jan 6, 2025 -
How about adding a combined step and epoch feature to save_strategy?
#35523 opened
Jan 6, 2025 -
Very slow to load deep seekv3 int4 model and device_map="auto" "sequential" bug
#35522 opened
Jan 6, 2025 -
Batch size deprecation warning issued even when it is not used
#35518 opened
Jan 6, 2025
136 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Prompt Depth Anything Model
#35401 commented on
Jan 9, 2025 • 77 new comments -
Samhq model addition
#35147 commented on
Jan 10, 2025 • 48 new comments -
Add Relation DETR
#34900 commented on
Jan 10, 2025 • 32 new comments -
Enhanced Installation Section in README.md
#35094 commented on
Jan 11, 2025 • 26 new comments -
Add TimesFM Time Series Forecasting Model
#34082 commented on
Jan 10, 2025 • 25 new comments -
Implement SuperGlue model
#29886 commented on
Jan 8, 2025 • 24 new comments -
Add GOT-OCR 2.0 to Transformers
#34721 commented on
Jan 7, 2025 • 24 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Jan 10, 2025 • 20 new comments -
Support constant lr with cooldown
#35453 commented on
Jan 11, 2025 • 17 new comments -
Add common test for `torch.export` and fix some vision models
#35124 commented on
Jan 10, 2025 • 10 new comments -
Enable gptqmodel
#35012 commented on
Jan 10, 2025 • 7 new comments -
Use AMD CI workflow defined in hf-workflows
#35058 commented on
Jan 9, 2025 • 7 new comments -
Add autoquant support for torchao quantizer
#35503 commented on
Jan 8, 2025 • 7 new comments -
Improve support for image generation with Chameleon & Anole
#32013 commented on
Jan 7, 2025 • 6 new comments -
docs: Clarify descriptions for mask_labels in Mask2Former
#35514 commented on
Jan 11, 2025 • 5 new comments -
Grounding DINO Processor standardization
#34853 commented on
Jan 7, 2025 • 4 new comments -
support telechat2
#35415 commented on
Jan 12, 2025 • 4 new comments -
[WIP] Possible bug - adding option to save/reload scaler
#34932 commented on
Jan 8, 2025 • 3 new comments -
ModernBERT FlexAttention
#35423 commented on
Jan 12, 2025 • 3 new comments -
Universal Speculative Decoding `CandidateGenerator`
#35029 commented on
Jan 12, 2025 • 2 new comments -
Add dithering to the `Speech2TextFeatureExtractor` API.
#34638 commented on
Jan 9, 2025 • 2 new comments -
OmDet Turbo processor standardization
#34937 commented on
Jan 7, 2025 • 2 new comments -
uniformize kwargs for OneFormer
#34547 commented on
Jan 6, 2025 • 2 new comments -
enable StaticCache for assisted generation
#34797 commented on
Jan 7, 2025 • 2 new comments -
Fix Qwen2RotaryEmbedding Device Matching Bug
#35506 commented on
Jan 7, 2025 • 1 new comment -
Efficient Inference Kernel for SpQR
#34976 commented on
Jan 9, 2025 • 1 new comment -
Bart: new cache format
#35314 commented on
Jan 10, 2025 • 1 new comment -
OwlViT/Owlv2 post processing standardization
#34929 commented on
Jan 7, 2025 • 1 new comment -
Integrate xlstm cleanly.
#35377 commented on
Jan 11, 2025 • 1 new comment -
Pass callbacks kwarg to study.optimize() in run_hp_search_optuna()
#34732 commented on
Jan 9, 2025 • 0 new comments -
Add helper for torch dynamo
#35478 commented on
Jan 9, 2025 • 0 new comments -
fix: Handle BLIP-2 model output format
#34705 commented on
Jan 6, 2025 • 0 new comments -
Update config validation
#34726 commented on
Jan 8, 2025 • 0 new comments -
Clean-up composite configs
#34603 commented on
Jan 6, 2025 • 0 new comments -
Add support for Apple's Depth-Pro
#34583 commented on
Jan 7, 2025 • 0 new comments -
Add Zamba2
#34517 commented on
Jan 7, 2025 • 0 new comments -
Added `segmentation maps` support for DPT image processor
#34345 commented on
Jan 10, 2025 • 0 new comments -
LLaVA-NeXT: add new model checkpoints
#34195 commented on
Jan 6, 2025 • 0 new comments -
Added resource class configuration option for `check_circleci_user` job
#32866 commented on
Jan 12, 2025 • 0 new comments -
fix: Updated BridgeTower Image processor
#32384 commented on
Jan 12, 2025 • 0 new comments -
[GroundingDino] Fix grounding dino loss 🚨
#31828 commented on
Jan 10, 2025 • 0 new comments -
[docs] Redesign
#31757 commented on
Jan 11, 2025 • 0 new comments -
Add LightGlue model
#31718 commented on
Jan 8, 2025 • 0 new comments -
Support Kosmos-2.5
#31711 commented on
Jan 10, 2025 • 0 new comments -
[WIP] Add implementation of `_extract_fbank_features_batch`
#31579 commented on
Jan 7, 2025 • 0 new comments -
Option to Disable Model Caching When Using "pipeline"
#35337 commented on
Jan 6, 2025 • 0 new comments -
Add FAST
#35476 commented on
Jan 11, 2025 • 0 new comments -
Fix #35447 Tokenizer does not split text according to newly added input tokens
#35455 commented on
Jan 8, 2025 • 0 new comments -
T5 static cache
#35445 commented on
Jan 7, 2025 • 0 new comments -
Add D-FINE Model into Transformers
#35400 commented on
Jan 6, 2025 • 0 new comments -
Add support for H2O cache eviction with LLaMA
#35381 commented on
Jan 6, 2025 • 0 new comments -
Several fixes related to rotary position embeddings
#35376 commented on
Jan 11, 2025 • 0 new comments -
Add support for post-processing kwargs in image-text-to-text pipeline
#35374 commented on
Jan 10, 2025 • 0 new comments -
FineTuning AutoModelForSequenceClassification.from_pretrained(meta-llama/Llama-3.2-1B) Bug:RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument target in method wrapper_CUDA_nll_loss_forward) and awq importing
#35365 commented on
Jan 7, 2025 • 0 new comments -
Add JinaBERT model
#35320 commented on
Jan 10, 2025 • 0 new comments -
Add support for DeepSpeed sequence parallelism (Ulysses)
#35301 commented on
Jan 7, 2025 • 0 new comments -
speed up PrefixConstrainedLogitsProcessor
#35275 commented on
Jan 9, 2025 • 0 new comments -
Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities
#35251 commented on
Jan 9, 2025 • 0 new comments -
Fix Gemma2 synced multi-GPU generation
#35232 commented on
Jan 7, 2025 • 0 new comments -
[i18n-ar] Translated file : docs/source/ar/tasks/token_classification.md into Arabic
#35193 commented on
Jan 12, 2025 • 0 new comments -
Pixtral: vectorize patch embeddings and enable tests
#35122 commented on
Jan 10, 2025 • 0 new comments -
Output dicts support in text generation pipeline
#35092 commented on
Jan 10, 2025 • 0 new comments -
Refactoring of ImageProcessorFast
#35069 commented on
Jan 6, 2025 • 0 new comments -
Switch from `training_args.bin` to `training_args.json`
#35010 commented on
Jan 7, 2025 • 0 new comments -
Fix max size deprecated warning
#34998 commented on
Jan 12, 2025 • 0 new comments -
Remove _supports_static_cache = True for some model classes
#34975 commented on
Jan 9, 2025 • 0 new comments -
Enable different torch dtype in sub models
#34873 commented on
Jan 9, 2025 • 0 new comments -
add a new flax example for Bert model inference
#34794 commented on
Jan 10, 2025 • 0 new comments -
Adding RTDETRv2
#34773 commented on
Jan 11, 2025 • 0 new comments -
Request for a Vision Transformer Model for Digital Image Segmentation
#35477 commented on
Jan 7, 2025 • 0 new comments -
Instructions to raise PR for addition of shared library files(.so) and .cpp files
#35492 commented on
Jan 7, 2025 • 0 new comments -
RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408
#35146 commented on
Jan 7, 2025 • 0 new comments -
PeftModel is not an instance of PreTrainedModel. `No liger kernels will be applied.`
#34016 commented on
Jan 7, 2025 • 0 new comments -
Stop requiring CacheConfig in GenerationConfig with StaticCache
#35026 commented on
Jan 7, 2025 • 0 new comments -
`transformers.image_transforms.resize` does not work for negative values
#34920 commented on
Jan 7, 2025 • 0 new comments -
Deprecation Warning for `max_size` in `DetrImageProcessor.preprocess`
#34977 commented on
Jan 7, 2025 • 0 new comments -
Deepseek v2
#35317 commented on
Jan 7, 2025 • 0 new comments -
Add AudioQuestionAnswering pipeline
#33782 commented on
Jan 7, 2025 • 0 new comments -
Any plans to add AIMv2 in the model?
#35351 commented on
Jan 7, 2025 • 0 new comments -
Impossible to change attention implementation
#35153 commented on
Jan 8, 2025 • 0 new comments -
Qwen2vl float16 inference bug in naive attention
#35151 commented on
Jan 8, 2025 • 0 new comments -
Cuda OOM
#35150 commented on
Jan 8, 2025 • 0 new comments -
Discrepancy in Training Loss Behavior with Gradient Accumulation using DeepSpeed
#34694 commented on
Jan 8, 2025 • 0 new comments -
Tokenizer does not split text according to newly added input tokens
#35447 commented on
Jan 8, 2025 • 0 new comments -
Multi-GPU training crashes with IterableDataset and different length input (e.g. Next token prediction)
#35308 commented on
Jan 8, 2025 • 0 new comments -
Custom 4D tensor caused shape mismatch error
#35290 commented on
Jan 5, 2025 • 0 new comments -
Special token ids are not longer typed properly in 4.47.0
#35126 commented on
Jan 6, 2025 • 0 new comments -
The dot in the model name when using auto_map will cause a path parsing error.
#35082 commented on
Jan 6, 2025 • 0 new comments -
CPU processing is extremely slow for models loaded with `torch_dtype = torch.float16`
#34692 commented on
Jan 6, 2025 • 0 new comments -
FlaxWhisperForConditionalGeneration Out Of Memory Error
#34668 commented on
Jan 6, 2025 • 0 new comments -
SinkCache (StreamLLM) implemented over Post-RoPE Key cache might result in confused position for inference
#35350 commented on
Jan 6, 2025 • 0 new comments -
'do_sample' model default cannot be overridden
#35372 commented on
Jan 6, 2025 • 0 new comments -
rework `test_multi_gpu_data_parallel_forward`
#31087 commented on
Jan 6, 2025 • 0 new comments -
Set output_attentions=True for model.geneate
#35393 commented on
Jan 6, 2025 • 0 new comments -
Allow static cache to be larger than sequence length / batch size for encoder-decoder models
#35444 commented on
Jan 6, 2025 • 0 new comments -
Training config that worked with transformers v4.4.6.3 results in OOM error with v4.47.0 (using SFTTrainer)
#35108 commented on
Jan 6, 2025 • 0 new comments -
DeepSeek V3 Support
#35425 commented on
Jan 6, 2025 • 0 new comments -
`trainer.evaluate` always creates a new MLFlow run, separate from the one used during `train()`
#35074 commented on
Jan 6, 2025 • 0 new comments -
HfArgumentParser error when using LoraConfig dataclass
#34834 commented on
Jan 6, 2025 • 0 new comments -
FileNotFoundError when using SentenceTransformerTrainingArguments(load_best_model_at_end=True) and Peft
#34747 commented on
Jan 6, 2025 • 0 new comments -
`GPT2Attention()` class with `_attn()` method when `add_cross_attention=True` and therefore `is_cross_attention=True`.
#35430 commented on
Jan 6, 2025 • 0 new comments -
Training issues latest version
#35407 commented on
Jan 6, 2025 • 0 new comments -
Log multiple losses used along with the combined losses when a model returns a dictionary of losses.
#31081 commented on
Jan 10, 2025 • 0 new comments -
Uniform kwargs for processors
#31911 commented on
Jan 10, 2025 • 0 new comments -
A warning message showing that `MultiScaleDeformableAttention.so` is not found in `/root/.cache/torch_extensions` if `ninja` is installed with `transformers`
#35349 commented on
Jan 10, 2025 • 0 new comments -
run_mlm_flax on tpu v5-pods
#35205 commented on
Jan 11, 2025 • 0 new comments -
logged loss is not correct with gradient accumulation
#35204 commented on
Jan 11, 2025 • 0 new comments -
gradient calculation is not correct with gradient accumulation in LM training
#35203 commented on
Jan 11, 2025 • 0 new comments -
Accelerate x Trainer issue tracker:
#33345 commented on
Jan 11, 2025 • 0 new comments -
Initializing via AutoImageProcessor before AutoProcessor is imported causes `AttributeError`
#34307 commented on
Jan 11, 2025 • 0 new comments -
StopStringCriteria relies on `len(tokenizer)==model.config.vocab_size`, leading to index errors
#35244 commented on
Jan 12, 2025 • 0 new comments -
Offline mode doesn't work with models that require `trust_remote_code=True`
#34855 commented on
Jan 12, 2025 • 0 new comments -
Trying to train a model using automatic1111. Error - Exception training model: 'module 'transformers.integrations' has no attribute 'deepspeed''.
#34427 commented on
Jan 12, 2025 • 0 new comments -
[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float'
#33552 commented on
Jan 12, 2025 • 0 new comments -
Wav2Vec2BertForSequenceClassification. return_attention_mask work wrong
#35495 commented on
Jan 12, 2025 • 0 new comments -
How can I disable legacy processing in llava-next
#35457 commented on
Jan 12, 2025 • 0 new comments -
apply class transformers.SequenceBiasLogitsProcessor on Qwen model
#35432 commented on
Jan 12, 2025 • 0 new comments -
Beit image classification have different results compared from versions prior to 4.43.0
#34446 commented on
Jan 12, 2025 • 0 new comments -
Potential fix for #30819. Check for best metric only on gpu 0
#31268 commented on
Jan 6, 2025 • 0 new comments -
Mimi model gives different outputs when using batch encode vs single encode
#35166 commented on
Jan 9, 2025 • 0 new comments -
[tests] run one test but got 2 test results
#35159 commented on
Jan 9, 2025 • 0 new comments -
When extending embeddings, multivariate distribution isn't correctly estimated even when the calculated sigma matrix is symmetric and positive definite
#35075 commented on
Jan 9, 2025 • 0 new comments -
Plain-DETR
#27496 commented on
Jan 9, 2025 • 0 new comments -
Calling Trainer.create_model_card() with an empty dataset list causes an IndexError
#35163 commented on
Jan 9, 2025 • 0 new comments -
modernbert logits do not have gradient
#35386 commented on
Jan 9, 2025 • 0 new comments -
tracker: `generate` compatibility with `torch.compile`
#28981 commented on
Jan 9, 2025 • 0 new comments -
VLMs Processors are not fully consistent in the inputs formats they accept
#34545 commented on
Jan 9, 2025 • 0 new comments -
Unclear what happens when using torchrun, multi-gpu and trainer arguments.
#35311 commented on
Jan 9, 2025 • 0 new comments -
DynamicCache does not support variable lengths, except for FA2
#35168 commented on
Jan 9, 2025 • 0 new comments -
Export to ExecuTorch
#32253 commented on
Jan 9, 2025 • 0 new comments -
Can't load models with a gamma or beta parameter
#29554 commented on
Jan 10, 2025 • 0 new comments -
Qwen2vl support for GGUF
#35282 commented on
Jan 10, 2025 • 0 new comments -
How to convert my Mask2Former model (ResNet-50 backbone) to Hugging Face transformer
#35186 commented on
Jan 10, 2025 • 0 new comments -
QuantizedCache first token processing is counterintuitive / worse than in papers
#35185 commented on
Jan 10, 2025 • 0 new comments -
Saving model with shared tensors fails on cpu but succeeds on gpu
#33688 commented on
Jan 10, 2025 • 0 new comments -
Unknown quantization type, got fp8
#35471 commented on
Jan 10, 2025 • 0 new comments