Documentation for SWAG contradicts itself when constructing the first sentence. #35095

bauwenst · 2024-12-05T00:34:12Z

System Info

Not relevant.

Who can help?

@stevhliu @ArthurZucker

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

The docs for multiple choice use SWAG as an example, which is the task of selecting the next sentence given a context. Somewhat strangely, rather than being given in the format (sentence1, [sentence2a, sentence2b, sentence2c, sentence2d]), the dataset is given in the format (sentence1, sentence2_start, [sentence2_endA, sentence2_endB, sentence2_endC, sentence2_endD]).
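A toy record may make the two layouts concrete. This is only a sketch: the sentence text is invented, and the field names `ending0` to `ending3` are assumed to be what `ending_names` refers to in the docs' code.

```python
# Hypothetical record; the sentence text is invented for illustration.
record = {
    "sent1": "A man sits at a piano.",
    "sent2": "He",
    "ending0": "starts to play.",
    "ending1": "eats the keys.",
    "ending2": "flies away.",
    "ending3": "turns into a chair.",
}
# Assumed column names for the four candidate endings.
ending_names = ["ending0", "ending1", "ending2", "ending3"]

# Layout the dataset uses: (sentence1, sentence2_start, [four endings]).
dataset_layout = (record["sent1"], record["sent2"], [record[e] for e in ending_names])

# Layout one might expect: (sentence1, [four complete second sentences]).
expected_layout = (
    record["sent1"],
    [f"{record['sent2']} {record[e]}" for e in ending_names],
)

print(dataset_layout)
print(expected_layout)
```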

The code given in the docs basically turns the dataset into the first format, where sentence 1 is kept intact and the start of sentence 2 is concatenated to each ending:

transformers/docs/source/en/tasks/multiple_choice.md

Lines 96 to 100 in a06a0d1

    
    ...     first_sentences = [[context] * 4 for context in examples["sent1"]]
    ...     question_headers = examples["sent2"]
    ...     second_sentences = [
    ...         [f"{header} {examples[end][i]}" for end in ending_names] for i, header in enumerate(question_headers)
    ...     ]
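To see what these lines produce, here is a minimal sketch that runs them on a two-row toy batch. The batch contents and the `ending0` to `ending3` column names are assumptions made for illustration; only the three assignments in the middle come from the quoted docs.

```python
# Toy batch in column-wise form; the text is invented.
examples = {
    "sent1": ["A man sits at a piano.", "A dog runs across the yard."],
    "sent2": ["He", "It"],
    "ending0": ["starts to play.", "chases a ball."],
    "ending1": ["eats the keys.", "reads a book."],
    "ending2": ["flies away.", "drives a car."],
    "ending3": ["turns into a chair.", "paints the fence."],
}
# Assumed column names for the four candidate endings.
ending_names = ["ending0", "ending1", "ending2", "ending3"]

# The three assignments quoted from the docs, without the REPL prompts.
first_sentences = [[context] * 4 for context in examples["sent1"]]
question_headers = examples["sent2"]
second_sentences = [
    [f"{header} {examples[end][i]}" for end in ending_names] for i, header in enumerate(question_headers)
]

print(first_sentences[0])   # four copies of sent1 only; sent2 is not appended
print(second_sentences[0])  # sent2 joined to each of the four endings
```

In this sketch the first sentences contain `sent1` alone, so `sent2` appears only in the second sentences.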

Yet, the docs say:

transformers/docs/source/en/tasks/multiple_choice.md

Lines 85 to 88 in a06a0d1

    
    The preprocessing function you want to create needs to:

    1. Make four copies of the `sent1` field and combine each of them with `sent2` to recreate how a sentence starts.
    2. Combine `sent2` with each of the four possible sentence endings.

What is being described is formatting the dataset as (sentence1 sentence2_start, [sentence2_start sentence2_endA, sentence2_start sentence2_endB, sentence2_start sentence2_endC, sentence2_start sentence2_endD]), where there is overlap between the first and the second sentence (namely sentence2_start).

Expected behavior

Either the code is wrong or the description is wrong.

If the description is wrong, it should be:

The preprocessing function you want to create needs to:

1. Make four copies of the `sent1` field.
2. Combine `sent2` with each of the four possible sentence endings.

If the code is wrong, it should be:

    first_sentences = [[f"{s1} {s2_start}"] * 4 for s1,s2_start in zip(examples["sent1"], examples["sent2"])]
    second_sentences = [
        [f"{s2_start} {examples[end][i]}" for end in ending_names] for i, s2_start in enumerate(examples["sent2"])
    ]
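As a rough check of what this alternative would yield, the sketch below wraps the two assignments in a hypothetical helper, `preprocess_with_overlap`, and applies it to an invented two-row batch. The helper name, the batch contents, and the `ending0` to `ending3` column names are assumptions, not part of the docs.

```python
# Toy batch in column-wise form; the text is invented.
examples = {
    "sent1": ["A man sits at a piano.", "A dog runs across the yard."],
    "sent2": ["He", "It"],
    "ending0": ["starts to play.", "chases a ball."],
    "ending1": ["eats the keys.", "reads a book."],
    "ending2": ["flies away.", "drives a car."],
    "ending3": ["turns into a chair.", "paints the fence."],
}
# Assumed column names for the four candidate endings.
ending_names = ["ending0", "ending1", "ending2", "ending3"]


def preprocess_with_overlap(examples):
    """Hypothetical helper: builds the pairs as the docs' description words them."""
    first_sentences = [
        [f"{s1} {s2_start}"] * 4 for s1, s2_start in zip(examples["sent1"], examples["sent2"])
    ]
    second_sentences = [
        [f"{s2_start} {examples[end][i]}" for end in ending_names]
        for i, s2_start in enumerate(examples["sent2"])
    ]
    return first_sentences, second_sentences


first, second = preprocess_with_overlap(examples)
print(first[0][0])   # sent1 followed by sent2
print(second[0][0])  # sent2 followed by the first ending
```

Here the value of `sent2` ends each first sentence and also begins each second sentence, which is the overlap described above.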


Rocketknight1 · 2024-12-05T12:46:24Z

cc @stevhliu

github-actions · 2025-01-04T08:03:06Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

bauwenst added the bug label Dec 5, 2024

github-actions bot closed this as completed Jan 12, 2025
