Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gsm8k cleanup #6

Merged
merged 4 commits into from
Dec 14, 2023
Merged

Conversation

nking-1
Copy link
Collaborator

@nking-1 nking-1 commented Dec 14, 2023

Fixes needed to make the gsm8k benchmark run against Azure OpenAI GPT-4 chat model.

Also contains some additional small changes:

  • Removes the global prompt list in the gsm8k module, making it a scoped variable instead. We'll be doing something similar for the other scripts that followed this pattern.
  • Makes generations/ directory which will be used to store the intermediate outputs moving forward
  • Fixes a log file encoding issue
  • Adds "bigbench" into the valid datasets list

@@ -15,7 15,7 @@
"url": os.getenv("AZURE_OPENAI_EMBEDDINGS_URL"),
},
"azure": {
"headers": {"Authorization": f"Bearer {os.getenv('AZURE_OPENAI_CHAT_API_KEY')}"},
"headers": {"api-key": f"{os.getenv('AZURE_OPENAI_CHAT_API_KEY')}"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we document anywhere exactly which values should go into these environment variables? I'm guessing that this is the key which users can get from the AOAI playground if they select 'View Code' ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the right key. It's also visible from a keys panel in the portal. We will document necessary environment variables in the readme.

@@ -8,16 8,6 @@

my_path = pathlib.Path(__file__).parent.resolve()

ds = load_dataset("gsm8k", "main")["test"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve of not doing this on import :-)

@@ -67,17 57,26 @@ def solve(idx):
break

if answer:
with open(my_path.parent / "datasets" / "gsm8k.jsonl", "a") as f:
with open(my_path.parent / "generations" / "gsm8k.jsonl", "a") as f:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the 'generations' directory be added to gitignore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should. I'll add that. I do want to keep the readme in there to document the purpose of that directory though.

Copy link
Collaborator

@riedgar-ms riedgar-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, but this looks like an improvement to me

@nking-1 nking-1 merged commit 5d47891 into microsoft:main Dec 14, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants