Artificial Intelligence Makes Compelling Connections Puzzles

IEEE Spectrum: For the Technology Insider
© Copyright 2024 IEEE — All rights reserved. A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.

Millions of people log on every day for the latest edition of Connections, a popular category-matching game from The New York Times. Launched in mid-2023, the game garnered 2.3 billion plays in the first six months. The concept is straightforward yet captivating: Players sort 16 words into four themed groups of four, and four wrong guesses end the game.
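The mechanic described above can be sketched in a few lines of Python. This is an illustrative toy model, not The New York Times' implementation: the class name, the field names, and the example words are all invented here, and it assumes the four tries are counted as four allowed wrong guesses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConnectionsGame:
    """Toy model (hypothetical): 4 hidden groups of 4 words, 4 allowed mistakes."""

    groups: dict[str, frozenset[str]]  # theme -> its four words
    mistakes_left: int = 4
    solved: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A board is valid only if it has 4 groups of 4 and all 16 words differ.
        if len(self.groups) != 4 or any(len(m) != 4 for m in self.groups.values()):
            raise ValueError("a puzzle needs exactly 4 groups of 4 words")
        words = [w for members in self.groups.values() for w in members]
        if len(set(words)) != 16:
            raise ValueError("the 16 words must be distinct")

    def guess(self, words: set[str]) -> str | None:
        """Return the theme if the words form an unsolved group, else None."""
        if self.mistakes_left == 0:
            raise RuntimeError("no guesses left")
        for theme, members in self.groups.items():
            if theme not in self.solved and members == frozenset(words):
                self.solved.append(theme)
                return theme
        self.mistakes_left -= 1  # a wrong guess costs one mistake
        return None


# Invented example board, for illustration only.
game = ConnectionsGame(
    groups={
        "SHADES OF GREEN": frozenset({"LIME", "OLIVE", "MINT", "SAGE"}),
        "PLANETS": frozenset({"MARS", "VENUS", "SATURN", "EARTH"}),
        "METALS": frozenset({"IRON", "GOLD", "TIN", "LEAD"}),
        "GEMS": frozenset({"OPAL", "RUBY", "JADE", "PEARL"}),
    }
)
first = game.guess({"LIME", "OLIVE", "MINT", "SAGE"})  # a correct group
second = game.guess({"LIME", "MARS", "IRON", "OPAL"})  # one word from each group
```

Even this toy version shows why the format is easy to state but hard to design for: the rules fit in one short class, while everything that makes a board interesting lives in the choice of words.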

Part of the fun for players is applying abstract reasoning and semantic knowledge to spot connecting meanings. Under the hood, however, puzzle creation is complex. New York University researchers recently tested the ability of OpenAI’s GPT-4 large language model (LLM) to create engaging and creative puzzles. Their study, published as a preprint on arXiv in July, found that LLMs lack the metacognition necessary to assume the player’s perspective and anticipate their downstream reasoning. With careful prompting and domain-specific subtasks, however, LLMs can still write puzzles on par with those of The New York Times.

Figure: Each Connections puzzle features 16 words (left) that must be sorted into 4 categories of 4 words each (right). Image: The New York Times

“Models like GPT don’t know how humans think, so they’re bad at estimating how tricky a puzzle is for the human brain,” says lead author Timothy Merino, a Ph.D. student in NYU’s Game Innovation Lab. “On the flip side, LLMs have a very impressive linguistic understanding and knowledge base from the massive amounts of text they train on.”

The researchers first needed to understand the core game mechanics and why they’re engaging. Certain word groups, like opera titles or basketball teams, might be familiar to some players. However, the challenge isn’t just a knowledge check. “[The challenge] comes from spotting groups with the presence of misleading words that make their categorization ambiguous,” says Merino.

Intentionally distracting words serve as red herrings and form the game’s signature trickiness. In developing GPT-4’s generative pipeline, the researchers tested whether intentional overlap and false groups resulted in tough yet enjoyable puzzles.

Figure: A successful Connections puzzle includes intentionally overlapping words (top). The NYU researchers included a process for generating new word groups in their LLM approach to making Connections puzzles (bottom). Image: NYU

This mirrors the thinking of Connections creator and editor Wyna Liu, whose editorial approach considers “decoys” that don’t belong to any other category. Senior puzzle editor Joel Fagliano, who tests and edits Liu’s boards, has said that spotting a red herring is among the hardest skills to learn. As he puts it, “More overlap makes a harder puzzle.” (The New York Times declined IEEE Spectrum’s request for an interview with Liu.)
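One rough way to make "overlap" concrete is to list the board words that could plausibly sit in more than one category. The sketch below is an assumption-laden illustration, not the metric used in the NYU paper or by the Times editors: the function name, the hand-written "plausible" word pools, and the example words are all hypothetical.

```python
def red_herrings(
    board: dict[str, set[str]],
    plausible: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Map each board word to the other categories it could plausibly join.

    `board` holds the intended answer groups; `plausible` holds a broader,
    hand-made pool of words that a player might associate with each category.
    """
    overlaps: dict[str, list[str]] = {}
    for theme, members in board.items():
        for word in members:
            others = sorted(
                other
                for other, pool in plausible.items()
                if other != theme and word in pool
            )
            if others:
                overlaps[word] = others
    return overlaps


# Invented two-group excerpt of a board, for illustration only.
board = {
    "PLANETS": {"MARS", "VENUS", "SATURN", "EARTH"},
    "METALS": {"IRON", "GOLD", "MERCURY", "TIN"},
}
plausible = {
    "PLANETS": {"MARS", "VENUS", "SATURN", "EARTH", "MERCURY", "JUPITER"},
    "METALS": {"IRON", "GOLD", "MERCURY", "TIN", "LEAD"},
}
herrings = red_herrings(board, plausible)
print(herrings)  # {'MERCURY': ['PLANETS']}
```

Under this simplification, a board with more entries in the result has more overlap. Whether that translates into a harder puzzle for human players is exactly the kind of judgment the researchers found the model struggled to make.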

The NYU paper cites Liu’s three axes of difficulty: word familiarity, category ambiguity, and wordplay variety. Meeting these constraints is a unique challenge for modern LLM systems.

AI Needs Good Prompts for Good Puzzles

The team began by explaining the game rules to the AI model, providing examples of Connections puzzles, and asking the model to create a new puzzle.

“We discovered that it’s really hard to write an exhaustive ruleset for Connections that GPT could follow and always produce a good result,” Merino says. “We’d write up a big set of rules, ask it to generate some puzzles, then inevitably discover some new unspoken rule we needed to include.”

Making the prompts longer did not improve the quality of the results. “The more rules we added, the more GPT seemed to ignore them,” Merino adds. “It’s hard to adhere to 20 different rules and still come up with something clever.”

The team found success by breaking the task into smaller workflows. One LLM creates puzzles based on iterative prompting, a step-by-step process that generates one or many word groups in a single context, which are then parsed into separate nodes. Next, an editor LLM identifies the connecting theme and edits the categories. Finally, a human evaluator picks the highest-quality sets. Each LLM agent in the pipeline follows a limited set of rules without needing an exhaustive explanation of the game’s intricacies. For instance, an editor LLM only needs to know the rules for category naming and fixing errors, not the gameplay.
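The division of labor described above can be sketched as a small pipeline in which each stage sees only the instructions it needs. This is a minimal sketch under stated assumptions, not the researchers' code: the `LLM` type, the prompt wording, the comma-separated output format, and the function names are all hypothetical, and the model calls are stand-in functions rather than a real API.

```python
from typing import Callable

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def parse_groups(raw: str) -> list[list[str]]:
    """Split raw model output into word groups, keeping only groups of four."""
    groups = []
    for line in raw.splitlines():
        words = [w.strip().upper() for w in line.split(",") if w.strip()]
        if len(words) == 4:
            groups.append(words)
    return groups


def generate_candidates(generator: LLM, editor: LLM, topic: str) -> list[dict]:
    """Generator proposes groups; editor names each one. A human picks later."""
    raw = generator(
        f"List several groups of four words related to: {topic}. "
        "Write one group per line, separated by commas."
    )
    candidates = []
    for words in parse_groups(raw):
        # The editor prompt covers category naming only, not the game rules.
        name = editor("Name the category connecting: " + ", ".join(words)).strip()
        candidates.append({"category": name, "words": words})
    return candidates


# Stand-in "models" so the sketch runs without any external service.
def fake_generator(prompt: str) -> str:
    return "lime, olive, mint, sage\nnot a valid group\nmars, venus, saturn, earth"


def fake_editor(prompt: str) -> str:
    return " SHADES OF GREEN " if "LIME" in prompt else " PLANETS "


candidates = generate_candidates(fake_generator, fake_editor, "colors and space")
```

Passing the models in as plain functions keeps each stage testable on its own, and the returned candidates are what a human evaluator would then review and select from.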

To test the model’s appeal, the researchers collected 78 responses from 52 human players, who compared LLM-generated sets to real Connections puzzles. Those surveys confirmed that GPT-4 could successfully produce novel puzzles comparable in difficulty and competitive in players’ preferences.

Figure: In about half of the comparisons against real Connections puzzles, human players rated AI-generated versions as equally or more difficult, creative, and enjoyable. Image: NYU

Greg Durrett, an associate professor of computer science at the University of Texas at Austin, calls NYU’s study an “interesting benchmark task” and fertile ground for future work on understanding set operations like semantic groupings and solutions.

Durrett explains that while LLMs excel at generating various word sets or acronyms, their outputs may be trite or less interesting than human creations. He adds, “The [NYU] researchers did a lot of work to come up with the right prompting strategies to generate these puzzles and get high-quality outputs from the model.”

NYU Game Innovation Lab Director Julian Togelius, an associate professor of computer science and engineering who co-authored the paper, says the group’s task assignment workflow could carry over to other titles such as Codenames, a popular multiplayer board game. Like Connections, Codenames involves identifying commonalities between words. “We could probably use a very similar method with good results,” Togelius adds.

While LLMs may never match human creativity, Merino believes they’ll make excellent assistants for today’s puzzle designers. Their training knowledge unlocks vast word pools. For instance, GPT can list 30 shades of green in seconds, while humans might need a minute to think of a few.

“If I wanted to create a puzzle with a ‘shades of green’ category, I would be limited to the shades I know,” Merino says. “GPT told me about ‘celadon,’ a shade I didn’t know about. To me, that kind of sounds like the name of a dinosaur. I could ask GPT for 10 dinosaurs with names ending in ‘-don’ for a tricky follow-up group.”

This AI Model Can Make Creative Connections Puzzles

Inspired by The New York Times' hit, researchers put an LLM to the task
