Co-authors: Philippe Beaudette, VP - Community, and Michael Foree - Director, Data Science and Data Platform.
Stack Overflow, like the rest of the world, has been spending a lot of time learning about AI and GenAI over the past few months as we build the solutions we announced today at WeAreDevelopers.
In our announcements, we speak about people getting summarized outputs from AI. In making announcements like these, our Marketing colleagues have to find a balance between explaining our technical solutions simply and explaining them precisely. At this moment, we are leaning more towards simplicity in our explanations and expect our descriptions to evolve as we, the industry and users, learn more.
With AI, the behind-the-scenes technical design decisions - e.g., how we decide to send information to an LLM - can produce a very different user experience and output.
Let’s look at the difference between summarizing and question answering.
Summarization is the task of creating a shorter version of one or more documents that captures all the important information.
Question answering (more specifically, generative question answering) generates a response that correctly answers the user's question.
Both may sound similar, but they produce very different user experiences, simply because summarization is unaware of the user's query while question answering is aware of it.
Below are examples highlighting the differences between the tasks using Stack Overflow posts or an Article.
Note: All examples use a technique known as Retrieval Augmented Generation (RAG), which allows us to leverage our data instead of relying on the knowledge the LLM has of the world.
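In rough terms, the flow is: retrieve a relevant document with search, place that document in the prompt, and only then ask the LLM to generate. The sketch below illustrates the pattern only; search_posts and call_llm are hypothetical stand-ins, not Stack Overflow's actual services or APIs.

# Minimal sketch of a RAG flow (illustrative only).
def search_posts(query: str) -> str:
    """Return the text of the best-matching document for the query."""
    raise NotImplementedError("plug in a search backend here")

def call_llm(prompt: str) -> str:
    """Send the prompt to a large language model and return its response."""
    raise NotImplementedError("plug in an LLM client here")

def answer_with_rag(query: str) -> str:
    document = search_posts(query)  # 1. retrieve from our own data
    prompt = (                      # 2. ground the LLM in that data
        document
        + "\n----------------\n"
        + "Using only the above information, answer the following question.\n"
        + "----------------\n"
        + "Question: " + query
    )
    return call_llm(prompt)         # 3. generate the response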
Differences when using Stack Overflow posts
The following example uses a public Stack Overflow post, but the same logic applies to Stack Overflow for Teams posts, as both follow the same Q&A structure.
Let's say a user has a query asking, "How to wait for a minute in Python?" Our search results indicate that "How do I get my program to sleep for 50 milliseconds?" is the correct and highest-quality post related to the user's query.
Question answering
For question answering, we provide the LLM with two things: the information, which in our case is the accepted answer of the Stack Overflow post, and the question, which is the user's search query.
Prompt
Use time.sleep()
from time import sleep
sleep(0.05)
----------------
Using only the above information, answer the following question.
Include a concise explanation and code snippets if necessary.
----------------
Question: How to wait for a minute in Python?
Generation
Answer: You can use the time.sleep() function to wait for a minute in Python.
This function takes a single argument: the number of seconds to wait.
To wait for a minute, you would use the following code:
from time import sleep
sleep(60)
Notice how we have used the existing Stack Overflow answer about sleeping for 50 milliseconds as a reference, but the LLM knows the user is asking for a minute instead. Hence, it adjusts the answer to match the original search query.
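To make the mechanics concrete, a prompt like the one above can be assembled with plain string formatting, roughly as in this sketch. It is a simplified illustration, not our exact production prompt or code.

def build_qa_prompt(accepted_answer: str, user_query: str) -> str:
    # The information is the post's accepted answer; the question is the
    # user's search query, which is why the generation can adapt to it.
    return (
        accepted_answer
        + "\n----------------\n"
        + "Using only the above information, answer the following question.\n"
        + "Include a concise explanation and code snippets if necessary.\n"
        + "----------------\n"
        + "Question: " + user_query
    )

prompt = build_qa_prompt(
    accepted_answer="Use time.sleep()\nfrom time import sleep\nsleep(0.05)",
    user_query="How to wait for a minute in Python?",
)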
Summarization
For summarization, we still provide the LLM with two things: the question, which is now the title of the Stack Overflow post returned by search (not the user's search query), and the answer, which is still the accepted answer for that post.
Prompt
Summarize the following question and answer. Be concise and use code snippets if necessary.
----------------
Question: How do I get my program to sleep for 50 milliseconds?
Answer: Use time.sleep()
from time import sleep
sleep(0.05)
Generation
To make a program sleep for 50 milliseconds, use the time.sleep() function from the time module and pass in 0.05 as an argument.
Notice how we are just rewriting the existing answer. The LLM is only aware of the original poster's question about "50 milliseconds" and has no knowledge of the search query, which asks for a minute.
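The summarization prompt can be built the same way, except that the user's search query is never an input; only the post title and its accepted answer are. Again, this is a simplified sketch rather than our exact prompt.

def build_summarization_prompt(post_title: str, accepted_answer: str) -> str:
    # Note: the user's search query does not appear anywhere in this prompt.
    return (
        "Summarize the following question and answer. "
        "Be concise and use code snippets if necessary.\n"
        "----------------\n"
        + "Question: " + post_title + "\n"
        + "Answer: " + accepted_answer
    )

prompt = build_summarization_prompt(
    post_title="How do I get my program to sleep for 50 milliseconds?",
    accepted_answer="Use time.sleep()\nfrom time import sleep\nsleep(0.05)",
)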
Differences when using Articles or long-form documents
The next example shows how the experience is even more different when dealing with an Article or any other long-form document.
Let's say we have a query asking, "How much is github copilot for students?" and search returns a relevant chunk of an article.
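We don't go into how articles are split into chunks here, but a common, simplified approach is a fixed-size split with some overlap, along the lines of the sketch below. The chunk size and overlap values are illustrative assumptions, not our actual settings.

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Split a long document into overlapping windows so each chunk is small
    # enough to fit into the LLM prompt alongside the instructions.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks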
Question answering
Similar to the example above, we pass the information, which is the chunk of text from the article, and the question, which is still the user's search query.
Prompt
{CHUNK}
-----
Using only the above information, answer the following question.
Include a concise explanation and code snippets if necessary.
----
Question: How much is github copilot for students?
Generation
GitHub Copilot is free for students and maintainers of popular open-source repositories.
There is no cost associated with using GitHub Copilot for these users.
Summarization
But for summarization, we only pass the information from the article. There is no question for the LLM to answer because the article is a long-form document, not structured as Q&A.
Prompt
Summarize the following information. Be concise and use code snippets if necessary.
----
{CHUNK}
Generation
GitHub Copilot is a paid feature for individual developers, priced at $10/developer/month or $100/year.
It is free for students and maintainers of popular open-source repositories.
For GitHub Enterprise Cloud customers who are not paying via credit card or PayPal, GitHub Copilot for Business is priced at $19/user/month.
Companies who are on GitHub Free, GitHub Team, or pay for GitHub Enterprise via credit card or PayPal can join a waitlist for GitHub Copilot for Business.
With summarization, the LLM does not know we care only about "students," so it generates much more text. We can still see that "It is free for students and maintainers," but it is on the user to find this information in the generation, and there is no guarantee that it would be retained in the summary.
Conclusion
While both summarization and question-answering can be extremely helpful, it's important to acknowledge their differences. Summarization has no knowledge of the search query, while question-answering does. Neither is better than the other, just different implementations and experiences.
Summarization is less likely to hallucinate or produce inappropriate content because the user does not influence the generation. Still, there is the possibility that the user does not get the answer they are looking for.
Question answering is more likely to answer the user's query and get them unstuck. It is more likely, however, to hallucinate and produce inappropriate content because the user has some influence on the generation. This risk can always be reduced but is still present.
As we research and build our AI solutions, we’ll solicit input from the community and users to understand how we might better explain the methods used under the hood.