Return the metadata from a Query Engine in an Agent #15692
Replies: 2 comments 4 replies
-
To ensure that the metadata is always returned as a source when using an OpenAI Agent with a query engine, you need to set `include_metadata=True` when configuring the query engine:

```python
from llama_index.core.base.base_query_engine import BaseQueryEngine
from llama_index.core.indices.base import BaseIndex
from llama_index.embeddings.openai import OpenAIEmbedding

# TEXT_QA_TEMPLATE and create_synth_prompt_template() are not shown in this thread.


def _configure_query_engine(
    index: BaseIndex,
    embed_model: OpenAIEmbedding
) -> BaseQueryEngine:
    query_engine = index.as_query_engine(
        embed_model=embed_model,
        similarity_top_k=5,
        include_metadata=True  # This ensures metadata is included
    )
    query_engine.update_prompts({
        TEXT_QA_TEMPLATE: create_synth_prompt_template()
    })
    return query_engine
```

Additionally, you can refer to the `CitationQueryEngine` class:

```python
class CitationQueryEngine(BaseQueryEngine):
    def __init__(
        self,
        retriever: BaseRetriever,
        llm: Optional[LLM] = None,
        response_synthesizer: Optional[BaseSynthesizer] = None,
        citation_chunk_size: int = DEFAULT_CITATION_CHUNK_SIZE,
        citation_chunk_overlap: int = DEFAULT_CITATION_CHUNK_OVERLAP,
        text_splitter: Optional[TextSplitter] = None,
        node_postprocessors: Optional[List[BaseNodePostprocessor]] = None,
        callback_manager: Optional[CallbackManager] = None,
        metadata_mode: MetadataMode = MetadataMode.NONE,
    ) -> None:
        self.text_splitter = text_splitter or SentenceSplitter(
            chunk_size=citation_chunk_size, chunk_overlap=citation_chunk_overlap
        )
        self._retriever = retriever
        self._response_synthesizer = response_synthesizer or get_response_synthesizer(
            llm=llm,
            callback_manager=callback_manager,
        )
        self._node_postprocessors = node_postprocessors or []
        self._metadata_mode = metadata_mode
        super().__init__(callback_manager=callback_manager)

    def _create_citation_nodes(self, nodes: List[NodeWithScore]) -> List[NodeWithScore]:
        # Split each retrieved node into smaller chunks and label every chunk
        # "Source N" so the LLM can cite it by number.
        new_nodes: List[NodeWithScore] = []
        for node in nodes:
            text_chunks = self.text_splitter.split_text(
                node.node.get_content(metadata_mode=self._metadata_mode)
            )
            for text_chunk in text_chunks:
                text = f"Source {len(new_nodes) + 1}:\n{text_chunk}\n"
                new_node = NodeWithScore(
                    node=TextNode.model_validate(node.node), score=node.score
                )
                new_node.node.text = text
                new_nodes.append(new_node)
        return new_nodes

    def retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = self._retriever.retrieve(query_bundle)
        for postprocessor in self._node_postprocessors:
            nodes = postprocessor.postprocess_nodes(nodes, query_bundle=query_bundle)
        return nodes

    def synthesize(
        self,
        query_bundle: QueryBundle,
        nodes: List[NodeWithScore],
        additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
    ) -> RESPONSE_TYPE:
        # Re-chunk and number the nodes before handing them to the synthesizer.
        nodes = self._create_citation_nodes(nodes)
        return self._response_synthesizer.synthesize(
            query=query_bundle,
            nodes=nodes,
            additional_source_nodes=additional_source_nodes,
        )
```

This example shows how to create numbered citation nodes so that sources are cited in the responses. Note that `metadata_mode` defaults to `MetadataMode.NONE` here, so node metadata only appears in the citation text if you pass a different mode. You can adapt similar techniques to your query engine configuration to ensure metadata is always included and properly cited [1][2][3].
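For the "programmatic" route, one option is to read the metadata off the agent's response object rather than relying on the LLM to repeat it. The sketch below assumes the response exposes a `source_nodes` list whose items have `.node.metadata` and `.score`, which is the shape of the `NodeWithScore` objects used in the class above. The `Fake*` dataclasses and `collect_source_metadata` are hypothetical names used only so the example runs without llama_index installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Hypothetical stand-ins mimicking the assumed response shape.
# With a real agent, pass the object returned by agent.chat(...) instead.
@dataclass
class FakeNode:
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeNodeWithScore:
    node: FakeNode
    score: Optional[float] = None


@dataclass
class FakeAgentResponse:
    response: str
    source_nodes: List[FakeNodeWithScore] = field(default_factory=list)


def collect_source_metadata(response: Any) -> List[Dict[str, Any]]:
    """Return one dict per retrieved node: its metadata plus its score."""
    collected: List[Dict[str, Any]] = []
    # Fall back to an empty list when the attribute is missing or None,
    # e.g. when the agent answered without calling the query engine tool.
    for scored in getattr(response, "source_nodes", None) or []:
        entry = dict(scored.node.metadata)  # copy so the node is not mutated
        entry["score"] = scored.score
        collected.append(entry)
    return collected


fake = FakeAgentResponse(
    response="The report covers Q3.",
    source_nodes=[
        FakeNodeWithScore(
            FakeNode("Q3 revenue...", {"file_name": "report.pdf", "page": 4}),
            0.82,
        ),
    ],
)
sources = collect_source_metadata(fake)
print(sources)
```

Because the helper only reads attributes, the sources can be returned to the caller alongside the answer text regardless of what the LLM chose to write.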
-
@AlexanderKolev Hi, I'm also trying to get the retrieved context from the agent. Have you figured it out? I'm using a ReAct agent ( https://docs.llamaindex.ai/en/stable/examples/agent/react_agent_with_query_engine/ ) and have not been able to get the retrieved context from the query engines.
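A possible way to get at the retrieved context per tool, sketched under assumptions: it supposes the agent's response has a `sources` list of tool outputs, each with a `tool_name` and a `raw_output`, and that a query engine tool's `raw_output` carries `source_nodes`. These attribute names are assumptions about the response shape, and `retrieved_context_by_tool` plus the `SimpleNamespace` stand-ins are hypothetical, used only to make the example runnable on its own.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def retrieved_context_by_tool(response: Any) -> Dict[str, List[str]]:
    """Group retrieved node texts by the name of the tool that produced them."""
    context: Dict[str, List[str]] = {}
    for tool_output in getattr(response, "sources", None) or []:
        raw = getattr(tool_output, "raw_output", None)
        # Tools that are not query engines have no source_nodes; skip them.
        nodes = getattr(raw, "source_nodes", None) or []
        texts = [scored.node.get_content() for scored in nodes]
        if texts:
            context.setdefault(tool_output.tool_name, []).extend(texts)
    return context


def _fake_node(text: str) -> SimpleNamespace:
    # Hypothetical stand-in for a scored node with a get_content() method.
    return SimpleNamespace(node=SimpleNamespace(get_content=lambda: text))


fake_response = SimpleNamespace(
    sources=[
        SimpleNamespace(
            tool_name="docs_query_engine",
            raw_output=SimpleNamespace(
                source_nodes=[_fake_node("chunk A"), _fake_node("chunk B")]
            ),
        ),
        # A non-retrieval tool whose raw output is a plain value.
        SimpleNamespace(tool_name="calculator", raw_output=42),
    ]
)
context = retrieved_context_by_tool(fake_response)
print(context)
```

Grouping by tool name helps when the agent has several query engine tools and you need to know which one supplied which chunk.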
-
Hello kind people,
I have an OpenAI Agent that uses an index's query engine as a tool. I want this tool to always return the metadata as a source, but I am struggling to do so. I have tried prompting and using CitationQueryEngine, without success. Do you have any ideas for doing it 'programmatically' in some way? If I were using just the query engine rather than an Agent, it would seem more straightforward.
My code for the query engine looks like this:
Thank you in advance.