
Unlock higher accuracy in Claude 3 with Anthropic’s new Contextual Retrieval feature

Contextual Retrieval Preprocessing

Anthropic, the team behind the Claude 3 family of large language models, has unveiled a new retrieval technique known as Contextual Retrieval. This approach aims to significantly improve the performance and accuracy of traditional Retrieval-Augmented Generation (RAG) systems by embedding additional contextual information into document chunks. The main goal of Contextual Retrieval is to increase the precision and relevance of retrieved data, especially for complex queries where conventional methods often lose context and fail to deliver accurate results.

Contextual Retrieval

TL;DR Key takeaways:

  • Anthropic introduces Contextual Retrieval to enhance RAG systems by embedding additional context into document chunks.
  • Contextual Retrieval increases relevance and accuracy, especially for complex queries.
  • Traditional RAG methods struggle to preserve context, resulting in less accurate results.
  • Contextual Retrieval uses a large language model (LLM) to add 50–100 tokens of context to each chunk.
  • Performance improvements include a 35% reduction in the top-20-chunk retrieval failure rate, rising to 49% when combined with contextual BM25.
  • Key implementation factors: chunking strategy, embedding model selection, contextualizer prompt tuning, and optimizing the number of chunks returned.
  • Prompt caching can reduce costs and latency, balancing cost and performance.
  • Works best with large knowledge bases (over 200,000 tokens); for smaller knowledge bases it may be better to include entire documents in the prompt.
  • Anthropic provides a code example implementing Contextual Retrieval.

How Contextual Retrieval Works

At its core, Contextual Retrieval improves on the traditional RAG process by embedding additional context into each document chunk. This ensures that retrieved information is more relevant and precise, because the surrounding context is preserved within each individual chunk.

standard RAG scheme

In a conventional RAG setup, documents are split into discrete chunks and an embedding is computed for each one. The resulting embeddings are stored in a vector database for efficient retrieval. At inference time, relevant chunks are retrieved based on the similarity of their embeddings to the query. While this method works well in many scenarios, it often struggles to fully preserve the original context, especially for more complex and nuanced queries.
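
To make the conventional pipeline concrete, here is a minimal sketch of a standard RAG setup in Python. It assumes the sentence-transformers library and an in-memory array standing in for the vector database; the model name, chunk size, and helper functions are illustrative choices, not Anthropic’s implementation.

```python
# Minimal sketch of a conventional RAG pipeline (illustrative, not Anthropic's code).
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_document(text: str, chunk_size: int = 800) -> list[str]:
    """Naively split a document into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_index(documents: list[str]) -> tuple[list[str], np.ndarray]:
    """Embed every chunk and keep (chunks, vectors) as a toy in-memory vector store."""
    chunks = [c for doc in documents for c in chunk_document(doc)]
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    return chunks, vectors

def retrieve(query: str, chunks: list[str], vectors: np.ndarray, k: int = 20) -> list[str]:
    """Return the top-k chunks by cosine similarity to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]
```

The weakness this article discusses appears in the last step: each retrieved chunk arrives stripped of everything around it in the source document.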

Contextual Retrieval preprocessing scheme

One of the most serious challenges with traditional RAG is the loss of valuable contextual information when chunks are retrieved in isolation. This loss of context can severely hamper the retrieval of specific, targeted information for complex queries, ultimately leading to less accurate and relevant results.

Sometimes the simplest solution is the best. If your knowledge base is smaller than 200,000 tokens (about 500 pages of material), you can simply include the entire knowledge base in the prompt you give the model, without the need for RAG or similar methods, and prompt caching can make this approach much faster and more cost-effective.
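
For that small-knowledge-base case, a rough sketch with the Anthropic Python SDK might look like the following. The cache_control field marks the large, unchanging block for prompt caching so repeated questions reuse it; the model name, file path, and prompt wording are illustrative assumptions.

```python
# Sketch: small knowledge base passed whole, with prompt caching (illustrative).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

knowledge_base = open("docs/handbook.txt").read()  # hypothetical file, under ~200k tokens

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": f"Answer questions using this knowledge base:\n\n{knowledge_base}",
                # Cache the large, unchanging block so follow-up questions are cheaper and faster.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```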

Anthropic Contextual Retrieval Explained


Implementing Contextual Retrieval

To overcome these challenges, Anthropic’s Contextual Retrieval uses a large language model (LLM) to automatically add rich contextual information to each chunk. This process uses a carefully crafted prompt that situates each individual chunk within the overall document, typically adding 50 to 100 tokens of highly relevant context. By prepending this context to the chunk before it is embedded, the retrieval process preserves the integrity and meaning of the information much more effectively, resulting in greatly improved retrieval accuracy.
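
A minimal sketch of this contextualization step might look like the following. The exact prompt wording and the choice of Claude 3 Haiku as the contextualizer are assumptions for illustration; the key idea is that the generated context is prepended to the chunk before embedding and indexing.

```python
# Sketch: generate a short situating context for each chunk (illustrative prompt wording).
import anthropic

client = anthropic.Anthropic()

CONTEXT_PROMPT = """<document>
{document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Give a short, succinct context (roughly 50-100 tokens) that situates this chunk
within the overall document, to improve search retrieval of the chunk.
Answer only with the succinct context and nothing else."""

def contextualize(document: str, chunk: str) -> str:
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": CONTEXT_PROMPT.format(document=document, chunk=chunk),
        }],
    )
    # Prepend the generated context so the embedding captures the chunk's place in the document.
    return response.content[0].text + "\n\n" + chunk
```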

The performance improvement achieved with contextual embeddings is impressive. Using this technique, the top-20-chunk retrieval failure rate drops by 35%. When contextual embeddings are combined with contextual BM25, an even more dramatic improvement is observed, with the failure rate dropping by 49%. Adding a reranker component on top further improves retrieval accuracy, making the entire system more robust, reliable, and efficient at delivering highly relevant results.
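
As a rough illustration of combining the two signals, the sketch below fuses embedding similarity with BM25 scores over the contextualized chunks using reciprocal rank fusion. The rank_bm25 library, the fusion constant, and the omission of a reranker are all simplifying assumptions rather than Anthropic’s exact recipe.

```python
# Sketch: hybrid retrieval over contextualized chunks (embeddings + BM25, rank fusion).
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def hybrid_retrieve(query: str, chunks: list[str], k: int = 20) -> list[str]:
    # Dense ranking by cosine similarity of normalized embeddings.
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    q = embedder.encode([query], normalize_embeddings=True)[0]
    dense_rank = np.argsort(-(vectors @ q))

    # Lexical ranking with BM25 over whitespace-tokenized chunks.
    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    bm25_rank = np.argsort(-bm25.get_scores(query.lower().split()))

    # Reciprocal rank fusion: reward chunks that rank highly in either list.
    scores = np.zeros(len(chunks))
    for rank_list in (dense_rank, bm25_rank):
        for position, idx in enumerate(rank_list):
            scores[idx] += 1.0 / (60 + position)  # 60 is a commonly used RRF constant

    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]  # optionally pass these to a reranker
```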

Key implementation considerations

When implementing contextual search in practice, there are several key factors to carefully consider to ensure optimal performance and results:

  • Chunking strategy: The specific chunking strategy and choice of chunk boundaries will depend on the unique requirements and characteristics of each application.
  • Embedding model: The choice of embedding model is crucial to achieving the best possible results. Dense embedding models such as Gemini and Voyage are highly recommended due to their strong performance.
  • Contextualizer prompt: Customizing the contextualizer prompt for the specific documents being processed is essential to maximizing the relevance and accuracy of the retrieved information.
  • Number of chunks: Optimizing the number of chunks to return is an important factor. Anthropic’s testing shows that returning around 20 chunks typically produces the most effective results.

It is important to note that adding contextual information increases the overall number of tokens and processing overhead. However, strategic use of prompt caching can significantly reduce costs and latency, making the system much more efficient and cost-effective.
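
Because every chunk of a given document is contextualized against the same full document text, marking that document block for prompt caching means only the first call pays the full input cost. The snippet below shows the idea as a variant of the earlier contextualization sketch, again with illustrative model and prompt wording.

```python
# Sketch: reuse a cached copy of the document across per-chunk contextualization calls.
import anthropic

client = anthropic.Anthropic()

def contextualize_cached(document: str, chunk: str) -> str:
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"<document>\n{document}\n</document>",
                    # The document text is identical for every chunk of the same
                    # document, so cache it and pay the full token cost only once.
                    "cache_control": {"type": "ephemeral"},
                },
                {
                    "type": "text",
                    "text": (
                        f"<chunk>\n{chunk}\n</chunk>\n\n"
                        "Give a short, succinct context that situates this chunk "
                        "within the overall document. Answer with the context only."
                    ),
                },
            ],
        }],
    )
    return response.content[0].text + "\n\n" + chunk
```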

Practical applications

Contextual retrieval is particularly well suited to large knowledge bases, especially those exceeding 200,000 tokens. For smaller knowledge bases, it may be more efficient to include the entire document in the prompt, since that approach avoids the need for chunking and contextual embedding altogether.

To help developers implement contextual retrieval, Anthropic provides a comprehensive code example demonstrating the key steps in the process. It includes instructions for building vector databases, computing embeddings, and evaluating performance, serving as a practical guide for developers who want to use this advanced retrieval mechanism in their own applications.
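
Anthropic’s own notebook covers the full pipeline; as a simpler, hypothetical illustration of the evaluation step, the failure-rate metric quoted above can be computed along these lines, where each test query is paired with the chunk that should be retrieved.

```python
# Sketch: top-k retrieval failure rate, the kind of metric behind the 35%/49% figures
# (hypothetical eval-set format; retrieve_fn is any retrieval function returning chunks).
def failure_rate(eval_set: list[dict], retrieve_fn, k: int = 20) -> float:
    """eval_set items look like {"query": str, "gold_chunk": str}."""
    failures = 0
    for example in eval_set:
        top_k = retrieve_fn(example["query"], k=k)
        if example["gold_chunk"] not in top_k:
            failures += 1
    return failures / len(eval_set)

# Example usage (hypothetical retrieval functions):
# baseline = failure_rate(eval_set, standard_retrieve)
# contextual = failure_rate(eval_set, contextual_retrieve)
# print(f"failure@20: baseline={baseline:.1%}, contextual={contextual:.1%}")
```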

By using Anthropic’s Contextual Retrieval, organizations can unlock new levels of accuracy, relevance, and efficiency in their information retrieval systems. This approach represents a significant advance in the field, enabling companies to extract more value from their data and deliver more meaningful insights to their users.

Source: Prompt Engineering

Filed under: AI, Top News





