We've added new features to the Anthropic Console: https://lnkd.in/ertrbfcU. Claude can generate prompts, create test variables, and show you the outputs of prompts side by side.

Use Claude to generate input variables for your prompt, then run the prompt to see Claude's response. The new Evaluate tab enables you to automatically create test cases to evaluate your prompt against real-world inputs. Modify your test cases as needed, then run all of them in one click.

We've also added the ability to compare the outputs of two or more prompts side by side. As you iterate on different versions of a prompt, your subject matter experts can compare responses and grade them on a 5-point scale.

Test case generation and output comparison features are available to all users today.
Perhaps ‘prompt engineering’ won’t be the long-term career cert some were selling it as, just months ago…
Good morning. I followed the procedure to receive the 5 credits to try it, but after I provided my Italian phone number it didn't work. Is this feature available in Italy? And is it possible to use this tool without limits in the paid version? Thank you very much, and congratulations on the hard work and the results you bring us.
The Evaluate tab for creating and modifying test cases is a game-changer. Excited to see how these features elevate user experience!
Great, but now I'm getting content generation warnings for a simple breakout game.
#UX matters when using prompt-based data and enlightenment engineering. #insight and having a personal repository of your research history (#prompts and #result evaluation) is a rare feature in current mainstream interface implementations (there is too much #play and often no #ground), so this may end up driving demand for more #playground …
I just shared an exciting idea about a significant technological advancement for chatbots, which could be implemented in either Claude or GPT. Now, I'm anxiously waiting to see which platform will adopt the concept first. The revelation actually came to me while using Claude, and I'm eager to see it in action during my daily AI tasks. However, I can't shake the feeling that it's already being developed somewhere else.
A great way to democratize the prompting process for those who aren't as familiar with prompt engineering techniques. Even beyond this, "Evaluate" is a very good tool to have.
This is very useful. I was doing something similar in Python with API access, but making it available directly in the interface is great.
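For readers curious what that API-based workflow might look like, here is a minimal sketch of a side-by-side prompt comparison harness. The names (`fill_template`, `compare_prompts`) and the `{{variable}}` template syntax are illustrative assumptions, and the model call is swapped out for a stub you would replace with a real Anthropic API call (e.g. `client.messages.create(...)`):

```python
def fill_template(template: str, variables: dict) -> str:
    """Substitute {{variable}} placeholders in a prompt template."""
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template


def compare_prompts(prompt_versions: dict, test_cases: list, call_model):
    """Run every prompt version against every test case.

    Returns one row per test case with each version's output,
    suitable for side-by-side review and manual grading.
    """
    rows = []
    for case in test_cases:
        row = {"input": case}
        for name, template in prompt_versions.items():
            row[name] = call_model(fill_template(template, case))
        rows.append(row)
    return rows


if __name__ == "__main__":
    versions = {
        "v1": "Summarize: {{text}}",
        "v2": "Summarize in one sentence: {{text}}",
    }
    cases = [{"text": "The Console adds an Evaluate tab."}]

    # Stub model call for illustration only; replace with a real API call.
    echo = lambda prompt: f"[model output for: {prompt}]"

    for row in compare_prompts(versions, cases, echo):
        print(row["v1"], "|", row["v2"])
```

The harness keeps model access behind a single `call_model` function, so the same comparison loop works against any backend or a local stub for testing.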
The console (particularly the prompt generator) has been really useful.
I think we're starting to see a change in the tide with GenAI tools, where things are evolving from tools to PLAY with AI into tools that are more clearly designed to WORK with AI. As value creation opportunities are more clearly understood and use cases are more clearly articulated, tools evolve to become more useful and usable. Less magical AI, more explainable AI.