Anthropic released its latest and greatest LLM, Claude 3.5 Sonnet, last week. It has a cool feature called Artifacts that runs the code it generates and shows the results side by side (no more copy-pasting the code somewhere else). People are creating fun interactive apps with it, and inspired by something I saw, I asked it to generate a T-test calculator app. My journey is chronicled in the attached images 🙂.

PSA:
- LLMs like Claude are amazing (and the Artifacts feature is very nice), but please check the output, especially for questions where correctness matters. Just because pages of code and verbiage went into generating the answer doesn't mean it is correct.
- The error in my example was glaring, but what if it had been subtly wrong? Use LLMs in areas where you are knowledgeable (so you can check the output), and be careful in areas where you are a novice.
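For anyone who wants to sanity-check an LLM-generated calculator like the one above, the core of a two-sample (Welch's) t-test fits in a few lines of plain Python. This is just a sketch, and the function name `welch_t_test` is my own; it returns the t statistic and degrees of freedom (a p-value would additionally need a t-distribution CDF such as `scipy.stats.t.sf`, which is not in the standard library):

```python
import math
from statistics import mean, variance

def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)   # sample variances (n-1 denominator)
    se2_1, se2_2 = v1 / n1, v2 / n2     # squared standard errors of each mean
    t = (mean(a) - mean(b)) / math.sqrt(se2_1 + se2_2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
# For these equal-variance samples: t = -1.0, df = 8.0
```

Running a couple of cases like this by hand (or against `scipy.stats.ttest_ind(a, b, equal_var=False)`) is a quick way to catch the kind of glaring error I ran into.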
Critical thinking is more valuable than ever. Unfortunately, I fear that many people take LLM-generated responses at face value, and that habit will dramatically change everyday discussions and debates.
Valid point! Tom's Guide ran a head-to-head test of the two LLMs this week, and Claude was the clear winner. Incredibly impressive! https://www.tomsguide.com/ai/chatgpt-4o-vs-claude-35-sonnet-which-ai-platform-wins
Can Claude 3.5 Sonnet handle complex multi-step workflows?
Awesome
Thank you for sharing, and for the great advice on checking LLM-generated output. In the pharma industry, it's crucial that the information we share externally is 100% accurate, so human verification of LLM-generated content is essential. That said, a system in which experts verify the output can be a significant asset: it provides an excellent starting point and saves long hours of effort.