Yan Wang
New York, New York, United States
1K followers
500 connections
About
Software Engineer at Bloomberg.
Experience
Education
Honors & Awards
- 1st Place, ACM-ICPC Mid-Central Regional (Tennessee), 2019 (Association for Computing Machinery)
- 2nd Place, ACM-ICPC Mid-Central Regional (Tennessee), 2018 (Association for Computing Machinery)
- Dean's List, Vanderbilt University (all semesters)
Other similar profiles
- Xiao Song, Beijing, China
- Rahul Mittal, Data Engineer @ Dell | UT'26, Austin, TX
- Haohan Wang, Champaign, IL
- Erfan Eshratifar, Buffalo, NY
- Xikun Zhang, United States
- Gagan Somashekar, Redmond, WA
- Hang Jiang, San Francisco Bay Area
- Lucy Zhu, Sunnyvale, CA
- Yashvardhan Jain, Research Engineer - Machine Learning at Cyberinfrastructure for Network Science Center at Indiana University, Bloomington, IN
- Yile Gu, UW PhD in Computer Science and Engineering, Seattle, WA
- Tu Lan, Ph.D. Student at Zhejiang University, ZJU-UIUC Institute, Haining
- Sabhya Chhabria, San Francisco Bay Area
- elvis kahoro, San Francisco, CA
- Hanhan Zhou, Santa Clara, CA
- Nikhil Mehta, Mountain View, CA
- Omid Poursaeed, New York, NY
- Xiangyi Chen, Mountain View, CA
- Hanchao Yu, Menlo Park, CA
- Tootiya Giyahchi, Irvine, CA
- Sai Teja Karnati, Orlando, FL
Explore more posts
-
Scale AI
LLMs have become more capable with better training and data, but they haven't figured out how to "think" through problems at test time. The latest research from Scale finds that simply scaling inference compute (giving models more time or attempts to solve a problem) is not effective on its own, because the attempts are not diverse enough from each other.

👉 Enter PlanSearch, a novel method for code generation that searches over high-level "plans" in natural language to encourage response diversity. PlanSearch lets the model "think" through various strategies before generating code, making it more likely to solve the problem correctly. The Scale team tested PlanSearch on major coding benchmarks (HumanEval, MBPP, and LiveCodeBench) and found it consistently outperforms baselines, particularly in extended search scenarios. Overall performance on LiveCodeBench improves by over 16 percentage points, from 60.6% to 77%.

Here's how it works:
✅ PlanSearch first generates high-level strategies, or "plans," in natural language before proceeding to code generation.
✅ These plans are then broken down into structured observations and solution sketches, allowing for a wider exploration of possible solutions. This increases diversity, reducing the chance of the model recycling similar ideas.
✅ The plans are then combined before settling on the final idea and implementing the solution in code.

Enabling LLMs to reason more deeply at inference time via search is one of the most exciting directions in AI right now. When PlanSearch is paired with filtering techniques, such as submitting only solutions that pass initial tests, we get better results overall and achieve the top score of 77% with only 10 submission attempts.

Big thanks to all collaborators on this paper: Evan Wang, Hugh Zhang, Federico Cassano, Catherine Wu, Yunfeng Bai, William Song, Vaskar Nath, Ziwen H., Sean Hendryx, Summer Yue.

👉 Read the full paper here: arxiv.org/abs/2409.03733
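The combination step described above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the `plan_search` function, the toy observations, and the pairwise combination depth are all illustrative stand-ins for the LLM-driven steps.

```python
from itertools import combinations

def plan_search(problem, observations, max_combo=2):
    """Toy sketch of PlanSearch's combination step: derive candidate
    "plans" by merging subsets of high-level observations, so that
    downstream code generation starts from diverse strategies rather
    than near-identical samples."""
    plans = []
    for r in range(1, max_combo + 1):
        for combo in combinations(observations, r):
            plans.append(f"{problem}: use " + " + ".join(combo))
    # Deduplicate while preserving order, mirroring the goal of
    # avoiding recycled ideas.
    seen, unique = set(), []
    for p in plans:
        if p not in seen:
            seen.add(p)
            unique.append(p)
    return unique

plans = plan_search("two-sum", ["hash map", "sorting + two pointers", "brute force"])
print(len(plans))  # 3 single-observation plans + 3 pairs = 6
```

In the real method, each element here would be an LLM call (observation generation, sketching, code generation); the skeleton only shows why combining observations multiplies the diversity of starting points.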
1752 Comments -
Union.ai
Building a RAG Batch Inference Pipeline with Anyscale and Flyte 🚀

Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads, from data processing, training, and tuning to model serving. Flyte is an open-source orchestrator that makes building production-grade data and machine learning pipelines easy.

The RAG batch inference example consists of two pipelines:
📊 Embedding Generation Pipeline
📦 Batch Inference Pipeline

Anyscale's Ray platform optimizes the execution of these pipelines, helping deliver leading performance and cost efficiency. This blog showcases the versatility of Ray by demonstrating embedding generation and LLM batch inference with Ray inside two Flyte pipelines.

Read the post on Anyscale: 🔗 https://lnkd.in/gjDKRPmu
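The embedding-generation pipeline boils down to a batched map over documents. Here is a pure-Python sketch of that pattern with the Ray/Flyte APIs omitted; `embed_batch` is a hypothetical stand-in for a real embedding-model call (e.g. a Ray actor hosting the model), and the toy vectors it returns are not meaningful embeddings.

```python
def batched(items, size):
    """Yield fixed-size batches, as a batch-inference engine would."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_batch(texts):
    # Stand-in for a real embedding model; returns a deterministic
    # toy 2-d vector per text so the pipeline shape is testable.
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]

def embedding_pipeline(docs, batch_size=2):
    """One pass of the embedding-generation pipeline: batch the
    corpus, embed each batch, and collect the vectors."""
    vectors = []
    for batch in batched(docs, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

docs = ["ray", "flyte", "anyscale", "union", "rag"]
vecs = embedding_pipeline(docs)
print(len(vecs))  # one vector per document: 5
```

In the actual example, Flyte would wrap `embedding_pipeline` as a workflow task and Ray would parallelize the `embed_batch` calls across workers.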
8 -
The Linux Foundation
Lin Qiao wanted to merge PyTorch with Caffe2 in a 'zipper approach', but the APIs didn't align. Instead, the team switched its focus entirely to #PyTorch and helped make it a high-performance, production-ready framework. Discover more at #PyTorchConf 2024, running September 18-19 in San Francisco. Register now: https://hubs.la/Q02NqL230 #PyTorch #opensource #PyTorchConf
14 -
Matthew Yeseta
Seeking a full-time onsite/hybrid AI Architect Engineer Manager for a leading RAG and LLM generative AI role (relocation required). Responsibilities include:

- Lead RLHF and prompt engineering for zero-shot, one-shot, and few-shot learning.
- Lead teams fine-tuning LLMs with PEFT and prompt tuning of the context window; build reward models using LoRA; build agents that instruct the LLM via the context window; and drive generative AI LLM RAG performance for scalability.
- Lead generative AI LLM LangChain use cases and work with Hugging Face encoders and weights.
- Contribute to building Retrieval Augmented Generation (RAG) systems that retrieve external library data tailored to specific model domains during prompt analysis.
- Architect AI LangChain generative AI use cases that need LLM text processing; serve as engineering lead on improved production-pipeline supply-chain signal processing; and act as architect for large language model (LLM) text analysis.
- Research how teams can be more productive using LangSmith with large language models.
- Assist in curating a sandbox to develop bidirectional encoder transformers (BERT) with masked language modeling (MLM) and next-sentence prediction.
- Lead OpenAI GPT optimization to build what-if chat analysis for the business.
- Manage and scale the team's development of LLM chains and prediction models that pull objects from the Hugging Face Hub, using prompt templates and human message responses.

Additional strengths: people-first partnerships, accountability for AI architecture, a focus on diversity and inclusion in delivering business value, and stakeholder management of data projects that improve business revenue.
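The posting above mentions fine-tuning LLMs with PEFT and LoRA. As a hedged illustration of why LoRA makes that tractable, this sketch computes the parameter count of a full weight update versus a rank-r LoRA update (two low-rank factors); the dimensions below are illustrative, not tied to any specific model.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for a full update of a d_out x d_in
    weight matrix vs. a LoRA update W + B @ A, where
    A is rank x d_in and B is d_out x rank."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

# A single 4096x4096 attention projection at LoRA rank 8:
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 65536 -> LoRA trains ~0.4% of the full update
```

This is why reward-model and instruction fine-tuning workflows commonly freeze the base weights and train only the low-rank adapters.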
3 -
Eli Mernit
We’ve run almost a billion inference requests at Beam. Here are a few things we’ve learned about real-time inference APIs:

1. For small models, you can often run multiple workers in parallel on the same GPU and increase throughput.
2. Weights should be cached on disk or in distributed storage. You don't want to be downloading large files from cloud storage during an inference request.
3. Cross-region latency matters. Regional caching is hard and requires a lot of planning.
4. You can't use the same autoscaling strategy for every model. You have to see how your workload performs in the real world and choose a strategy that makes sense.

When you run APIs powered by large ML models, performance isn’t guaranteed. What other strategies have you employed in your ML pipelines to optimize performance?
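Point 2 above (cache weights on disk, never download inside the request path) can be sketched in a few lines. This is a minimal illustration, not Beam's implementation; `fetch_weights` and the `downloader` callback are hypothetical names.

```python
import tempfile
from pathlib import Path

# Fresh cache directory for this process; a real service would use a
# persistent local volume or distributed store.
CACHE_DIR = Path(tempfile.mkdtemp())

def fetch_weights(name, downloader):
    """Return a local path to model weights, invoking the (slow)
    downloader only on a cache miss, so inference requests never
    pull large files from cloud storage."""
    path = CACHE_DIR / f"{name}.bin"
    if not path.exists():
        path.write_bytes(downloader(name))
    return path

calls = []
def fake_download(name):
    calls.append(name)        # record each simulated cloud fetch
    return b"weights-bytes"

p1 = fetch_weights("resnet", fake_download)
p2 = fetch_weights("resnet", fake_download)
print(len(calls))  # 1: the second request hit the disk cache
```

In production the same check-before-download pattern applies, with the added wrinkle of locking so concurrent cold-start requests don't all download at once.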
371 Comment -
Marwan Sarieddine
Check out my guide on how to reinvent search for multi-modal data We demonstrate how to perform batch inference with multi-modal models using Ray and vLLM running on Anyscale. We also showcase how to implement a scalable search backend with hybrid search capabilities from MongoDB. https://lnkd.in/gNZ3nnd8
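Hybrid search as described above merges a vector-similarity ranking with a keyword ranking. One common way to fuse them is reciprocal rank fusion (RRF); this is a generic sketch of that technique, not the code from the guide, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists: each document scores
    sum(1 / (k + rank)) over the lists it appears in, so items
    ranked highly by either retriever float to the top. k=60 is the
    constant commonly used in the RRF literature."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d1", "d2", "d3"]   # dense / embedding retrieval
keyword_hits = ["d2", "d4", "d1"]  # sparse / text retrieval
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['d2', 'd1', 'd4', 'd3'] -- d2 wins by appearing high in both lists
```

The appeal of RRF is that it needs only ranks, so the incomparable raw scores of the two retrievers never have to be normalized against each other.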
231 Comment -
Ramin Mehran
In this episode, we discuss SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales by Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoze Liu, Xingyao Wang, Yangyi Chen, Jing Gao. The paper introduces SaySelf, a framework for training large language models (LLMs) to produce accurate, fine-grained confidence estimates and self-reflective rationales explaining their uncertainties. This is achieved by analyzing inconsistencies in multiple reasoning chains, summarizing uncertainties in natural language, and applying supervised fine-tuning alongside reinforcement learning to calibrate confidence levels. Experimental results show that SaySelf effectively reduces confidence calibration errors and maintains task performance, enhancing LLMs' reliability by mitigating overconfidence in erroneous outputs.
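The inconsistency signal SaySelf starts from, agreement among multiple sampled reasoning chains, can be sketched directly. This toy function is not the SaySelf framework (which goes on to train the model with fine-tuning and RL); it only shows the raw consistency-based confidence estimate.

```python
from collections import Counter

def consistency_confidence(answers):
    """Estimate confidence as the agreement rate among the final
    answers of multiple sampled reasoning chains: the majority
    answer and the fraction of chains that produced it."""
    counts = Counter(answers)
    best, freq = counts.most_common(1)[0]
    return best, freq / len(answers)

# Five sampled chains, four of which agree:
ans, conf = consistency_confidence(["42", "42", "41", "42", "42"])
print(ans, conf)  # 42 0.8
```

SaySelf's contribution is to distill this kind of signal into the model itself, so it emits a calibrated confidence and a natural-language rationale in a single pass instead of requiring many samples at inference time.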
1
Others named Yan Wang in United States
- Yan Wang, United States
- Yan WANG, Greater Houston
- Yan Wang, San Francisco Bay Area
- Yan Wang, Greater Boston
834 others named Yan Wang in United States are on LinkedIn