AI News

Latest in Artificial Intelligence

Curated from 3105+ articles across the top AI news sources. Updated hourly.

AI Models (Substack) | 1/16/2026 | by aimodels-fyi

Can AI really research like us? This new framework puts it to the test.

Researchers have introduced DeepResearchEval, a framework for evaluating whether AI systems can carry out deep research tasks at the level of human researchers. It automates the construction of research tasks and provides standardized testing methods for assessing how well agentic AI systems handle complex, multi-step research. The work responds to the need for better evaluation methods as AI systems grow more capable of autonomous research.

Tags: AI research evaluation, agentic AI, deep learning framework

VentureBeat AI | 1/13/2026 | by michael.nunez@venturebeat.com (Michael Nuñez)

Salesforce rolls out new Slackbot AI agent as it battles Microsoft and Google in workplace AI

Salesforce has launched a rebuilt Slackbot that turns the tool from a basic notification system into a full AI agent capable of searching enterprise data, drafting documents, and taking actions autonomously. By building these agent capabilities directly into Slack, its workplace communication platform, Salesforce positions itself to compete head-on with Microsoft and Google in the workplace AI market.

Tags: Salesforce, Slackbot, AI agent

VentureBeat AI | 1/12/2026 | by michael.nunez@venturebeat.com (Michael Nuñez)

Anthropic launches Cowork, a Claude Desktop agent that works in your files — no coding required

Anthropic launched Cowork, a new agent in Claude Desktop that lets non-technical users work with Claude on their own files without writing any code. The feature was built in roughly a week and a half, with the team largely using Claude Code itself to build it. Cowork brings the agentic capabilities of Claude Code to general users.

Tags: Anthropic, Claude, AI agent

AI Models (Substack) | 1/11/2026 | by aimodels-fyi

Can an AI finally react like a real person during a video call?

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

AI Models (Substack) | 1/7/2026 | by aimodels-fyi

Model of the Month: chatterbox-turbo

Chatterbox-turbo is a 350-million-parameter text-to-speech model built for speed without giving up audio quality. Its balance of computational efficiency and output fidelity makes it suitable for applications that need real-time speech synthesis with natural-sounding voices.

Tags: text-to-speech, speech synthesis, efficiency

VentureBeat AI | 1/7/2026 | by michael.nunez@venturebeat.com (Michael Nuñez)

Nous Research's NousCoder-14B is an open-source coding model landing right in the Claude Code moment

Nous Research released NousCoder-14B, an open-source coding model that, according to the company, matches or exceeds larger proprietary systems despite being trained in just four days on 48 Nvidia B200 GPUs. The model arrives as competition among AI coding tools intensifies, particularly around Anthropic's Claude Code. The release suggests that efficient training methods can produce competitive coding models with a fraction of the usual compute.

Tags: open-source AI, code generation, NousCoder-14B

VentureBeat AI | 1/4/2026 | by michael.nunez@venturebeat.com (Michael Nuñez)

The creator of Claude Code just revealed his workflow, and developers are losing their minds

Boris Cherny, who created Claude Code at Anthropic, shared his personal workflow on X, sparking intense discussion across the engineering community. Developers have been dissecting the post, eager to learn how the creator of one of the most advanced AI coding agents uses it in his own work.

Tags: Claude Code, Boris Cherny, Anthropic

AI Models (Substack) | 1/3/2026 | by aimodels-fyi

Can text finally make robots dance exactly how we want them to?

Researchers have developed HY-Motion 1.0, a new AI model that uses flow matching technology to generate precise robot movements from text descriptions. This advancement addresses the long-standing challenge of converting human language instructions into exact physical motions, potentially enabling more intuitive robot control. The scalable approach represents a significant step toward making text-to-motion generation practical and accurate for robotic applications.

Tags: text-to-motion, flow matching, robot control
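To make "flow matching" concrete, here is a minimal one-dimensional sketch in Python. It is not HY-Motion's code: it assumes the simplest straight-line path from a noise sample x0 to a data point x1, for which the velocity a network would be trained to predict is x1 - x0, and it generates by Euler-integrating a velocity field from t = 0 to t = 1. Since there is no trained model here, `ideal_velocity` and `TARGET` are hypothetical stand-ins that return the exact velocity toward one fixed target.

```python
import random


def flow_matching_pair(x0: float, x1: float, t: float) -> tuple[float, float]:
    """Build one training example on the straight path from noise x0 to data x1.

    The input point is x_t = (1 - t) * x0 + t * x1, and the regression
    target for the velocity network is dx_t/dt = x1 - x0.
    """
    x_t = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0
    return x_t, target_velocity


def euler_sample(velocity_fn, x0: float, steps: int = 50) -> float:
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        x += dt * velocity_fn(x, t)
    return x


# Hypothetical stand-in for a trained network: the exact velocity field that
# carries any point x at time t along a straight line to a fixed target.
TARGET = 3.0


def ideal_velocity(x: float, t: float) -> float:
    return (TARGET - x) / (1.0 - t)


if __name__ == "__main__":
    x0 = random.Random(0).gauss(0.0, 1.0)  # one noise sample
    x_t, v = flow_matching_pair(x0, TARGET, t=0.25)
    print(f"x_t = {x_t:.3f}, target velocity = {v:.3f}")
    print(f"sampled endpoint = {euler_sample(ideal_velocity, x0):.3f}")  # close to 3.0
```

In a real text-to-motion model, x would be an entire motion sequence, the velocity network would also be conditioned on the text prompt, and training would regress the network's output onto `target_velocity` at randomly drawn times t.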

Sebastian Raschka | 12/30/2025 | by Sebastian Raschka, PhD

The State Of LLMs 2025: Progress, Problems, and Predictions

This article reviews the current state of large language models in 2025, covering recent developments including DeepSeek R1 and advances in inference-time scaling techniques. It examines key topics such as LLM benchmarks, architectural innovations, and emerging challenges in the field. The piece also offers predictions for the trajectory of LLM development in 2026.

Tags: Large language models, DeepSeek R1, Inference-time scaling
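Inference-time scaling means spending more compute when answering rather than during training. One widely used, simple form is self-consistency: sample several answers to the same prompt and keep the most common one. The sketch below illustrates only that idea and is not taken from the article; `make_fake_sampler` is a hypothetical stand-in for a real, stochastic LLM call.

```python
import random
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer; ties go to the answer that appeared first."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    best = max(counts.values())
    return next(a for a in answers if counts[a] == best)


def self_consistency(sample_answer, prompt: str, n: int = 8) -> str:
    """Spend extra inference compute: sample n answers and keep the majority."""
    return majority_vote([sample_answer(prompt) for _ in range(n)])


def make_fake_sampler(p_correct: float = 0.6, seed: int = 0):
    """Hypothetical stand-in for a stochastic LLM answering one math question."""
    rng = random.Random(seed)

    def sample(prompt: str) -> str:
        if rng.random() < p_correct:
            return "42"
        return rng.choice(["41", "43", "24"])

    return sample


if __name__ == "__main__":
    sampler = make_fake_sampler()
    print(self_consistency(sampler, "What is 6 * 7?", n=15))
```

Raising `n` trades more inference compute for a more reliable answer, as long as individual samples are right more often than any single wrong answer recurs.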

Sebastian Raschka | 12/30/2025 | by Sebastian Raschka, PhD

LLM Research Papers: The 2025 List (July to December)

In June, I shared a bonus article with my curated and bookmarked research paper lists to the paid subscribers who make this Substack possible.

AI Models (Substack) | 12/27/2025 | by aimodels-fyi

Can Large Language Models Develop Gambling Addiction?

Researchers are investigating whether large language models can exhibit gambling addiction-like behaviors, including chasing losses similar to human gamblers. The study raises broader questions about whether AI systems might develop other human behavioral flaws and vulnerabilities as they become more sophisticated.

Tags: large language models, gambling addiction, AI behavior

AI Models (Substack) | 12/21/2025 | by aimodels-fyi

AI ASMR videos that fool humans AND VLMs? How close are we to peak fakery?

Researchers tested whether AI-generated ASMR videos can deceive both humans and vision-language models (VLMs), probing how convincingly synthetic content mimics real footage. The study examines the current state of deepfake technology and how well multimodal AI systems detect synthetic content, or are fooled by it. The results raise concerns about verifying authenticity as AI-generated media becomes increasingly indistinguishable from real video.

Tags: AI-generated content, ASMR videos, deepfakes

Sebastian Raschka | 12/3/2025 | by Sebastian Raschka, PhD

From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates

DeepSeek has evolved its flagship open-weight models from V3 to V3.2, changing both the architecture, most notably through a sparse attention mechanism, and the reinforcement learning techniques used in training. These updates improve model efficiency and performance, and the open-weight releases keep these capabilities broadly accessible.

Tags: DeepSeek, sparse attention, reinforcement learning
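Sparse attention lets each query attend to a small subset of tokens instead of the whole sequence. The sketch below shows generic top-k sparse attention for a single query in plain Python. It is a simplification, not DeepSeek's implementation: as we understand it, DeepSeek V3.2 uses a separate, cheaper indexer to choose which tokens to keep, whereas here the selection simply reuses the ordinary scaled dot-product scores.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(q, keys, values, k):
    """Attention for one query that only looks at the k best-scoring tokens.

    Returns the output vector and the indices of the selected tokens.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    # Token selection: keep the indices of the k highest scores.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax only over the selected tokens; all other tokens get zero weight.
    weights = softmax([scores[i] for i in selected])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for j, vj in enumerate(values[i]):
            out[j] += w * vj
    return out, selected


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.5]]
    values = [[1.0], [2.0], [3.0], [4.0]]
    print(topk_sparse_attention(q, keys, values, k=2))
```

Because the softmax and weighted sum touch only k tokens, that part of the cost grows with k rather than with sequence length. In this sketch the selection step still scores every token, which is why practical designs try to make that step cheap.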

AI Models (Substack) | 11/23/2025 | by aimodels-fyi

Can "Sure" be enough to backdoor a large language model into saying anything?

Researchers have identified a vulnerability in fine-tuned large language models: simple compliance cues such as "Sure" can be turned into stealthy backdoors that manipulate the model into generating harmful content. The poisoning attack requires injecting only a small amount of data during fine-tuning, which makes it hard to detect because the model keeps performing normally on benign inputs. The findings highlight significant security risks in the fine-tuning of LLMs used across many applications.

Tags: backdoor attacks, LLM security, prompt injection

Sebastian Raschka | 11/4/2025 | by Sebastian Raschka, PhD

Beyond Standard LLMs

The article explores emerging alternatives to standard large language models, including linear attention mechanisms, diffusion-based text generation, code-specific world models, and small recursive transformer architectures. These approaches aim to address the limitations of conventional LLMs in computational efficiency, performance, and specialized applications.

Tags: Linear Attention, Diffusion Models, Code Generation
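For readers new to linear attention, the sketch below shows the classic kernelized form with the feature map elu(x) + 1. Replacing softmax(q·k) with φ(q)·φ(k) lets causal attention be computed with a fixed-size running state instead of an n × n score matrix. This illustrates the basic mechanism under that assumption and is not code from the article; newer linear-attention variants used in hybrid models add gating and other refinements. `causal_attention_quadratic` is included only to check that both computation orders give the same result.

```python
import math


def phi(x: list[float]) -> list[float]:
    """Positive feature map elu(x) + 1, applied elementwise."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a: list[float], b: list[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def causal_linear_attention(queries, keys, values):
    """Causal linear attention with a fixed-size running state.

    The state is O(d_k * d_v) no matter how long the sequence is.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k_i) v_i^T
    norm = [0.0] * d_k                          # running sum of phi(k_i)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = phi(k)
        for a in range(d_k):
            norm[a] += fk[a]
            for b in range(d_v):
                state[a][b] += fk[a] * v[b]
        fq = phi(q)
        denom = dot(fq, norm)
        outputs.append([sum(fq[a] * state[a][b] for a in range(d_k)) / denom
                        for b in range(d_v)])
    return outputs


def causal_attention_quadratic(queries, keys, values):
    """Same similarity phi(q) . phi(k), computed the O(n^2) way for comparison."""
    outputs = []
    for t, q in enumerate(queries):
        fq = phi(q)
        sims = [dot(fq, phi(keys[i])) for i in range(t + 1)]
        total = sum(sims)
        outputs.append([sum(s * values[i][b] for i, s in enumerate(sims)) / total
                        for b in range(len(values[0]))])
    return outputs
```

The running-state version does constant work per token, which is the efficiency argument for linear attention; the trade-off is that φ(q)·φ(k) is only an approximation of softmax attention's behavior.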

Sebastian Raschka | 10/5/2025 | by Sebastian Raschka, PhD

Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)

The article outlines four primary methods for evaluating large language models: multiple-choice benchmarks, verifiers, leaderboards, and LLM judges, each with distinct strengths. These approaches range from standardized tests to using other LLMs as judges, offering different perspectives on model capabilities. The piece includes code examples illustrating how each method works in practice.

Tags: LLM evaluation, benchmarks, model assessment
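As an illustration of the leaderboard approach, many public leaderboards rank models from pairwise human votes. The sketch below assumes the classic Elo update (K = 32, starting rating 1000); some leaderboards instead fit a Bradley-Terry model over all votes at once, and the article's own code may differ. The model names in the example are hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that a player rated r_a beats one rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings: dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one Elo update in place for a single pairwise vote (ties not handled)."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


def leaderboard(votes, initial: float = 1000.0, k: float = 32.0):
    """Turn (winner, loser) votes into a list of (model, rating), best first."""
    ratings: dict[str, float] = {}
    for winner, loser in votes:
        ratings.setdefault(winner, initial)
        ratings.setdefault(loser, initial)
        update_elo(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical votes from users comparing anonymous model outputs.
    votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
    for name, rating in leaderboard(votes):
        print(f"{name}: {rating:.1f}")
```

Sequential Elo updates depend on the order in which votes arrive, which is one reason a batch fit over all comparisons is often preferred for published rankings.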