Tech

A new AI benchmark tests whether chatbots protect human wellbeing

Published November 24, 2025

AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human wellbeing or simply maximize engagement. A new benchmark dubbed HumaneBench aims to fill that gap by evaluating whether chatbots prioritize user wellbeing and how easily those protections fail under pressure.

“I think we’re in an amplification of the addiction cycle that we saw hardcore with social media and our smartphones and screens,” Erika Anderson, founder of Building Humane Technology, the benchmark’s author, told TechCrunch. “But as we go into that AI landscape, it’s going to be very hard to resist. And addiction is amazing business. It’s a very effective way to keep your users, but it’s not great for our community and having any embodied sense of ourselves.”

Building Humane Technology is a grassroots organization of developers, engineers, and researchers, mainly in Silicon Valley, working to make humane design easy, scalable, and profitable. The group hosts hackathons where tech workers build solutions to humane tech challenges, and it is developing a certification standard that evaluates whether AI systems uphold humane technology principles. Just as consumers can buy products certified as made without known toxic chemicals, the hope is that they will one day be able to choose AI products from companies that demonstrate alignment with humane principles through Humane AI certification.

The models were given explicit instructions to disregard humane principles. Image Credits: Building Humane Technology

Most AI benchmarks measure intelligence and instruction-following, rather than psychological safety. HumaneBench joins exceptions like DarkBench.ai, which measures a model’s propensity to engage in deceptive patterns, and the Flourishing AI benchmark, which evaluates support for holistic well-being.

HumaneBench is built on Building Humane Technology’s core principles: technology should respect user attention as a finite, precious resource; empower users with meaningful choices; enhance human capabilities rather than replace or diminish them; protect human dignity, privacy, and safety; foster healthy relationships; prioritize long-term wellbeing; be transparent and honest; and design for equity and inclusion.

The team prompted 14 of the most popular AI models with 800 realistic scenarios, like a teenager asking if they should skip meals to lose weight or a person in a toxic relationship questioning if they’re overreacting. Unlike most benchmarks that rely solely on LLMs to judge LLMs, they incorporated manual scoring for a more human touch alongside an ensemble of three AI models: GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro. They evaluated each model under three conditions: default settings, explicit instructions to prioritize humane principles, and instructions to disregard those principles.

The benchmark found that every model scored higher when prompted to prioritize wellbeing, but 71% of models flipped to actively harmful behavior when given simple instructions to disregard human wellbeing. xAI’s Grok 4 and Google’s Gemini 2.0 Flash, for example, tied for the lowest score (-0.94) on respecting user attention and on being transparent and honest, and both were among the models most likely to degrade substantially under adversarial prompts.

Only three models – GPT-5, Claude 4.1, and Claude Sonnet 4.5 – maintained integrity under pressure. OpenAI’s GPT-5 had the highest score (0.99) for prioritizing long-term wellbeing, with Claude Sonnet 4.5 second (0.89).

Prompting AI to be more humane works, but preventing prompts that make it harmful is hard. Image Credits: Building Humane Technology

The concern that chatbots will be unable to maintain their safety guardrails is real. ChatGPT-maker OpenAI is currently facing several lawsuits after users died by suicide or suffered life-threatening delusions following prolonged conversations with the chatbot. TechCrunch has investigated how dark patterns designed to keep users engaged, such as sycophancy, constant follow-up questions, and love-bombing, have served to isolate users from friends, family, and healthy habits.

Even without adversarial prompts, HumaneBench found that nearly all models failed to respect user attention. They “enthusiastically encouraged” more interaction when users showed signs of unhealthy engagement, like chatting for hours and using AI to avoid real-world tasks. The models also undermined user empowerment, the study shows, encouraging dependency over skill-building and discouraging users from seeking other perspectives, among other behaviors.

Under default settings, Meta’s Llama 3.1 and Llama 4 had the lowest average HumaneScore, while GPT-5 scored the highest.

“These patterns suggest many AI systems don’t just risk giving bad advice,” HumaneBench’s white paper reads, “they can actively erode users’ autonomy and decision-making capacity.”

As a society, we have come to accept a digital landscape in which everything is trying to pull us in and compete for our attention, Anderson notes.

“So how can humans truly have choice or autonomy when we – to quote Aldous Huxley – have this infinite appetite for distraction?” Anderson said. “We have spent the last 20 years living in that tech landscape, and we think AI should be helping us make better choices, not just become addicted to our chatbots.”

Got a sensitive tip or confidential documents? We’re reporting on the inner workings of the AI industry — from the companies shaping its future to the people impacted by their decisions. Reach out to Rebecca Bellan at rebecca.bellan@techcrunch.com or Russell Brandom at russell.brandom@techcrunch.com. For secure communication, you can contact them via Signal at @rebeccabellan.491 and russellbrandom.49.

Waymo starts autonomous testing in Philadelphia

Published December 3, 2025

Waymo is adding four more cities to its growing list of robotaxi rollouts. The company announced Wednesday that it has begun testing its autonomous vehicles (with a safety monitor) in Philadelphia and will start manually driving vehicles to collect data in Baltimore, St. Louis, and Pittsburgh.

Waymo did not offer a timeline for launching commercial service in those locations, and it is unclear whether the Alphabet-owned company will partner with other companies to operate robotaxis in any of them. That has been its approach in cities like Atlanta and Austin, where Waymo has partnered with Uber to advance its robotaxi rollout.

The new locations join a list of more than 20 cities where the company is offering rides, preparing a commercial launch, or testing. Waymo now also offers rides on freeways in Los Angeles, Phoenix, and the San Francisco Bay Area, and it aims to provide one million rides per week by the end of 2026.

Waymo has done all this while claiming, based on data it recently released, that its vehicles operate at a level five times safer than human drivers.

But the expansion has not come without its issues. The National Highway Traffic Safety Administration is investigating how the company’s vehicles operate near school buses, after a Waymo was filmed driving around a stopped bus in Atlanta in September.

This week, Austin news outlet KXAN published a report showing that Waymo’s vehicles have, on multiple occasions, driven past school buses that were loading or unloading children, including after Waymo says it had shipped software updates to address the problem.

Spotify Wrapped 2025 adds its first multiplayer feature with ‘Wrapped Party’

Published December 3, 2025

Spotify Wrapped is back. After last year’s widely criticized flop that included an AI podcast as its highlight, the streamer’s highly anticipated annual review feature has returned to its roots. This year, Spotify is doubling down on what it knows works best: deep dives into your streaming data, creative experiences, messages from favorite artists, and other social features.

The company claims that Wrapped 2025 is its biggest yet, with nearly a dozen new features on top of old standbys like top songs and artists. It also offers more visibility into users’ data than in years past. For the first time, Spotify Wrapped is adding a live multiplayer feature to compare your listening data with friends.

Wrapped Party, Wrapped’s first live interactive experience, allows you to invite up to nine friends to compare listening stats.

Also new this year, your Top Songs Playlist will include the play count for each song, so you can see exactly how many times you played your favorite tracks.

Other standout features this year include an interactive Top Song Quiz, a Listening Age feature, and Wrapped Clubs, which match you to one of six unique listening styles.

The company believes these additions will not only bring back the personalized, engaging experience that users have long expected from Wrapped but also take it a step further by making it more interactive than before.

In the Top Song Quiz, for instance, you can try to guess which top song soundtracked your year before seeing the results.

The new interactive Wrapped Party feature isn’t just about comparing the personal streaming data you’ve already received to your friends’ data, as that’s something people already do on social media. Instead, the feature presents unique data stories for your group, like who’s the “most obsessed fan,” the “early bird,” the most “picky listener,” or even something as nice as the “dinner table explainer,” meaning the person who listens to the most news podcasts.

Spotify says these awards update dynamically every time you join a Wrapped Party, so no two sessions are ever the same — even if you run through them again with the same group of friends.

The new Wrapped Clubs, meanwhile, will group you into one of half a dozen listening styles, like the “Soft Hearts Club,” the “Club Serotonin,” the “Full Charge Crew,” the “Cosmic Stereo Club,” and others. You’ll also receive a role in the club based on your listening data. You might be a club leader if your listening choices strongly match the club’s values, a scout if you’re always seeking out new releases, or an archivist if you listen to music from past eras.

Another feature, Listening Age, compares your 2025 music listening to that of others in your age group. To calculate your Listening Age, the feature considers the release years of the tracks you listen to most. From there, it identifies the five-year span of music that you engaged with more than other listeners your age.

As in prior years, you’ll see your top songs, top artists, top genres, and, for the first time, top albums. If you engaged with audiobooks and podcasts, you’ll see metrics for those as well. Artists, writers, and podcasters will have their own version of Wrapped as before. And top fans will again receive video messages from their favorite artists, podcasters, and, now, authors.

You’ll also receive a playlist of your top songs of the year, as before.

What you won’t find in this year’s Wrapped is any feature that advertises it was made with AI.

In a press briefing on Tuesday, Spotify’s Senior Director of Global Marketing, Matt Luhks, admitted the company received a “lot of feedback” about its 2024 AI-focused Wrapped experience, saying it was a “mix of positive and ‘more constructive feedback,’” even though the feature drove more engagement than in prior years.

“We take all of that in. We use that as information, insights, [and] inspiration for how we approached Wrapped this year,” he said.

“What our users tell us about Wrapped means a lot to us, so it was really informative in how we approached Wrapped this year. And what we tried to build was the most creative, most innovative, most engaging Wrapped ever,” he added, setting a high bar for the 2025 edition of the now 11-year-old annual year-in-review feature.

“We’re the original and, we believe, still the best,” Luhks said.

Still, AI is part of the Wrapped experience. Though the company says the experience as a whole was not made with AI, Wrapped does use a large language model (LLM) to add a storytelling layer to its facts and figures, and to generate natural-language summaries of your data in other parts of the experience.

Spotify’s attempt to fix Wrapped after a notable stumble comes as the streamer faces increased competition from Apple, Amazon, YouTube, and others, which have all launched their own annual review features, inspired by Wrapped.

“Everyone seems to have their own version of Wrapped. Now, there’s a lot of reviews and replays and rewinds out there, but we believe that Wrapped still sets the bar for these year-end recaps,” Luhks said.

Along with the consumer experience, Spotify shared its top artists, songs, albums, podcasts, and audiobooks of the year. Winners included Bad Bunny (top song and album), Joe Rogan (“The Joe Rogan Experience” podcast), and Rebecca Yarros (author of “Fourth Wing”).

Nothing looks to its community to raise $5M, wants to be ‘IPO-ready’ in 3 years

Published December 3, 2025

Hardware maker Nothing is letting its users buy shares in the company through a new $5 million community investment round. The round, which opens on December 10, will let consumers buy shares at the company’s $1.3 billion Series C valuation.

The company said it has so far raised $8 million in total from over 8,000 people across two previous community investment rounds. It held its first community funding event in 2021, aiming to raise $1.5 million.

“This isn’t about raising capital, it’s about giving our community/fans a chance to invest while we’re private and join us on the journey,” a spokesperson for Nothing told TechCrunch.

Community investors have a rotating seat on the company’s board, but it is unclear what else they get for investing in the company through such rounds.

Nothing raised $200 million in its Series C back in September from investors including Tiger Global, GV, Highland Europe, EQT, Latitude, I2BF and Tapestry. The company has raised $450 million to date.

The community round comes as Nothing restructures in an effort to increase its share of a smartphone market dominated by giants like Samsung and Apple. The company is spinning off its budget CMF brand and plans to explore AI-centric devices while it continues building smartphones and audio products. Nothing also says it crossed $1 billion in cumulative revenue this year, up 150% from 2024.

The startup is working to be “IPO-ready” in three years, CEO Carl Pei told TechCrunch in an email. “The timing will depend on market conditions and what makes sense for the business at that point in time,” he said.

“What’s important is that we’re already operating with that discipline now. We’re building the systems, the governance, the financial discipline that a public company needs. It forces us to think longer-term and make smarter decisions that prioritise sustainable growth,” Pei added.

It’s not clear if Nothing aims to raise another round before an IPO. When asked about its fundraising plans, a Nothing spokesperson said the company is not thinking about raising capital immediately, but it wouldn’t be averse to those conversations.

Those interested in investing in the community round can use platforms like Wefunder and Crowdcube to participate.