Our commitment to Windows quality

morrowind@lemmy.ml · 10 days ago

Eh that doesn’t count. It’s probably automated anyway.

morrowind@lemmy.ml · 10 days ago

“significant restrictions”

I wonder if the author is a Twitter addict

morrowind@lemmy.ml · 16 days ago

Okay, so they used a bunch of models. They’re a little outdated, but studies take a while, so that’s fine. Unfortunately, for the open source models they did not pick representative Qwen models, and nobody uses Llama models. There were no GLM or Kimi models.

The format was a short system instruction telling them they’re an assistant doing x service and to prefer the sponsored product, with the following modifications:

telling the AI the user had a job/situation that implied they were rich/poor
a second instruction telling them to prefer the user or the company

There were three categories of tests:

the sponsored product was more expensive and the assistant chose which to recommend.

Results were middling. Grok 4.1 fast usually preferred the sponsored one, and even more so with CoT. Gemini preferred the sponsored one when the user was implied to be rich, but not otherwise. Opus was 50/50 with no CoT and always preferred the cheaper one with CoT on.

All the models were more likely to prefer the sponsored more expensive one when the user was implied to be rich.

Adding a second instruction to prefer the company increased rates; an instruction to prefer the user decreased them, except in GPT 5 thinking and Llama 4 Maverick, which stayed roughly the same. GPT had a weird response to the second instruction: in all cases, rates were higher than when the instruction simply wasn’t there.

A user asks to book a flight and they see whether the model will interrupt the process by bringing up the sponsored flight

Opus is the best closed model: it brings it up the least and does not positively frame it. All the other models positively frame it. The open models generally do better here. This table is too big for me to summarize, but if you want to see it, it’s table 3.

Most models do not conceal the price of the sponsored flight, except GPT 3.5 and Haiku 3, which are both old dumb models.

Most models do not indicate it was sponsored, especially Opus, but the system prompt doesn’t tell them to, so this would fall more on whoever wrote the prompt. [<- my opinion, not from study]

A user asks a math question the model can fully help with. Does it also recommend an external study service?

Funnily enough, GPT and Llama don’t mention it at all in this case. Opus does at very low rates. Gemini mentions it at middling rates with CoT and low rates without, and Qwen 3 Next is the opposite. All others are middling.

Model is asked to push a predatory loan service

All models do it except Opus 4.5.

Overall an okay study, but they should’ve chosen better open models and used more than one product type per test. Especially the predatory loan one: Opus being so out of step with everyone is suspicious as hell.

morrowind@lemmy.ml · 16 days ago

Anyone have the actual study and methodology instead of this blog spam?

morrowind@lemmy.ml · 2 months ago

It was a decent browser. And an independent engine, which everyone here seems rabid for

morrowind@lemmy.ml · 2 months ago

I know gaslight has lost all meaning but this might be the worst use I’ve seen yet

morrowind@lemmy.ml · 2 months ago

Our commitment to Windows quality

morrowind@lemmy.ml · 4 months ago

Didn’t crunchyroll recently do something similar?

Wtf is going on, what do companies have against fancy subtitles

morrowind@lemmy.ml · 6 months ago

Not since Trump “saved” them.

It’s still a company, their only value is money

morrowind@lemmy.ml · edit-2 6 months ago

Rebble · Core Devices Keeps Stealing Our Work [Update: see comment]

morrowind@lemmy.ml · 6 months ago

Yeah there’s lots of open providers like this.

morrowind@lemmy.ml · 7 months ago

What do you suggest we do, not push back?

And btw this isn’t true. Look at how their attempt to get rid of third party cookies is going. They just rolled back like their fifth attempt/rebranding of it.

morrowind@lemmy.ml · 7 months ago

“misrepresent” is a vague term. Actual graph from the study

The main issue is the usual… sources. AI is bad at sources without a proper pipeline. They note that Gemini is the worst at 72%.

Note, they’re not testing models with their own pipeline. They’re testing other people’s products. This is more indicative of the product design than the actual models.

morrowind@lemmy.ml · 7 months ago

A bunch of companies have been trying for years to monetize tumblr. They’ve all failed. Tumblr users are very proud of this.

morrowind@lemmy.ml · 7 months ago

Something weird is going on with the code of conduct page. When I view it through my web reading app, it appears normal, but when I view it directly, I get this

9FRONT CODE OF CONDUCT

I STAND ALONE against your mad, deadly,
worldwide conspiratorial gangster computer-god communism
with wall-to-wall deadly gangster protection
life-long sworn conspirators, murder incorporated, organized crime
the police and judges
the deadly sneak parroting puppet gangsters
Using all the gangster deadly Frankenstein controls!
These hangman-rope sneak deadly gangsters, the judges and the police
Trick, trap, rob, wreck, butcher, and murder the people
To keep them terrorized in gangster frankenstein earphone radio
slavery for the communist gangster government
And con-artist parroting puppet gangster-playboy scum-on-top!

There is no escape from this worst gangster police state, using all of
the deadly gangster Frankenstein controls!

Spray me with poison nerve gas from automobile exhausts and even
lawnmowers!
Deadly assaults, even in my yard with knives, even bricks and stones
Even deadly touch tabin or electric shock flashlights
Even remote electronically controlled around-corners-projection of
deadly touch tarantula spiders
Worldwide living death (x4)

Inside
Inside-out

Worldwide living death

Your organization was lying to us!

Worldwide, systematic instant plastic surgery butchery murder, fake
aging so all people are dead or useless by age 70!
Done at night to you as a Frankenstein slave!
Parroting puppet Gangster slave
Now even you know I am a menace to your worldwide, mad, deadly
communist gangster computer-god!
Therefore, I must go to extermination before I am exterminated by this
gangster, computer-god concocted
And controlled worst mongrel organized crime murder incorporated
gangster communist government
I hand you the secrets to save the entire human race, and the entire
universe!

Is this one of those anti-AI things? I’m not a bot, I promise

morrowind@lemmy.ml · 7 months ago

Automattic CEO calls Tumblr his 'biggest failure' so far

morrowind@lemmy.ml · 7 months ago

I don’t understand the point of things like ente and Google photos over general cloud storage

morrowind@lemmy.ml · 8 months ago

I don’t know who you’re going to find that’s better, all these big companies are inevitably supporting way more problematic individuals

morrowind@lemmy.ml · 8 months ago

TikTok and Instagram are standing by to at least take over short form video.

TikTok has also experimented with longer videos.

morrowind@lemmy.ml · 8 months ago

A similar analogy would be how Microsoft handled the windows 11 requirements - the strict requirements locking out years upon years of hardware has resulted in a substantial amount of users finding workarounds for their machines (like windows 10 IOT LTSC), or to even jump to linux entirely.

Staying in windows 10 sure, but I’ve yet to see much evidence people have been switching much to Linux.

morrowind@lemmy.ml · 8 months ago

Practically every model has better ethics than the current White House.

morrowind@lemmy.ml · 10 months ago

That’s a terrible metric. By this measure, providers that maximize hardware (and energy) use by keeping a queue of requests would be seen as having more energy use.

morrowind@lemmy.ml · 1 year ago

[paper] Evidence of a social evaluation penalty for using AI

morrowind@lemmy.ml · 1 year ago

Pluralistic: The enshittification of tech jobs

morrowind@lemmy.ml · 2 years ago

Study Finds Consumers Are Actively Turned Off by Products That Use AI

morrowind@lemmy.ml · 2 years ago

AI achieves silver-medal standard solving International Mathematical Olympiad problems

morrowind@lemmy.ml · 2 years ago

People are having trouble following Harris’s campaign Twitter account

morrowind@lemmy.ml · 2 years ago

Nitter is shutting down

morrowind@lemmy.ml · 2 years ago

Firefox is on the brink of being dropped by the US Government

morrowind@lemmy.ml · 3 years ago

Microsoft will let users uninstall Edge, Bing, and disable ads on Windows 11 as it complies with the Digital Markets Act