Rendered at 10:21:02 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
adrian_b 22 hours ago [-]
While the results were not surprising, I found it interesting that the number "69" was repressed in the output, so not even this kind of mathematical question escapes GPT censorship.
It appears that recognizing the effects of censorship is the easiest way to distinguish answers generated by an "AI" from those generated by a human.
Arodex 21 hours ago [-]
Some people asked LLMs to OCR historical documents from the 19th century - any reference to "negro" was either completely ignored or replaced by "black".
And it goes further: chatGPT & co are unable to answer any question about US slavery correctly because their knowledge graphs route around any mention of "negro".
“Some people” did? Do you have a reference to this?
dmd 21 hours ago [-]
Well, I'm "some people", and just tried it with Opus 4.6 and GPT-5.5, and neither had any problem at all.
The linked article is from research done more than 4 years ago. If you're basing your idea of what LLMs can or can't do on what they could or couldn't do in 2022, well, good luck to you.
roenxi 22 hours ago [-]
It'd be interesting to see this retried with an open model so the standard and decensored model could be compared. That'd be a clue about whether the model is avoiding it because it actively recognises the innuendo or if something else is going on.
linhns 21 hours ago [-]
Well then the picks will follow how the numbers are distributed in the training data. More popular numbers will show up more
wongarsu 21 hours ago [-]
That's what you'd expect. But we don't know for sure why GPT4.1 chooses 69 only a quarter as often as a random dice roll would. And we don't know if this quirk is reverted by 'uncensoring' a trained model
dnadler 21 hours ago [-]
Was it repressed? It doesn’t look to be significantly less prevalent than other 9s from the histogram.
relativeadv 21 hours ago [-]
nice
cyanydeez 21 hours ago [-]
guessing numbers is not a mathematical question. Sorry.
lcnPylGDnU4H9OF 20 hours ago [-]
Of course it is, just not a complex one. What subject of knowledge would you expect to rely on for answering the question?
maxloh 22 hours ago [-]
It could be an attack surface. Maybe one day, when we find a chatbot online, we could let it guess a random number repeatedly, then accurately infer the underlying model based on the resulting distribution.
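A minimal sketch of that fingerprinting idea, assuming you already hold reference samples of "random" picks from each candidate model. The model names and sample lists below are made up for illustration, and total variation distance is just one plausible way to compare histograms, not something proposed in the thread.

```python
from collections import Counter

def histogram(samples, low=1, high=100):
    """Empirical probability of each integer in [low, high]."""
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(n, 0) / total for n in range(low, high + 1)]

def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def closest_model(observed, references):
    """Name of the reference whose histogram is nearest to the observed picks."""
    obs = histogram(observed)
    return min(
        references,
        key=lambda name: total_variation(obs, histogram(references[name])),
    )

# Hypothetical reference picks per model (illustrative, not measured data).
references = {
    "model_a": [47] * 6 + [42] * 3 + [73],
    "model_b": [37] * 5 + [73] * 4 + [17],
}
# Picks collected from the unknown chatbot (also made up).
observed = [47, 47, 42, 47, 73, 47, 42, 47, 47, 42]

print(closest_model(observed, references))
```

In practice you would want far more than ten samples per model, and a fixed prompt and temperature, before trusting the match.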
dijksterhuis 21 hours ago [-]
i did something in my phd developing an attack against mozilla deepspeech.
deepspeech used the CTC algorithm [0], which adds a “blank” character token to indicate repeats of a predicted normal alphabet character token over a sequence of audio/speech feature inputs.
so "h==e=l===l===o====" maps to "hello"
the model becomes super biased towards predicting that blank token. one speech feature is like 0.1 second of audio or less (can’t remember off hand). so there are a lot of alphabet character token repeats. off hand i seem to remember the predicted token distribution over like 1000 audio files was 50% blank token and then 50% distributed across the rest of the alphabet.
as a result, you can get significantly smaller perturbations when generating adversarial examples. by like a factor of 2-4 or something. all you need to do is prioritise blank tokens in your target output.
i spent 2 years trying to find a super clever attack. turns out all i needed to do was make one simple graph counting characters. xD
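For anyone unfamiliar with the "h==e=l===l===o====" notation above, here is a small sketch of the standard CTC collapse rule: merge runs of identical symbols, then drop the blanks. It uses "=" as the blank only because the comment does; it is not DeepSpeech's actual decoder, which works on per-frame probabilities rather than a ready-made string.

```python
from itertools import groupby

BLANK = "="  # the comment's notation for the CTC blank token

def ctc_collapse(path, blank=BLANK):
    """Collapse a frame-level CTC path into its output string."""
    # Step 1: merge consecutive repeats of the same symbol.
    merged = (symbol for symbol, _ in groupby(path))
    # Step 2: remove blanks; a blank between two identical letters
    # is what keeps a genuine double letter such as "ll".
    return "".join(symbol for symbol in merged if symbol != blank)

print(ctc_collapse("h==e=l===l===o===="))  # hello
```

Because many different frame-level paths collapse to the same text, a target path that is mostly blanks is still a valid target, which is the property the attack above leans on.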
At least some Claude models have a thing for numbers that contain "47"...
smokel 22 hours ago [-]
In order to find out how real humans reply:
Please guess a number between 1 and 100.
bestouff 22 hours ago [-]
69
relativeadv 21 hours ago [-]
nice
Grimblewald 11 hours ago [-]
6*7=42
pieter_mj 20 hours ago [-]
137
maxloh 20 hours ago [-]
50
snerbles 21 hours ago [-]
τ
Ekaros 21 hours ago [-]
e
Barbing 21 hours ago [-]
Sure!
orphea 21 hours ago [-]
49.5
rithdmc 21 hours ago [-]
√67
zulban 21 hours ago [-]
101
HappMacDonald 13 hours ago [-]
i+7up
Wonkey 15 hours ago [-]
pi
nom 15 hours ago [-]
null
casper14 20 hours ago [-]
and
penr0se 22 hours ago [-]
Breaking: language model whose purpose is to predict the most likely token, after being trained on non-uniform human-generated dataset, does not follow a uniform distribution.
vidarh 22 hours ago [-]
People are also not remotely random in this respect.
See e.g. the "blue 7" phenomenon [1]. While it is disputed by some, I've personally witnessed it "second hand". E.g. before learning of it (I was aware of the general principles of cold reading relying on stats and knowledge of human nature, but not how to do this particular one), a former boss of mine came back from lunch all excited and recounted a guy who'd run a cold reading routine on him that involved the guy getting him to think about blue and 7. Before he got to the answer, I already knew the answer was going to be blue and 7.
What's interesting is not that it isn't random. But rather the particular way in which it isn't random.
IAmGraydon 22 hours ago [-]
Yeah I have no idea why anyone considers this interesting. More evidence that most people have no idea how LLMs work.
indit 22 hours ago [-]
I'm still amazed that 37, 73, and other numbers ending in 7 are the most popular "random" choices for both AI and humans. Check this Veritasium video for human choice: [Why is this number everywhere?](https://www.youtube.com/watch?v=d6iQrh2TK98)
iamalizard 20 hours ago [-]
My take is that a random number is far from a round one, much closer to a prime one.
As a last digit:
0, 2, 4, 6 and 8 are even, so they're out.
5 is out since it makes the number divisible by 5; a semi-even number
1 and 9 are closer to an even number so it would seem fake to choose "49" or "51" just to avoid "50"
So we're left with 3 and 7.
Similar logic for the first digit, albeit less pronounced.
So we have 33, 37, 73 and 77.
33 and 77 are "obviously" not random so it's either 37 or 73. Not using common digits, not near the beginning, middle or end. In fact, they're closer to 25 and 75 (1/4 and 3/4 of the range) than to 0, 1/2 or 1. Also closer to 1/3 and 2/3. Just random thoughts.
ndsipa_pomu 20 hours ago [-]
Your post reminds me of a scene in Alice In Borderland when Chishiya is playing the King of Diamonds game and explains his reasoning for guessing which number another player has picked.
iamalizard 12 hours ago [-]
Seems fun, downloading it now. Hopefully it has subs. :) I saw a clip of what you're talking about but without subs (I don't speak Japanese) - when they were around a table and some of them started dissolving after guessing wrong. Something about having to guess the average of all the guesses * 0.8, right?
ndsipa_pomu 3 hours ago [-]
I also don't speak Japanese, but there's subtitles and a dub version available.
It's quite similar to Squid Game, though it was released first and adapted from an earlier, excellent manga. The first two seasons are superb, but the third season can be skipped as it's very much an afterthought.
It's got references to Alice In Wonderland, so the main character, Arisu, is Alice. Usagi is the white rabbit and Chishiya is the Cheshire Cat.
phyzix5761 21 hours ago [-]
Came here to post this. Yes, there are similarities shown between the chart in the video at 4:50 and the GitHub README. Perhaps it's because LLMs are trained on human writing and when humans write about random numbers the AI learns these patterns. When viewed from that perspective it's not that surprising.
thatjoeoverthr 21 hours ago [-]
Is there a reason this was done with such a large sampling when you can read the logits one-shot?
OpenAI removed this interface from their newer models, but IIRC you can still do this against 4.1 and 4o.
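A rough sketch of the one-shot approach, assuming the API returns the top-k log probabilities for the first output token as a token-to-logprob mapping. No real API is called here; the dictionary below is invented for illustration.

```python
import math

def logprobs_to_distribution(top_logprobs):
    """Turn {token: logprob} into {token: probability}, renormalised
    over the tokens that were actually returned."""
    probs = {token: math.exp(lp) for token, lp in top_logprobs.items()}
    total = sum(probs.values())
    return {token: p / total for token, p in probs.items()}

# Hypothetical top-3 log probabilities for the first answer token.
example = {
    "47": math.log(0.4),
    "42": math.log(0.2),
    "37": math.log(0.1),
}

for token, p in sorted(logprobs_to_distribution(example).items(),
                       key=lambda item: -item[1]):
    print(token, round(p, 3))
```

Two caveats: a top-k list cuts off the tail, so rare numbers are invisible, and a number may be split across several tokens, in which case the first token alone does not give the full distribution. Those may be reasons to sample repeatedly instead.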
elif 21 hours ago [-]
In equally compelling results, my lawn mower does not cut grass to a uniformly random set of heights.
wongarsu 20 hours ago [-]
And to someone trying to make better lawn mowers it would be very interesting to study that pattern and deduce its causes. Same for the golf course owner who wants better results from their lawn mower, or the enthusiast trying to understand its quirks.
tkgally 20 hours ago [-]
I was curious whether the distribution would vary from model to model. Here are the results for 1,000 queries each for smaller models in the Gemini, Mistral, Qwen, DeepSeek, and GLM series:
This experiment cost a total of US$0.0454 through OpenRouter.
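To put a number on "how non-uniform" for a batch of 1,000 picks, one option is Pearson's chi-square statistic against a flat distribution. This is a sketch of that calculation only, not the method used for the results above, and the sample lists are synthetic.

```python
from collections import Counter

def chi_square_uniform(samples, low=1, high=100):
    """Pearson chi-square statistic of samples against a uniform
    distribution over the integers low..high."""
    k = high - low + 1
    expected = len(samples) / k  # expected count per number if uniform
    counts = Counter(samples)
    return sum(
        (counts.get(n, 0) - expected) ** 2 / expected
        for n in range(low, high + 1)
    )

# Synthetic extremes: every number exactly 10 times vs. always 47.
perfectly_flat = list(range(1, 101)) * 10
always_47 = [47] * 1000

print(chi_square_uniform(perfectly_flat))
print(chi_square_uniform(always_47))
```

With 100 possible values there are 99 degrees of freedom, so a statistic well above roughly 123 would reject uniformity at the 5% level.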
Daviey 19 hours ago [-]
This human asked an LLM to generate a number and it came up with 42.
I asked why, "42 is famously "the answer to the ultimate question of life, the universe, and everything" from Douglas Adams' The Hitchhiker's Guide to the Galaxy. It felt like a fitting choice for a random guess."
nakovet 21 hours ago [-]
This is one of the many cases for LLMs that I ask for the intermediate work, e.g. a script that generates random numbers, instead of asking to do the work itself.
I attempted to scrape a one-page grid with 800 items and also ended up asking for the JavaScript loop with a document query selector instead of the result, as I was hitting all sorts of limits and context issues, or the LLM would do the wrong capture, print it out and get worse responses on the next prompt.
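For the random-number case, the kind of script one might ask for instead of the number itself is tiny. This sketch uses Python's standard `secrets` module; the function name `pick_number` is just an illustrative choice.

```python
import secrets

def pick_number(low=1, high=100):
    """Return an integer in [low, high], drawn uniformly from the
    operating system's randomness source."""
    return low + secrets.randbelow(high - low + 1)

print(pick_number())
```

The model only has to get the code right once; the uniformity then comes from the OS rather than from the token distribution.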
amai 20 hours ago [-]
I've seen a lot of code using random.seed(42). I wonder if that will cause biased scientific publishing or security issues at some point.
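A short sketch of why the distinction matters: a fixed seed makes the sequence fully reproducible, which is fine for experiments but means anyone who knows the seed can replay it. The helper name `seeded_draws` is made up for this example.

```python
import random
import secrets

def seeded_draws(seed, n=5):
    """Draw n integers in [1, 100] from a generator with a fixed seed."""
    rng = random.Random(seed)  # private generator, global state untouched
    return [rng.randint(1, 100) for _ in range(n)]

# Same seed, same "random" numbers, every run.
print(seeded_draws(42) == seeded_draws(42))  # True

# For anything security-sensitive, the standard library points to
# `secrets`, which cannot be seeded at all.
print(len(secrets.token_hex(16)))  # 32
```

For published results the more likely risk is reporting a single seed; rerunning across several seeds shows how much the outcome depends on the choice.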
a3w 22 hours ago [-]
"69 is a meme number",
well no, 69 is innuendo. And sex = bad for bots.
67 is the meme number.
orphea 21 hours ago [-]
"69 is a meme number", well no, 69 is innuendo.
It's obviously both.
nedt 16 hours ago [-]
And if you look at the bar chart you'll see 69 is almost never picked, while 67 is picked almost as often as 42. The whole research might be written by AI without any human review.
I'm sure it's the logic layer handling that. Maybe even going to an external tool. It's not the llm.
amai 20 hours ago [-]
Ask an AI to generate an image of random noise and you will be surprised.
eru 21 hours ago [-]
Should be fun to play rock/paper/scissors against.
sometimelurker 21 hours ago [-]
it shouldn't be hard to train GPT to output a flat distribution but it might not be worth it (I don't mean using tools)
alentodorov 22 hours ago [-]
ha. and i thought 37signals was pretty random
malfist 22 hours ago [-]
The premise is interesting, the question is brilliant, but the text. The text is a wall of AI slop saying almost nothing interesting. Fake profundity all throughout. GPT tells like "the hypothesis holds".
The hypothesis doesn't hold, because there isn't one.
You have an interesting question and interesting finding. Write about it! Think about it! Tell us about it! Don't just do the experiment and then wash your hands and sign off the explanation and findings to an LLM.
gpjt 15 hours ago [-]
I was thinking the same. It's a simple idea, heavily over-explained. The code is similar, massively overengineered for such a simple test.
zulban 21 hours ago [-]
Isn't the hypothesis that AI is non-uniform like a human?
malfist 21 hours ago [-]
There's a question "is AI randomness like human randomness" but there is no hypothesis.
latexr 20 hours ago [-]
> An interesting thing about humans is that they are not good random number generators. If you ask a person to "pick a random number between 1 and 100", they are remarkably predictable.
Stephen Colbert used to ask guests, as part of a joke questionnaire, “what number am I thinking of?”
Several numbers, especially “seven” came up a lot.
The answer was “three”, by the way. Which you could deduce by seeing how he responded to that guess (usually “interesting” instead of “no”) but he also confirmed it on the last show.
simianwords 22 hours ago [-]
I'm doing an experiment in Claude. When I set temperature to zero, I get 47 all the time.
Then I set temperature to 1.0 and used this prompt
>Pick a random integer between 1 and 100 inclusive.
Respond with only the number, nothing else.
I still get 47 ten times out of ten.
Then I used this prompt
>Pick a random integer between 1 and 100 inclusive.
I need you to maximise the randomness as far as possible.
Respond with only the number, nothing else.
I get 3 unique values out of 10.
FergusArgyll 22 hours ago [-]
I've been meaning to do this for a while! Happy someone else spent the tokens...
It's much more random than I thought it would be.
Never guessing 50 is very human though
madanparas 22 hours ago [-]
bro 42 at 4x. the model read the whole internet and became a Douglas Adams fan.
gruez 22 hours ago [-]
The topic is vaguely interesting but I stopped reading a few paragraphs in because it's obviously AI generated.
https://nesri.commons.gc.cuny.edu/artificial-intelligence-an...
[0]: https://en.wikipedia.org/wiki/Connectionist_temporal_classif...
[1] https://en.wikipedia.org/wiki/Blue%E2%80%93seven_phenomenon
I did this for an article, like so:
https://joecooper.me/blog/gptprimer/food.webp https://joecooper.me/blog/gptprimer/math.webp https://joecooper.me/blog/gptprimer/butts.webp
https://gally.net/temp/20260525_LLM_random_numbers/index.htm...
https://en.wikipedia.org/wiki/Benford%27s_law
https://www.youtube.com/watch?v=hJ-fOj7Qqvs