by @punk6529
- You can notice changes in your visual environment in about 12 to 20 milliseconds (a millisecond is 1/1,000th of a second). It is why monitors that refresh 60x/second look pretty smooth and by 90-120x/second, they are in good shape
- Vision feels like a continuous process but there is no such thing. It is discrete, just like everything else. The rods and cones in your eyes do not respond to changes in light instantly. They respond to changes in light very quickly. It feels instant but it is not.
- The same thing applies for the electrical signals in your brain that transport the information from your eyes. They move very very quickly but not infinitely quickly. So if something changes in your field of vision (a light comes on, an intruder enters)…
- …then you notice very quickly, say in 15 ms (milliseconds) but not infinitely quickly. There is, as we like to say in computing, some latency. Now something has changed in your field of vision - how long does it take for your brain to react?
- It takes from 150 ms for simple unconscious reactions to 350 to 400 ms for conscious decisions that require choices. This is not a law of physics. It is a biological choice. Cats - famously - have “cat-like reflexes” which are about 2x to 3x as fast as humans.
- So you can detect 60 changes per second in your field of vision but you can only react to 2 to 4 per second at best. So the raw input to your brain is very fast, but the processing of what it means and how to respond is an order of magnitude slower.
- You see a flame in 15ms. If you react unconsciously to move your hand, it might take you 150ms. If you are juggling pots and pans and get into conscious processing of what to do or which one to drop, it might take 300-400ms.
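
A rough sketch of the arithmetic behind those latencies, assuming (as a simplification that is not in the thread) that events are handled strictly one after another with no overlap. The latency values are the ones quoted above; the function name is just for illustration.

```python
def events_per_second(latency_ms: float) -> float:
    """How many back-to-back events fit in one second at a given latency."""
    return 1000.0 / latency_ms

# Latencies quoted in the thread, in milliseconds.
latencies_ms = {
    "notice a visual change": 15,
    "unconscious reaction": 150,
    "conscious decision": 400,
}

rates = {name: events_per_second(ms) for name, ms in latencies_ms.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per second")
```

Under this assumption, noticing runs at roughly 60+ events per second while conscious decisions land in the low single digits, which is the gap the thread is pointing at.
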
- Let’s go a step up the abstraction ladder. You are talking to your friend. We are now in the world of seconds, not milliseconds. People speak at about 150 to 200 words per minute. Sentences are 10 to 20 words, so might take 5 seconds (5000 ms) to say a sentence.
- Humans tend to speak in turns of about 15 seconds so a conversation turns 4 times per minute, conveying 30 to 50 words in each turn. So, in total, about 150 to 200 words can be conveyed per minute in “audio mode”
- Reading is faster. Most people read at 250 to 300 words per minute. Fast readers can read at 400 words per minute and a very few fast readers can go 500 to over 1,000 (but this is very rare). Let’s use 300 for a high-end estimate of normal
- Writing though is much slower. Most people write at around 40 words per minute (this is not pure typing speed but includes the time to consider what you are writing). Of course, this is a very good case where you are not trying all that hard
- If you want to craft a perfect email, you may spend 1 hour on a 300 word email, writing, reading, deleting and rewriting. In this case, your writing output is about 5 words per minute.
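
The speaking and writing figures above are simple division, sketched here using only numbers from the thread. The variable and function names are illustrative.

```python
def words_per_minute(words: float, minutes: float) -> float:
    """Average output rate over a stretch of time."""
    return words / minutes

# A conversational turn of about 15 seconds, spoken at 150 to 200 wpm.
turn_seconds = 15
turns_per_minute = 60 / turn_seconds
words_per_turn = [wpm * turn_seconds / 60 for wpm in (150, 200)]

# A carefully crafted 300-word email that takes an hour.
careful_email_wpm = words_per_minute(300, 60)

print(f"turns per minute: {turns_per_minute}")
print(f"words per turn: {words_per_turn}")
print(f"careful email: {careful_email_wpm} words per minute")
```

So a turn carries roughly 40 to 50 words, and the one-hour email works out to about 5 words per minute, a small fraction of casual writing speed.
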
- Then there is thinking about hard problems. There is no limit to how long it takes. You could spend hours, days, weeks, months lost in thought. If you are very special maybe “gravity” comes out of that, if you are less special maybe “new slogan”
- OK SO WHAT THIS IS VERY BORING. So in theory I knew all the things above 1 year ago too. But a year of heavy GPT use has radically changed how I think about them and has freaked me out a bit on multiple dimensions
- The core thing that has happened is that spending so much time looking at the prompt / latency / response cycle on GPT over the last year has eventually tipped me over to “seeing” my own stimulus / latency / response cycle
- It is not easy to see. You have to focus on it, but if you focus on it, you can “see it”. You can see it with difficulty for pure vision: when you are driving, think hard about whether your actions are continuous or discrete and how often you are taking a discrete action.
- You can see it easily in talking, reading and writing. We have stimulus / latency (computation) / output limits. I kind of had figured it out here in February, but it was still a bit fuzzy to me then
- What I am trying to say is this. We had previously said “life is short” and this means the days and weeks and years go fast. But now I am stuck on “life is short” because there is some type of absolute limit on how many things I can ever think.
- I am going to repeat it again because it is important. If you assume some finite lifespan, there is a hard and calculable limit to how many things you will ever think or feel. You are not a continuous stream of consciousness, but instead a series of discrete prompts.
- You can see what happened to me. When you have an API account on OpenAI your account rate limits are measured in requests per minute and tokens per minute and at first you are annoyed until you hit higher tiers. Let’s look at Tier 4 though
- You get 10,000 requests per minute or 166 requests per second or 6 ms per request. You get 800K tokens per minute (600K wpm) which is, well, about 3,000x faster than you can talk. And of course these limits are arbitrary. @OpenAI’s total system capacity is vastly higher.
- Also, of course, you can hit the OpenAI API all day long every day. You cannot be talking or reading or reacting 24/7. You need to sleep, rest and so on. So what has happened to me is I look at the OpenAI API a lot, I see how we can use it, I think about the limits and then….
- …I end up at “omg, wtf, my own rpm and tpm is tragic”. In extreme circumstances, I can handle 120 requests per minute and maybe 1K to 1.5K tokens per minute if I am reading ultra fast (I am a very fast reader). But obviously these are rookie numbers relative to the above.
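
The Tier 4 arithmetic can be checked in a few lines. The limits are the ones quoted in the thread. The 0.75 words-per-token ratio is an assumption implied by "800K tokens (600K wpm)", and 200 wpm is the top of the speaking range given earlier.

```python
REQUESTS_PER_MINUTE = 10_000   # Tier 4 limit as quoted in the thread
TOKENS_PER_MINUTE = 800_000    # Tier 4 limit as quoted in the thread
WORDS_PER_TOKEN = 0.75         # assumption implied by 800K tokens ~ 600K words
SPEAKING_WPM = 200             # upper end of the speaking range above

requests_per_second = REQUESTS_PER_MINUTE / 60
ms_per_request = 60_000 / REQUESTS_PER_MINUTE
api_words_per_minute = TOKENS_PER_MINUTE * WORDS_PER_TOKEN
speedup_vs_talking = api_words_per_minute / SPEAKING_WPM

print(f"{requests_per_second:.1f} requests per second")
print(f"{ms_per_request:.0f} ms per request")
print(f"{api_words_per_minute:,.0f} words per minute")
print(f"{speedup_vs_talking:,.0f}x faster than talking")
```
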
- Actually I am (and you are) a multimodal model with different limits per mode:
Vision reaction (no words): 180 rpm. 0 tokens
Talking mode: 4 rpm. 300 tokens/minute
Reading mode: 1K tokens/minute.
Writing mode: 5 to 100 tokens/minute
- Speaking or reading non-stop for 8 hours per day is hard, but let’s pretend we do it for a decade.
Speaking: 525M tokens / decade
Reading: 1B+ tokens / decade (if you are a fast reader)
Requests (thoughts): Far fewer
- There are hard and calculable limits to the above. There is a finite number of things you will think, read or say in the next decade. Of course, stated like that it is obvious, but emotionally it is very different when comparing it to GPT.
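
A minimal sketch of where the decade totals come from, using the per-minute rates and the 8 hours per day from the thread. Ignoring leap days is my simplification.

```python
MINUTES_PER_DAY = 8 * 60       # 8 hours of non-stop speaking or reading
DAYS_PER_DECADE = 365 * 10     # simplification: ignores leap days

def tokens_per_decade(tokens_per_minute: int) -> int:
    """Lifetime-style budget: rate x minutes per day x days in a decade."""
    return tokens_per_minute * MINUTES_PER_DAY * DAYS_PER_DECADE

speaking_tokens = tokens_per_decade(300)    # talking mode
reading_tokens = tokens_per_decade(1_000)   # fast-reader mode

print(f"speaking: {speaking_tokens:,} tokens per decade")
print(f"reading:  {reading_tokens:,} tokens per decade")
```

Speaking lands at about 525M tokens, and fast reading at about 1.75B, consistent with the "1B+" figure.
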
- Here is another way to look at it. GPT-4o, the current flagship model, costs $5/1M input tokens and $15/1M output tokens. Let’s average to $10. That means all your speaking or reading for a decade will cost $5K to $10K worth of tokens
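
The cost estimate can be sketched the same way. The prices are the ones quoted in the thread (they may not match current pricing), and the flat average of input and output prices follows the thread's own shortcut. The token totals are the rounded decade figures used above.

```python
INPUT_USD_PER_MILLION = 5.0    # GPT-4o input price as quoted in the thread
OUTPUT_USD_PER_MILLION = 15.0  # GPT-4o output price as quoted in the thread

# The thread's shortcut: a flat average of the two prices.
blended_usd_per_million = (INPUT_USD_PER_MILLION + OUTPUT_USD_PER_MILLION) / 2

def cost_usd(tokens: int, usd_per_million: float = blended_usd_per_million) -> float:
    """Price a token count at a per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million

speaking_cost = cost_usd(525_000_000)    # a decade of speaking
reading_cost = cost_usd(1_000_000_000)   # lower bound for a decade of reading

print(f"speaking: ${speaking_cost:,.0f}")
print(f"reading:  ${reading_cost:,.0f} or more")
```
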
- Now today those GPT-4o tokens are of lower quality than most people’s best self-generated tokens. But they are better than most people’s worst self-generated tokens. But LLMs will get cheaper and better and faster and some day they will be producing tokens similar to us
- And then we will have a pretty direct comparison and be able to say “huh, 6529’s input and output for the whole year of 2030 was like $69.42 worth of calls to GPT-9 (legacy deprecated edition model)”. It is so so so so weird when you think about it too hard.
- You will cope at this stage on behalf of humanity. My tokens are better, my feelings are more special, GPT-9 could never express the range of emotions that I do and so on. And maybe that is true or maybe it isn’t but it does not change anything important.
- The most important thing is that you have a token budget for your own brain. Use it well. I worry about being wasteful with my OpenAI API budget but historically I have not been conscious about being wasteful of MY token budget.
- Now we have handy good practices like “life is too short to spend with jerks” and “look at the bigger picture” and so on, but now I have a generalizable framework: I have tokens, they are running down, I want to use each and every request and token in the best way possible
- Is the “best way possible” tweeting tweetstorms or is it spraying champagne on your friends? That is for you to decide. Just know you are using your lifetime budget of tokens of consciousness, of thought, of requests, of inputs and of outputs
- What is almost certainly true though is “don’t waste them on bullshit”. Things you don’t want to do. Angry typing to someone you are upset at. All of these are “uses of tokens you are paying for with your actual life and probably you will regret”
- What is also true is that if you don’t control yourself, you are letting others use your token budget. Someone says something mean to you and you get upset and think sad things or write mad things or talk to your friend upset about it all. What is this in this framework?
- This is like you gave your API key to someone else and are letting them start using up the tokens in your brain for free and for things you don’t want to use them on. If someone did this to your OpenAI account, you would rotate keys immediately
- Yet people think it is very normal to get into fights with people, get upset by what they say, get distracted by their prompting. Protect your token budget! Do not use it on your enemies. Use it for you, for good, for the things that make you better and happier.
- And what about the LLMs themselves? How do we think about those? The general topic is not for today but in this specific context, a good use for LLMs will be to deal with the tokens you don’t want to deal with. So much of life is kind of bullshit tokens.
- We will be able to quite inexpensively move those to the LLMs in the next few years and leave more of our token budget for ourselves. What will we do with it then? Waste it on PageSix? Do great things? Just chill? I dunno, but I think we will find out soon
- I am going to pause for now, leave the thread open for a bit longer, but I encourage you to look for your prompt / compute / response cycle. It is there if you look hard for it and you will be a different person once you see it.