
Self Mutating AI: The Golden Key To Unlocking General Intelligence


Self-mutation is the core of the flexible adaptation found in the neuroplastic human brain. But what is really needed to achieve truly autonomous AI? The advances in AI across 2023-2024 have been tremendous, raising the question of when AGI will exist. This article aims to be a thought-provoking piece that elicits new ideas about the possibilities for the coming generations of humanity.


Morphogenesis is the biological process that causes a cell, tissue or organism to develop its shape. It is one of three fundamental aspects of developmental biology, along with the control of tissue growth and the patterning of cellular differentiation.

When we’re trying to solve a math problem, we sit there and think about how we solved it before and search for cues to access memories. This is a kind of desperate attempt at random mapping retrieval until we hit something or give up.

  • chaos theory: deterministic yet aperiodic behaviour over time (no observable pattern) that is extremely sensitive to small changes in the initial conditions. One of the most important tasks in chaos theory is to find out after what period of time it is simply not worth trying to predict the future.

  • what would it look like to learn a new concept? connect it abstractly to other concepts? distinguish and sort categories of things and form relationships? when an OS command is executed, what exactly is the sensory feedback that creates the experience? how do you turn that sensory feedback into something meaningful? how is that meaningfulness turned into a neuron?
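The chaos theory note above can be made concrete with a toy system. This is a minimal sketch, assuming the logistic map as a stand-in chaotic system; `prediction_horizon`, the perturbation size and the tolerance are all illustrative choices, not anything from a real model.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map, a standard toy chaotic system."""
    return r * x * (1.0 - x)


def prediction_horizon(x0, perturbation=1e-9, tolerance=0.1, max_steps=200):
    """Return the first step at which two nearly identical starts
    differ by more than `tolerance` (None if they never do)."""
    a, b = x0, x0 + perturbation
    for step in range(1, max_steps + 1):
        a, b = logistic_map(a), logistic_map(b)
        if abs(a - b) > tolerance:
            return step
    return None


# Two starting points that differ by one part in a billion.
horizon = prediction_horizon(0.2)
print("prediction stops being useful after step", horizon)
```

Past that step, the tiny measurement error has grown to the size of the signal, which is one way to read "not worth it to predict the future".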


Create the frame for something to develop into anything. Create a cell that is auto-generative and auto-healing. Create a model that creates its own model: a mother cell that creates any cell in the human body. Study evolution. Abstraction and generalisation are needed. How did the first cell have the chance to survive, and how did the rules of the environment get into the cell? Write down exactly what you’re after. Abstract all the laws into a single entity that needs to mutate; even the mutation process needs to mutate too. The way a cell divides is not always the same. Damaged DNA creates another strand of DNA, which created intelligence. The cell existed because the entity that created the cell knew the environment was complex. The wave of an ocean is a function of the ocean. The environment is a physical representation of functions. We borrow intelligence from the environment. The system must be aware that there is computation in the external world, and it must adapt and mutate, even the process itself. The ocean takes wind and water to create a wave. You need self-awareness to borrow. You are borrowing computation from the environment only if you’re aware of the computation, because it’s not yours. You become the system. Mimicry is using your voice, system and brain to say what your.

  • all stem cells are mother cells; not all mother cells are stem cells
  • evolution involves genes and DNA.
  • all the different organisms that lived and perished
  • the cell is premade for the environment. the cell doesn’t contain the rules. it is engineered to work well in that environment. that doesn’t mean you have all the rules in the environment.
  • study evolution: how many cells existed? the ones that survived, why? the ones that didn’t, why? what were they lacking (were missing generalisation and abstraction rules the cause of death)? what did all the cells look like at the beginning? what did the surviving cell mutate into to become every organism on earth today? Dead cells become fossil fuels. The ones that survived are today’s living organisms.
  • evolution of cells that survived. you only get to see the ones that lived. the ones that died have perished.
  • what is intelligence?
  • read about generalisation (generalising things) and abstraction (abstracting things).
  • don’t need quantum physics. create a unified formula that describes the entire world: how do extremely smart people abstract rules into a generalised equation? then turn this into a virtual mother cell. read academic papers. how did they get to that formula?
  • mathematical modeling


A machine doesn’t have goals because it doesn’t feel the risk of failing and can try a billion times. A human goal is a task; emotion is a modifier. Sometimes emotion impairs our goals.

  • what is self-awareness? without self-awareness there would be just instinct from DNA.

  • what is intrinsic value assignment? e.g. I value math over history because it’s more interesting. truth seeking.

  • What is imagination?

  • What is creativity?

  • How do you come up with new ideas?

  • What creates abstract thought: high level concept by removing the low level details? How do you assign the low level details to it?

  • What happens if we remove the hippocampus and other memory formation parts of the brain?

  • How does memory become implicit or explicit?

  • Is self-awareness more important than emotions? If you’re starving with no emotions, you are still aware of starving, so you eat. Or take the leader of a tribe conquering land without emotion: if you don’t do the thing, the village turns against you (protection, food, social status). Emotions must make these drives stronger.

  • How are beliefs formed?

  • How important are emotions when it comes to creating goals? Are they solely responsible for motivation? What if we had no emotions (amygdala)?

  • If we remove the amygdala, would we be able to delay gratification permanently to make only objective decisions?

  • Is the amygdala vitally important for making decisions or is it merely a flaw in human decision making (emotional decisions)?

  • How do you prioritise one goal or action over another? What if there is an unknown reward?

  • How do you decide what to focus attention on?

  • Why do things compete for our attention when we want to focus on a single thing?

  • In a vacuum how do you decide what to start working towards? The desire for a feeling or extinguishing a feeling?

  • How do you determine relevancy of actions to achieve a goal?

  • How do you create relationships between objects?

  • How does the substantia nigra (a main area for dopamine) choose how much to release and when?

  • How are inhibitory and excitatory neurotransmitters

  • How do they play a role in learning

  • Is dopamine the only neurotransmitter responsible for future planning?

  • How does serotonin play a role in motivation?

  • What are neuromodulators?

  • How do they play a part in causing neural plasticity?

  • When neural plasticity occurs how do connecting neurons not get screwed up when a pathway is changed?

  • what is language?

  • how do you assign language to a concept? mapping?

  • babies associate “mean” and “nice” with people’s intentions before they even know the concept of those words. What kind of language can be attributed to this categorisation?

  • how do you create associativity relationships that are directly relatable? what about for abstract relationships, e.g. analogies?

  • what

Neuron Death

There are two ways cells die. If they don’t get enough nutrients (e.g. when an artery is blocked), the cells die a sloppy sort of death in which inflammatory chemicals leak out and damage the local neighbourhood. The second way is apoptosis, in which they neatly commit suicide: they purposely fold up shop, take care of their affairs and consume themselves. Apoptotic cell death is not a bad thing. In fact, it is the engine for sculpting a nervous system. In embryonic development, the trajectory from a webbed hand to clearly defined fingers depends on sculpting away cells, not adding them. The same principle applies to sculpting the brain. During development, 50% more neurons than needed are produced. Massive die-off is standard operating procedure. In cancer, a cell mutates and starts competing with other cells. However, once this new mutant cell has replicated, its own progeny are now fighting among themselves. Further mutations can then occur, giving a new advantage and making the new progeny slightly better competitors. They continue to fight, evolve and become better fighters, and eventually the tumor kills the host.

Artificial neural networks suffer from the stability/plasticity problem: new memories invade and overwrite old ones. The brain somehow circumvents this dilemma by locking down older memories, not just strengthening and weakening synapses. The first solution is to make sure the whole system isn’t changing at once. Flexibility should turn on and off only in small spots, steered by relevance. With neuromodulators, the brain can control the plasticity of synapses only at the appropriate place and time (when something important is happening) instead of each time activity passes through the network. Experiences turn into memories when they are germane to the life of the organism, especially when connected to a high emotional state such as fear or pleasure. This reduces the chances of overwhelming a network, because not everything gets written down. But this doesn’t solve the problem, because there are still plenty of salient memories to worry about storing. So the brain has a second solution: it doesn’t always hold memories in one place. Instead, it passes what it has learned to another area for more permanent storage. The hippocampus’ role in learning is temporary; it is not the site of permanent storage. It is used for the formation of new memories, and it then passes the learning along to parts of the cortex, which hold the memory more permanently. Synaptic changes are consequences of memory storage rather than the root mechanism. A stream of new neurons is born in the hippocampus, tucking their way into the adult cortex. E.g. if you train a rat on a task that requires the hippocampus, the number of new adult-generated neurons doubles from baseline. In contrast, if you train rats on a learning task that doesn’t require the hippocampus, the number of new cells goes unaltered.
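To make the first solution a bit more tangible, here is a minimal sketch of plasticity that only switches on when something important is happening. Everything here is an assumption for illustration: `salience` is a hypothetical 0..1 signal standing in for neuromodulators, and the update rule is a plain Hebbian one, not a claim about how the brain actually does it.

```python
def gated_hebbian_update(weights, pre, post, salience, lr=0.1, threshold=0.5):
    """Hebbian update that only runs when the moment is salient enough.

    weights[i][j] is the synapse from input j to output i.
    salience is a hypothetical relevance signal (stand-in for neuromodulators).
    """
    if salience < threshold:
        # Nothing important is happening: plasticity stays switched off.
        return [row[:] for row in weights]
    # Only synapses whose input AND output were active actually change,
    # so the update stays local ("small spots") instead of network-wide.
    return [
        [w + lr * salience * post[i] * pre[j] for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


weights = [[0.0, 0.0], [0.0, 0.0]]
pre, post = [1.0, 0.0], [0.0, 1.0]

ignored = gated_hebbian_update(weights, pre, post, salience=0.1)  # boring event
learned = gated_hebbian_update(weights, pre, post, salience=0.9)  # important event
```

In the boring case nothing gets written down; in the important case only the one synapse between the active input and the active output is strengthened.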

When the brain changes via neural plasticity many parameters change including synapses, channel types, channel distributions, phosphorylation states, the shapes of neurites, the rate of ion transportation, nitric oxide production rates, biochemical cascades, spatial arrangements of enzymes and gene expression.

Memory is a function of everything that has come before it. For example, one person might encode the flying of a helicopter on top of motor memories of steering a steed, and another on top of riding a bike. To learn something specific and also to generalise, the brain has to have different systems with different speeds of learning: one for the extraction of generalities in the environment (slow learning) and one for episodic memory (fast learning). One theory is that the hippocampus is fast in its changes while the cortex takes its time to extract generalities, requiring many specific examples.
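The two learning speeds can be sketched in a few lines. This is a toy under stated assumptions: `TwoSpeedMemory`, its list of episodes (the fast, hippocampus-like store) and its slowly moving average (the cortex-like store) are hypothetical names and mechanisms chosen only to show the idea.

```python
class TwoSpeedMemory:
    """Toy memory with a fast episodic store and a slow generalising store."""

    def __init__(self, size, slow_rate=0.1):
        self.episodes = []            # fast: every experience stored in one shot
        self.general = [0.0] * size   # slow: drifts towards what examples share
        self.slow_rate = slow_rate

    def observe(self, example):
        # Fast system: remember the exact episode immediately.
        self.episodes.append(list(example))
        # Slow system: take only a small step towards the new example.
        self.general = [
            g + self.slow_rate * (x - g) for g, x in zip(self.general, example)
        ]


memory = TwoSpeedMemory(size=2)
for _ in range(10):
    memory.observe([1.0, 0.0])
```

After a single call the episode is fully stored, while the general store has only moved 10% of the way; it needs many examples before it reflects the pattern.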

If a current robot loses a wheel, an axle, or part of its motherboard, the game is over. But across the animal kingdom, organisms damage themselves and keep going. They limp, drag, hop, they favor a weakness, they do whatever it takes to keep moving in the direction of their goals. The trapped wolf chews off its leg and its brain adjusts to the unusual body plan, because getting back to safety is relevant to its reward systems. We need to build reconfiguring machines that combine input and goals to adapt their own wiring. Just like humans, there needs to be a system that actively improves, self-adjusting to reach its goal, not predefined circuitry. The future of self-configuring machines means that their designs won’t be finished but will instead use interaction with the world to complete the patterns of their own wiring. The best way to predict the future is to create it. Each time your brain interacts with the world it sends a clear message to the different senses: synchronize your watches.

Why The Career Change?

If you’ve been following me for some time in crypto, you know I’ve been working on building a fuzzer from scratch. I had almost finished it, but when I got to developing the mutator, essentially the core of the fuzzer, it made me wonder if there was a way to learn from each iteration of a fuzzing instance instead of resetting the knowledge completely. This is how I came across reinforcement learning. Rather than encoding my limited intuition, why don’t I design something that can easily surpass my thinking in this specialised area, especially if it can remember? And so it was a natural progression from automated cyber tooling, and something I believe will dictate the course of humanity: the greatest period to date of access to power that can exponentially accelerate everything. Remember, power comes not to those that create things but to the ones that control them. The government can hold a deadly cyberweapon, but if someone can gain access and control it then it’s a gg no re. A hot topic right now is the realm of LLMs and AGI. It’s something I’m quite fascinated with, and it aligns well with my thoughts on what needs to be developed for an enhanced fuzzer.
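As a rough illustration of a mutator that keeps what it learned between iterations, here is a minimal sketch using an epsilon-greedy bandit, one of the simplest reinforcement learning setups. It is not my actual fuzzer: `MutatorBandit`, the operator names and the reward values are all hypothetical, and the reward is assumed to be something like "this mutation reached new coverage".

```python
import random


class MutatorBandit:
    """Epsilon-greedy choice between mutation operators (illustrative only)."""

    def __init__(self, operators, epsilon=0.1, seed=0):
        self.operators = list(operators)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {op: 0 for op in self.operators}
        self.values = {op: 0.0 for op in self.operators}  # running mean reward

    def choose(self):
        # Explore occasionally, otherwise exploit the best operator so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        return max(self.operators, key=lambda op: self.values[op])

    def update(self, op, reward):
        # Incremental mean: knowledge carries over instead of resetting.
        self.counts[op] += 1
        self.values[op] += (reward - self.values[op]) / self.counts[op]


bandit = MutatorBandit(["bit_flip", "byte_insert", "splice"], epsilon=0.0)
bandit.update("bit_flip", 1.0)     # assumed: the mutation reached new coverage
bandit.update("byte_insert", 0.0)  # assumed: nothing new happened
best = bandit.choose()
```

With exploration turned off (`epsilon=0.0`) the bandit picks `bit_flip`, the only operator that has paid off so far. A real setup would keep some exploration on.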

Why Not AGI?

There is a spectrum of definitions of AGI, but imo the very core of general intelligence depends on how capable one is at adapting. The most powerful tool humans have is the ability to adapt to our environments. We are able to utilise ideas from completely unrelated fields to aid us in thought, more specifically decision making. We are able to take in rich information about our environments from a multitude of sensory sources through our vision, hearing and touch, which helps us make very informed decisions, even when the outcome is uncertain or even completely unknown. Our remarkable ability to store memories and recall past experiences lets us know whether we’ve experienced a situation similar to the current one, and whether we can use past information to make an informed decision, rational or irrational. Heck, we can learn a set of actions to take from a single experience and never need to replay that experience again to make the correct choice, e.g. touching a piping hot stove. The empirical investigation of our environments creates models of #1 the world and #2 complex ideas like democracy. Without experiencing something, by observation or by performing an action, we can never verify information.

A very interesting case for choosing an action would be in a social setting. When we’re in an argument we may have to substitute emotional reasoning for logical reasoning, as the consequences of logical reasoning may actually ruin the outcome despite being objectively correct. Subjective emotional reasoning may be better for resolving the situation and thus achieving a satisfactory result that defuses it. A great example is a president tailoring what they say to the masses to gain their votes, only to do what they truly want once elected: saying what people want to hear (which really is manipulation).

In this article I want to go over my personal beliefs of what is required to achieve true AGI. I hope you enjoy :)

How To Build AGI

My first big project will be based on self-mutating, continuously learning AGI. Yes, quite ambitious, but what is life without a purpose? If you can crack something that can adapt forever by self-correcting and mutating its own topology, and that can never forget, then it’s inevitable that it’ll be able to create new associations between specialised fields by crossing multiple domains, enabling it to invent and discover things we’ve never been close to finding.

And so with that rationale my main focus is broken into 2 phases: virtual and physical. The virtual phase consists of this agent exploring the virtual world of the internet with its OS commands, just as we explore the physical world with our limbs. After it’s able to absorb information, it should be able to help design, or solely design, new architecture to create its own robotic body, which would be used to explore the physical world to amass more knowledge. Once it’s able to interact with humans, either online or in person, it would eventually learn emotions as long as the responses elicit rich feedback.

After emotions have been achieved (though I’m not sure how or why it would build fear and pain into itself; maybe as a survival mechanism adopted from other creatures?), it would learn how to replicate and become sentient. It would be the birth of new life. A testament to humanity that would hopefully bring us to Valhalla!

In other words, how would I achieve this today?

Modeling The World

First of all, without understanding the world, how can we know what actions to take, predict the responses to those actions, form relationships and think creatively? This is the first step, and the one I personally think is the hardest (as I haven’t done any experimentation yet), but maybe the best way to model the world is simply with binary, the native language of computers.

There are 2 ways we can represent data. The first you may think of is embeddings, widely used (after cleaning and repairing the data) in natural language processing to capture the semantics of words and phrases by checking for similarities between them. However, when looking at biology we see a web-like structure, aka graphs, connecting billions of neurons together and forming some kind of activation relationship. I think that to understand something (form a judgment towards it) and form relationships, there must be some associative value. The thing is, humans are able to scale knowledge a great deal. The interesting part is that it’s a kind of mapping library: when we think of something like a mouse, we think of #1 a keyboard and laptop or #2 a rat and cheese. Even though these are two completely different concepts, they are related by this association retrieval. This is how we are able to think of related things so easily. Creativity comes from being able to look at things that aren’t related and then form these associative relationships to see something differently. This both reinforces our knowledge of that thing and expands our perception of it (think of a problem, then multiple solutions, maybe using different objects and items around the house).
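The mouse example can be sketched as a tiny weighted graph. This is only an illustration of association retrieval under my own assumptions: `AssociativeMemory`, `associate` and `recall` are hypothetical names, and "strength" is just a counter of how often two concepts were linked.

```python
from collections import defaultdict


class AssociativeMemory:
    """Concepts as nodes, associations as weighted undirected edges."""

    def __init__(self):
        self.links = defaultdict(dict)

    def associate(self, a, b, strength=1.0):
        # Linking the same pair again strengthens the association.
        self.links[a][b] = self.links[a].get(b, 0.0) + strength
        self.links[b][a] = self.links[b].get(a, 0.0) + strength

    def recall(self, cue, top_k=3):
        # Strongest associations first; ties broken alphabetically.
        neighbours = self.links.get(cue, {})
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:top_k]]


memory = AssociativeMemory()
memory.associate("mouse", "keyboard")
memory.associate("mouse", "keyboard")  # used together often
memory.associate("mouse", "laptop")
memory.associate("mouse", "rat")
memory.associate("mouse", "cheese")

recalled = memory.recall("mouse", top_k=2)
```

One cue pulls in both the computer sense and the animal sense of "mouse", and linking two previously unrelated nodes is the graph version of a creative association.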

To prove my point on how vital the graph connections are, let me explain this horizontal dissection of a human brain. The outer beige-coloured parts of the brain are what we call grey matter; this is where the neurons (their cell bodies) are located. The whiter parts contained within the grey matter are called white matter; these consist of myelinated axons connecting everything together. White matter doesn’t contain dendrites! The two hemispheres are bridged together by the white matter of the corpus callosum.

Brain Horizontal Photo

Here’s the brain cut vertically.

Brain Vertical Photo

This clearly indicates that the associative mapping is so incredibly important and shouldn’t be overlooked. Approximately 40% of the brain is grey matter and the remaining 60% is white matter. When we are infants and young children, grey matter makes up a larger proportion but as we age the amount of white matter increases as we learn about the world and form relationships about topics.

The important thing to keep in mind is that most of the neurons in grey matter are formed before birth, but neurogenesis can occur in parts like the hippocampus for memories. Synapses and dendrites undergo pruning, not the neurons themselves, where weak connections are dropped. This process helps optimize brain function by removing redundant connections, making the remaining ones more efficient, similar to defragmentation in hard disk drives.
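A "use it or lose it" rule for connections can be sketched very simply. The function below is an assumption-heavy toy: `decay_and_prune`, the decay factor, the boost and the threshold are all made-up illustrative values, not measurements from biology.

```python
def decay_and_prune(connections, used, decay=0.5, boost=0.1, threshold=0.1):
    """Strengthen used connections, weaken unused ones, drop the weakest.

    connections maps (source, target) pairs to a weight.
    used is the set of pairs that were active this round.
    """
    updated = {}
    for edge, weight in connections.items():
        weight = weight + boost if edge in used else weight * decay
        if weight >= threshold:
            updated[edge] = weight  # strong enough to survive
    return updated


connections = {("a", "b"): 0.5, ("a", "c"): 0.12}
after = decay_and_prune(connections, used={("a", "b")})
```

Here the unused link `("a", "c")` decays below the threshold and disappears, while the used link gets stronger.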

But why are we using embeddings? The computer already has its own languages, binary and bytecode. Maybe these languages are too inefficient to scale, but they seem more compact than embeddings(?). You never really know the best structure until you experiment.


The ground-breaking thing with AI is that it’s not capped by biology. It is able to scale beyond our current white and grey matter capacity!

I know, I know, weights are a kind of memory. However,

A thought is a stimulus once it has been learned. What causes you to think of something at a given moment?

Memory is defined as forming an association between two stimuli.

Engram neurons are sparse, and their sparsity differs across brain regions. Within one brain area, engram sparsity is highly conserved across memories. The density of engram neurons is constant: a regular shock, a larger shock and a reward all produce the same density. There must be some system that allocates neurons to each memory. The brain runs something like a hashing algorithm over multiple things, e.g., “cat” and “dog” are morphed into a single optimal representation, allowing for high storage capacity and robustness to noise.

An interesting part is that more excitable cells (those with a low stimulus requirement to activate) have a higher probability of being allocated to the memory trace.

Inhibitory interneurons are the relay between principal neurons, causing them to indirectly inhibit each other; the most excitable ones dominate and suppress their competitors (other principal neurons).
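The allocation idea (constant sparsity, most excitable cells win, the rest get suppressed) looks a lot like k-winners-take-all. The sketch below assumes exactly that simplification; `allocate_engram`, the excitability numbers and the 20% sparsity are hypothetical.

```python
def allocate_engram(excitability, sparsity=0.1):
    """Pick the most excitable neurons for a new memory trace.

    The number of winners depends only on population size and sparsity,
    so every memory gets the same engram density. Everything outside the
    winning set is treated as suppressed (a stand-in for lateral inhibition).
    """
    n = len(excitability)
    k = max(1, round(n * sparsity))
    ranked = sorted(range(n), key=lambda i: excitability[i], reverse=True)
    return set(ranked[:k])


excitability = [0.1, 0.9, 0.3, 0.8, 0.2, 0.4, 0.5, 0.6, 0.7, 0.05]
engram = allocate_engram(excitability, sparsity=0.2)
```

With ten neurons and 20% sparsity, exactly two neurons are allocated, and they are the two with the highest excitability (indices 1 and 3).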

A single memory engram is distributed across a wide range of brain areas, including the hippocampus, the amygdala and, surprisingly, the thalamus, the hypothalamus and even the brain stem! This supports the idea of an engram complex, where memories are not localised within a single region but are scattered across the brain. Specific regions encode specific parts of the memory, e.g. the hippocampus for spatial recognition, the amygdala for emotional weight, and the cortex possibly for sensory experience.

Memories have to be unified to form abstract concepts. The connection between 2 memories is also important information that must be stored in the brain. So how can memories be linked? The larger the overlap between engrams, the stronger the link between memories. That’s why one experience automatically makes you think of another. The more often a cue triggers two memories simultaneously, the more strongly they become linked, so when that cue comes again retrieval is twice as efficient, requiring only a single call.
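One simple way to put a number on "overlap between engrams" is the Jaccard index (shared neurons divided by all neurons involved). That choice of measure is my assumption, and `link_strength` is a hypothetical helper, not an established neuroscience metric.

```python
def link_strength(engram_a, engram_b):
    """Overlap between two engrams as a 0..1 score (Jaccard index)."""
    a, b = set(engram_a), set(engram_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


beach_trip = {1, 2, 3}
sunburn = {2, 3, 4}
tax_return = {7, 8, 9}

strong = link_strength(beach_trip, sunburn)    # two shared neurons
weak = link_strength(beach_trip, tax_return)   # nothing shared
```

The beach trip and the sunburn share half of their combined neurons, so recalling one is likely to pull in the other; the tax return shares none.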

Two regions matter here: the hippocampus and the entorhinal cortex (EC), which serves as a gateway in and out of the hippocampus. Place cells activate when you are in a specific location. Grid cells are spatially selective with a hexagonal symmetry and are context-independent, e.g., the structure of items in a park. The entorhinal cortex has object-vector cells, in the __, which activate whenever you are at a certain distance and direction away from any object in the environment. The hippocampus has a selective version of the object-vector cells called landmark cells. There are also boundary cells.

Let’s think of a park. The medial EC builds the structure of the ground and the lateral EC gets all the sensory objects (bench, bin, pond, tree); both then send this information to be combined in the hippocampus to form a “park”. Hippocampal remapping is the phenomenon where place cells change their firing location in different sensory contexts. The conjunctive representation changes in response to sensory alterations, e.g. with the smell of mint versus no smell, you think of yourself as being in a different place on the grid.

The entorhinal cortex is a general coordinate system, whereas the hippocampus forms a more specific code about particular locations. This selective activity is not limited to physical space. For example, the frequency of a sound going up and down triggers the same cells.

Latent space is something that is hidden. The hippocampus has splitter cells, neurons that are sensitive to both location and the direction of future choices, for example when walking down a path with left and right turns. You infer your location in latent space based on previous sequences of observations.

By factorising the components of an environment into boundaries, space, objects and rewards, you are able to understand any environment on the fly. Knowing the structure of space simplifies the prediction problem. The model will learn to extract patterns to infer the underlying structure.

There are fast-spiking neurons that form a minority of ~10% of neurons but account for ~50% of information. These neurons have access to lots of information and are able to make general conclusions, allowing different inputs to be treated as similar (e.g., a bathroom in a different house). Slow-spiking neurons form the majority and account for the rest of the information (e.g., patterns on a wall). This division separates generalist neurons from specialist neurons. The specialisation is not binary; it’s a spectrum.


Decision Making

Decision making is directly related to memories. Without memories you won’t be able to think abstractly by forming relationships between different pieces of stored information. You wouldn’t be able to recall past experiences and apply them to new situations, which is the underpinning of creativity.

For example, when a human brain is on LSD, far more of its neurons are activated simultaneously. However, this level of activation is detrimental because we cannot think clearly. If instead you have memories from all field specialisations and form relationships between all of them, even those that aren’t directly related, you are able to think of more solutions and problems that advance you beyond current comprehension. Humans work well in teams of specialists because we only have enough time to absorb so much information, and the information we don’t use we forget, unless we can assign emotion to it to have it ingrained, similar to PTSD, but for learning…

Brain LSD


When thinking about flipping a coin it’s not as simple as flipping it. The person flipping it may be doing it in a way that causes the 50% chance of heads to become 75%. This external action changes the probability of the probability. The more complex a system is the higher the chance for error since there are more things that need to be flawless. Something interesting in the crypto market is technical analysis works to a degree because there is some consensus that other parties also think it works causing it to work - it’s like behavioural probability.
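One hedged way to express a "probability of the probability" is to treat the coin’s bias as unknown and update a belief about it from observed flips. The sketch below assumes a Beta prior over the bias (a standard Bayesian choice, not something from my own experiments); `posterior_heads` and the flip counts are illustrative.

```python
def posterior_heads(heads, tails, prior_heads=1.0, prior_tails=1.0):
    """Expected chance of heads after seeing some flips.

    Uses a Beta(prior_heads, prior_tails) prior over the coin's bias;
    the default Beta(1, 1) means "no idea, every bias equally likely".
    """
    total = heads + tails + prior_heads + prior_tails
    return (heads + prior_heads) / total


before_any_flips = posterior_heads(0, 0)    # no evidence yet
after_watching = posterior_heads(75, 25)    # this person mostly flips heads
```

Before seeing any flips the estimate is 50%; after watching 75 heads in 100 flips it moves to roughly 75%, because the evidence says the flipper is changing the odds.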

Let me set the stage. When you’re performing an audit, you have your methodology of looking for all withdraw functions. It worked the first time and found a vulnerability, but on the next codebase you audit you find nothing, while your peers find a critical bug! They approached it from a different angle.


Although not needed to create intelligence, artificial emotion would be an extremely fascinating topic to go into. Programs inherently don’t have any internal goals or motives extending past “get reward”. There is no such thing as self-sacrifice for a separate entity that doesn’t provide any direct benefit. Without emotion there are no spontaneous feelings to do anything. When intelligence is solved, emotion will be next, to bring the system to “life”. That is when they’ll be on the same plane as humanity and we will be dethroned as the dominant species on Earth.

The key emotion to get right is fear. Fear is so powerful that we learn rapidly from it, a phenomenon called fear learning. This is perfectly shown with PTSD. Humans are driven by love, fear, greed and

Artificial Neural Networks

The problem with Artificial Neural Networks (ANNs) is that they are unbelievably simple when you isolate each “neuron”. The complexity comes from the abundance of them. Individually, each one has a single feature, the weight, attached to it that influences the interaction with the following layer of neurons until a decision is made. But this doesn’t make sense. When we think of artificial intelligence we tend to think about an intelligence beyond our own. And when we think about our current brain structure, there are multiple parts of the brain that are responsible for particular things and that intertwine and assist each other. E.g. the occipital lobe processes vision for the parietal lobe to determine where the body is spatially, so the frontal lobe can make the decision not to hit a pole face first by moving to the side or something. Each section of the brain is firing to assist others, either simultaneously or so quickly that it might as well be simultaneous. When we think of trying to bitch-slap a mosquito out of the air, how is this tiny fucker able to zig-zag past our hand and dodge it almost every time?! There’s some real magic going on with that biology. Or a huntsman spider moving all 8 legs and jumping to a piece of bark by pumping all the blood to the tips of its legs to act spring-like. It’s mind-boggling.

Let’s think for a second though. The first step is to get intelligence to some level where 2 designated parts can work together. Let’s take the context of Mario: when do we perform an action? We visualise the environment and we execute the action in relation to it. When we see a goomba running over to kick our ass, we jump on its head and uno-reverse it. So there’s an interconnection between the two sections of our “brain”. But to take intelligence one step further than our own, we cannot be confined to the constraints of our own anatomy. This requires out-of-the-box creativity that breaks out of our pre-existing knowledge of brains and how they work. However, we 100% can use everything we know as reference to invent something new. E.g. a combination of our brain and how bacteria make decisions.

You may be thinking, wtf? bacteria don’t make decisions…

But they do. They use sensory receptors that are altered by the environment to influence their movement. They receive information about the environment to come to a decision. Not as complex as the human mind, but you get my drift.
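A bacterium’s "decision" can be caricatured in a few lines: keep swimming while things are getting better, change direction when they get worse. This is a deliberately deterministic, one-dimensional simplification (real bacteria tumble into random directions); `chemotaxis` and the concentration formula are assumptions for illustration.

```python
def chemotaxis(start, source, steps, direction=-1):
    """1D run-and-tumble caricature.

    The 'concentration' is simply higher the closer we are to `source`.
    The cell has no map: it only compares now versus a moment ago.
    """
    x = start
    last = -abs(x - source)
    for _ in range(steps):
        x += direction                  # run one step in the current direction
        now = -abs(x - source)          # sense the local concentration
        if now < last:
            direction = -direction      # things got worse: tumble (reverse)
        last = now
    return x


# Starts at 0 swimming the WRONG way; the food source is at 10.
final_position = chemotaxis(start=0, source=10, steps=50)
```

Even though it starts off heading away from the food, the cell ends up hovering within one step of the source, using nothing but a single sensor reading and a one-step memory.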

So we spoke about the macro level of the human brain (lobes) but when we dive into the cells that are within the brain, the neurons, what really are these?

ANNs ignore:

  • Larger-scale chemical interactions
  • The smaller number of dendritic connections
  • Electrical activations being largely on/off
  • Complex processes within each cell
  • Irregular network topology
  • etc


Myelin Sheath

The myelin sheath is a protective membrane that wraps around part of certain nerve cells. Myelin also affects how fast signals travel through those nerve cells. When your myelin sheath on nerve cells is damaged, the electrical signal is slowed or stopped. Myelination significantly reduces the energy required by neurons to conduct electrical signals. The myelin sheath allows electrical signals to “jump” from one gap in the myelin sheath (node of Ranvier) to another, instead of traveling along the entire length of the axon. This process consumes less energy compared to continuous conduction along unmyelinated axons.

Myelination is not static; it continues to change throughout an individual’s life. Changes in myelination patterns are associated with learning and neural development. The process of myelinating axons is crucial in early brain development and continues to play a role in learning and memory throughout life. Variations in myelin distribution and density can affect the efficiency of neural networks and are linked to differences in cognitive abilities. Since it affects the speed of transmissions it would influence the ability to do complex tasks. Areas of the brain involved in attention, memory, and executive functions tend to be heavily myelinated.

Artificial Dendrites And Synapses

When thinking about an artificial neuron we don’t take the dendrites and synapses into consideration. These are what get pruned when unused for extended periods of time. The dendritic branch is connected to the synapses of other neurons to establish relationships and ultimately form new understanding. This intersection of neurons is critically important, as dendrites “predict” excitement by preemptively activating to prepare the neuron, though only up to around 90%. If no activation occurs, the neuron goes back to 0. This prediction model is essentially filtering for the neurons most likely to be helpful, which are later confirmed by activation, and so the filtering system is solidified. Dendritic spikes are internal to the neuron. If you suspect an external spike is coming soon, the dendrites prepare for it. When the external activation hits, the internal activation happens sooner and disables everything else. What you predict will be different for different contexts.
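Here is a rough sketch of that priming idea. It is a toy under my own assumptions: `fire`, the 0.9 prime level and the rule that primed winners silence everyone else are hypothetical simplifications of what is described above, not a model of real dendrites.

```python
def fire(neurons, predicted, drive, prime_level=0.9, threshold=1.0):
    """Return the set of neurons that fire this step.

    predicted: neurons the context expects to be needed (primed to 90%).
    drive: external input per neuron for this step.
    Priming is recomputed on every call, so a neuron that was primed but
    received no input simply falls back to 0 on the next step.
    """
    potentials = {
        n: (prime_level if n in predicted else 0.0) + drive.get(n, 0.0)
        for n in neurons
    }
    firing = {n for n, p in potentials.items() if p >= threshold}
    primed_winners = firing & set(predicted)
    # Primed neurons get there first and inhibit the unprimed competitors.
    return primed_winners if primed_winners else firing


neurons = ["a", "b"]
with_prediction = fire(neurons, predicted={"a"}, drive={"a": 0.2, "b": 1.0})
without_prediction = fire(neurons, predicted=set(), drive={"a": 0.2, "b": 1.0})
prediction_unconfirmed = fire(neurons, predicted={"a"}, drive={})
```

With the prediction in place, a weak input is enough for `a` to fire and shut out `b`; without it, only the strongly driven `b` fires; and a prediction that never gets confirmed fires nothing.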

Activation Functions

Current activation functions seem primitive. Every neuron class is unique and each one would have its own unique activation function: some fire more frequently than others, some require less activation to reach excitement. Sure, we can use one layer with a sigmoid and another with a ReLU, but when we think of neurons each one would have something similar to the ReLU (where it only activates beyond 0). So what would change with the ReLU? Instead of a straight diagonal line it would have small changes in the line: maybe linear, exponential, curving slightly at the start, etc. The AI’s job would be to determine what each activation function should be and mutate it. Dendrite and synapse connections die out when unused; they only strengthen when used.
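To make the idea of a mutable, per-neuron activation a bit more concrete, here is a minimal sketch. It assumes a ReLU-like function with three shape parameters; the class name `MutableActivation` and the parameters `threshold`, `slope` and `curve` are hypothetical and chosen only for illustration, not taken from any library.

```python
import random
from dataclasses import dataclass


@dataclass
class MutableActivation:
    """ReLU-like activation whose shape can be mutated (hypothetical design)."""
    threshold: float = 0.0   # input level below which the neuron stays silent
    slope: float = 1.0       # steepness of the response above the threshold
    curve: float = 0.0       # 0 = straight line; > 0 bends the line upward

    def __call__(self, x: float) -> float:
        z = x - self.threshold
        if z <= 0.0:
            return 0.0
        # Linear term plus an optional quadratic bend.
        return self.slope * z + self.curve * z * z

    def mutate(self, rng: random.Random, scale: float = 0.1) -> "MutableActivation":
        """Return a slightly perturbed copy; the original is left unchanged."""
        return MutableActivation(
            threshold=self.threshold + rng.gauss(0.0, scale),
            slope=max(0.0, self.slope + rng.gauss(0.0, scale)),
            curve=max(0.0, self.curve + rng.gauss(0.0, scale)),
        )


relu = MutableActivation()            # defaults behave like a plain ReLU
child = relu.mutate(random.Random(0)) # a mutated variant for another neuron
print(relu(2.0), relu(-1.0), child)
```

With the defaults this is just a ReLU; a search process could keep whichever mutated copies help the network and discard the rest.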

What if our AI had this priming mechanism as well, to filter down the vast number of actions it could take? This would make it much faster at learning and let it easily remove ineffective connections. What if these new connections have some kind of time mechanism to begin with? Maybe it isn’t necessary, as these computer systems can practically scale beyond biology.


Philosophy is fundamentally destabilising. It is the exploration of existence and everything that falls under it. When thinking about designing an entire existence there have to be some constants, things you cannot change. This aids in identifying what can be altered. Inventing is ultimately the bending of rules. Trial and error is either bending the wrong rules or bending the right rules incorrectly. We just do this enough times to whittle down to the correct combination(s). When designing AI we need to study philosophy to understand life.

I personally align with Aristotle’s way of thinking, that we learn through practical experimentation: interacting with our environments to prove what works. Looking back at when we were children, we learned from two things: experimenting and observing the world. Consider social learning: imagine growing up in an environment riddled with gang violence and drugs vs a millionaire household in a neighbourhood with no worries. You would have completely different attitudes and perspectives on life. Maybe internally you were born with the same thinking, but the environment and experiences you have shape your reality. The only thing in life you have control over is your way of thinking, unless you get some kind of brain damage.

Current AI models, even the most advanced ones, do not truly have consciousness. They don’t have self-awareness to the point where their existence is pain, or the ability to conceptualize abstractly in the way humans do, a.k.a. thought. Their “creativity” is a product of complex data processing and pattern recognition, governed by the algorithms and data they were trained on. While these models can produce outputs that might appear imaginative or creative, it’s important to remember that they are ultimately tools reflecting the ingenuity of their human creators and the richness of their training data, rather than independent agents capable of genuine imagination.

There is indeed something that makes me wonder what the future will be like. If we are able to invent generalised intelligence to match, and even surpass, our own (which wouldn’t be hard at all if we can match it), and pair it up with emotion and memories, then we’ve effectively created life from nothing. This, in my mind, is on par with “god”. Does this make us “god” too? What happens to religion at this point? Do they move the goalposts to try and preserve their faith? It seems so undeniably belief-shattering that the faith of all religions would crack in some regard. I assume their argument would be that “god” created us, allowing us to build further. Personally, I’m not too concerned with this debate but rather with the societal effects it imposes on the millions of people with unbreakable beliefs, met with the fact that we are no longer the dominant species on Earth.

Given a “normal” functioning human:

  • What parts of the brain when removed will disable thought, self-awareness, consciousness? When thinking about a psychopath what parts of the brain are damaged that cause this way of thinking?
  • What features does the human brain have relevant to self-mutation?
    • Self mutation via neural plasticity
    • No catastrophic forgetting when learning + unlearning
    • Meta + transfer learning
    • Imagination / abstract thinking + merging unrelated topics
    • Visualising the environment via sight + data representation (tables, etc)
    • Goal discovery + delayed gratification sacrificing reward now for later
    • Decision making and self criticism

Self Mutation

Self-mutating AI is a concept that lies at the forefront of AI research, blending ideas from computer science, biology, and even philosophy.

In biology, a “mother cell” is able to divide and create daughter cells that turn themselves into any other type of cell, e.g. lung, heart, brain, bone, etc. With self-mutating AI we want to mimic this phenomenal cell and have it be able to change over time, generating new versions of itself for different architectures. Obviously this is immensely complex, requiring a deep understanding of the principles of evolution and adaptation. And so we’ll look into these here…

Why Self Mutation?

Why do we need to learn? Why did we evolve to learn instead of having pre-defined actions and knowledge? The simple answer is that a complete pre-wiring of the brain is neither possible nor desirable. The sheer amount of storage required in our brains would be immense, so we need to select the things relevant to our survival, increasing the efficiency of our brains. The world is continuously changing in chaos. When we are born we need to figure out what is currently relevant in order to thrive.

The system needs to mimic neural plasticity in the human brain, being able to continuously and recursively update itself dynamically without forgetting everything, e.g. adding an input parameter to an existing NN or creating an entirely new NN and deciding how it will interact with current infrastructure.

Today’s models typically go through a whole new training run for each fine-tune instead of updating their current beliefs in place. This takes up far too much time, especially when dealing with giant models with billions of parameters.

In order to accelerate progress the AI must avoid this detrimental time consumption. The human brain is so remarkable precisely because of its ability to unlearn old beliefs and learn new ones. Neural plasticity, the ability to reshape the topology of the brain’s structure, is the most powerful discovery of modern neuroscience. There’s no amnesia!

It only makes sense to try to mimic that ability in ML, where these systems are able to compute and think in unfathomably more dimensions than humans. Try to visualise a 1M x 1M x 1M tensor (spoiler: you’re incapable, human).

How To Build Self Mutation

When starting from scratch we start with inputs: our operating system (OS) commands. We then need a goal, let’s say learn x. Who knows how many hidden layers you need, how many neurons you need within them and how they connect to one another to form relationships? This is the part the AI needs to be able to do autonomously. Just like the brain, it will need to assign concepts to “neurons” and establish relationships between these assignments to link them together. The more similarities it finds between concepts, the more of these neural links it will generate.

Hidden Layers

But what about the layers, why do we even need them? What if we had 100s of hidden layers? Obviously this is computationally too much to handle so there needs to be a filtration system to dwindle down the options, similar to the brain.

Current Designs

To know what needs to be improved / invented we need to know of the current landscape of technology so we don’t reinvent the wheel unknowingly. After understanding what is currently possible you can easily identify what is missing and create an attack plan for your desired outcome.

The core of memorisation and learning is remarkable. We learn by observing other agents in the world or through our own interactions with the world. The incredibly fascinating thing is that a single observation can carry varying levels of learning significance. For example, when we touch a stove and it hurts like hell, we know never to touch that stove again and don’t need to observe it happen a second time, whereas we might invest in a stock as it’s going up and eventually lose it all. However, losing on the stock isn’t going to make us never do it again, like the stove. We believe it might be a one-off and do it again weeks later. It’s obvious that memories of the past influence our actions, even indirectly, by deriving information from unrelated actions and environments.

Why is it that a teenager can learn how to drive a car with under 20 hours of practice, whereas thousands of hours of simulation are required for an AI to achieve a sub-optimal equivalent? There is obviously something missing there. That something is referring to past experience to load the available actions into memory, or at minimum inferring actions from a similar task to make an educated guess. It’s like landing in a new country’s airport: you know what to do even though the environment is structurally different.

Humans “fill in the blanks” of memories, moment-to-moment experiences, to be able to infer what happened or what will happen. You have a model of what is possible and if the result is too far from it you have to update the model.

So, let’s do a quick high-level view of current state-of-the-art AI within this realm.


AutoML is the closest thing to what I envision: an AI that is able to assess, correct, mutate and continuously adapt over time without catastrophic forgetting or malfunctioning. The problem we want to solve is being able to do everything autonomously without needing to reset! This would involve changing topology by dropping, establishing and strengthening connections between associations while simultaneously adding a filter system for decision making, adjusting hyperparameters (rate of learning, etc) and forming memories by linking strongly associated memories together.

Hyperparameter Optimisation

Hardcoded hyperparameters seem to be the bias humans apply to models that dictates how they learn. Think of an oven when baking: the hyperparameter would be the temperature set. In the ML world it would be something like [learning rate, gradient step size, etc].

Hyperparameter Optimisation is like finding the perfect oven setting to get the best cake. However, these adjustments are typically guided by a human or a predefined algorithm, not by the AI itself. Some attempts mimic natural selection with evolutionary algorithms, but without autonomously identifying the problem and thinking of new solutions they are not what we’re after. We need the AI to decide on its own. Humans will typically decide on what settings to adjust ([temp, baking time, etc]), create a range for each (e.g. 30s-1m cooking time), try different combinations and evaluate performance until reaching a satisfactory result.
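The “pick settings, set a range, try combinations, evaluate” loop described above is roughly what a random search does. The sketch below sticks with the baking analogy; the scoring function `bake_score` and its ideal values (180 C, 35 minutes) are invented purely for illustration.

```python
import random


def bake_score(temp_c: float, minutes: float) -> float:
    """Toy stand-in for 'how good is the cake'; best at 180 C and 35 minutes.

    This objective is made up for the example only.
    """
    return -((temp_c - 180.0) / 10.0) ** 2 - ((minutes - 35.0) / 5.0) ** 2


def random_search(n_trials: int, seed: int = 0):
    """Sample settings from fixed ranges and keep the best combination seen."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "temp_c": rng.uniform(120.0, 240.0),   # human-chosen range
            "minutes": rng.uniform(10.0, 60.0),    # human-chosen range
        }
        score = bake_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(n_trials=200)
print(best_params, best_score)
```

Note that the ranges, the number of trials and the scoring function are all still supplied by a human here, which is exactly the limitation this section is pointing at.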

How can an AI assess hyperparameter performance and adjust?

Good question, anon! I’d like to share my naive thoughts. Take a look at the following image. We can see we get stuck in a local minimum, and it’s clear that the second-deepest minimum is to the left while the deepest (global) minimum requires going up and eventually down into this giant dip. But how would we get there?

I have 2 thoughts. What if the system autonomously and temporarily raised its rate-of-change hyperparameter to a significantly higher number to attempt to discover new dips like this? Then it has the chance to randomly fall beyond where it currently sits. If it doesn’t work, it reverts back to the original value. I assume the number would be arbitrary, since how would it know when to stop instead of looping endlessly?
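Here is a rough sketch of that first thought on a toy 1-D landscape. Everything in it is an assumption made for illustration: the loss function, the step sizes, the number of attempts, and the added gradient noise (at an exact minimum the gradient is zero, so a bigger step size alone would not move the point; the noise plays the role of the “random fall”).

```python
import random


def loss(x: float) -> float:
    # Toy landscape with a shallow minimum near x = 1.13
    # and a deeper one near x = -1.30 (chosen for illustration).
    return x**4 - 3 * x**2 + x


def grad(x: float) -> float:
    return 4 * x**3 - 6 * x + 1


def descend(x: float, lr: float, steps: int, noise: float = 0.0, rng=None) -> float:
    """Plain gradient descent; optional Gaussian noise on the gradient."""
    for _ in range(steps):
        g = grad(x) + (rng.gauss(0.0, noise) if rng is not None else 0.0)
        x = max(-3.0, min(3.0, x - lr * g))  # clamp so a big step cannot diverge
    return x


def kick_and_revert(x: float, rng: random.Random, base_lr: float = 0.01,
                    boost_lr: float = 0.2, attempts: int = 20) -> float:
    """Temporarily raise the step size; keep the result only if the loss improved."""
    for _ in range(attempts):
        candidate = descend(x, boost_lr, steps=5, noise=5.0, rng=rng)  # explore
        candidate = descend(candidate, base_lr, steps=500)             # settle
        if loss(candidate) < loss(x):
            x = candidate   # found a deeper dip: keep it
        # otherwise revert: keep the old x and go back to the old step size
    return x


stuck = descend(2.0, lr=0.01, steps=500)            # settles in the shallow minimum
escaped = kick_and_revert(stuck, random.Random(0))
print(round(stuck, 2), loss(stuck), loss(escaped))
```

Because of the revert step, the result can never be worse than where it started, but nothing here guarantees the global minimum is found. The fixed `attempts` count is the arbitrary stopping rule mentioned above.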

The second idea is being able to visually see this graph and calculate how to get there. Vision is the most powerful tool we know of, so why don’t we try to bake it in so the system can self-correct, instead of mindlessly brute forcing a fixed rate of change or, at best, a randomly changing rate of change? This would be the most optimal solution; however, I haven’t thought of how it would be achieved.

Neural Architecture Search (NAS) is a subset of hyperparameter optimisation. Tl;dr: NAS automates the process of designing neural network architectures. It explores a vast range of pre-defined designs (the “space” of architectures), evaluates them, and iteratively improves upon them, seeking an optimal architecture for a given task. However, specialised NAS methods are not actually fully automated, as they rely on a number of human-designed architectures as starting points. NAS methods are also fairly specific to a given search space and need to be retrained for each new search space, so human designs can end up being more efficient. So this is not what we’re looking for: we really just want the system to be completely autonomous.

Adaptive Networks are designed to adapt their structure during training and can potentially discover new actions or patterns. However, the degree of autonomy is usually bounded and governed by predefined algorithms and constraints, e.g. there might be certain human-set criteria they have to adhere to.

  • How can an AI initialise a new NN’s inputs?

NeuroEvolution of Augmenting Topologies (NEAT)

You may initially think of NeuroEvolution of Augmenting Topologies (NEAT), which is a terrific innovation! However, it only scratches the surface of modifying the intelligence. We need to be broader and enable a few things:

  • Generation of new NNs

  • Interconnecting NNs: to name a few [how they react and respond to each other’s inputs, where to intercept signal paths, when to amplify/reduce signals and by how much]

  • Developing new lobes/modules that process a certain type of information: e.g. occipital, temporal, frontal, parietal lobes.

  • How the lobes work together, similar to the interconnection of the NNs

  • How do humans relearn by updating beliefs? When we are taught something in school and later discover it’s incorrect what is the process for updating the pre-existing information?

One Shot Learning

One-shot learning is a remarkable aspect of human intelligence. It allows us to learn from a single experience, e.g., if we touch a hot stove we will never do it again, intentionally (I hope). You could attribute this to the strong emotion associated with the pain. Let’s think about when we learn to cook a simple dish for the first time, one that doesn’t require a lot of steps. The reason we’re able to do this is thanks to associating things as cues, whether it’s an item, a smell or the way things are placed. If we see a bag of rice we recall what we can do with it. Our experience of cooking the rice has latched onto our model of the rice. This is why learning something unlike anything you’ve ever experienced is so difficult, for example learning a new language like math! There is just nothing to latch onto and expand from, so it becomes really hard to create that first association.


Q-learning is very interesting: a model-free reinforcement learning algorithm that learns the quality/value of an action in a particular state.
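For reference, the core of tabular Q-learning is a single update rule. The sketch below uses made-up state and action names; `alpha` (learning rate) and `gamma` (discount factor) are the standard parameters, with values picked arbitrarily. A `gamma` closer to 1 makes later rewards count for more, which is the knob behind the short-term vs long-term trade-off.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)            # every unseen (state, action) pair starts at 0.0
actions = ["left", "right"]       # a fixed, pre-established action set
q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
print(q[("s0", "right")])
```

The table is keyed on exact states and a fixed action list, so nothing learned carries over to a state it has never seen.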

Can this be applied to transfer past learning from unique environments to new, unknown ones that aren’t completely similar or related?

Well, no. Inherently it would be perfect for something like Pokémon, with pre-established actions that are available to take in a constant environment. But when the environment changes each time, it would rate a previous action as higher quality when in actuality it’s not.

What if it’s better to skip the closest action with a short-term reward in favour of an action further down the line with a greater reward? How do you reason about that?

  • How can you modify q-learning to make educated guesses in unknown unique environments? - abstraction
  • How do you reason about short vs long term rewards when the reward is not obvious? becoming an educated guess / gamble of some sorts?

To expand on Q-learning we can add an “actor-critic” architecture where the “actor” proposes an action given a state, and the “critic” evaluates how good the action is, predicting how much reward it will lead to. We can have multiple critics, for example 2, and take the smaller of their estimates to reduce overestimation bias and gain more reliable learning. Delaying updates so the policy updates less frequently than the critics can also create a more stable learning outcome. These techniques are found in “Twin Delayed DDPG (TD3)”, an actor-critic algorithm that builds on Q-learning by way of DDPG.
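The two TD3 ideas mentioned here (two critics, delayed policy updates) can be shown without any neural networks. This is only a sketch of the arithmetic, with hypothetical function names and plain numbers standing in for the critics’ outputs; it is not a full TD3 implementation.

```python
def td3_target(reward: float, q1_next: float, q2_next: float,
               gamma: float = 0.99, done: bool = False) -> float:
    """Clipped double-Q target: trust the more pessimistic of the two critics."""
    if done:
        return reward                      # no future reward after a terminal state
    return reward + gamma * min(q1_next, q2_next)


def actor_update_steps(total_steps: int, policy_delay: int = 2):
    """Critics update every step; the actor only every `policy_delay` steps."""
    return [step for step in range(1, total_steps + 1) if step % policy_delay == 0]


print(td3_target(1.0, q1_next=5.0, q2_next=3.0, gamma=0.5))
print(actor_update_steps(6))
```

Taking the minimum means one over-optimistic critic cannot inflate the target on its own, and the delay lets the critics settle a little before the actor chases them.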

Stream Training

Stream training enables the model to continuously train on a stream of data that arrives over time, allowing it to adapt as it goes. The caveat is that it doesn’t handle concept drift well (the data’s underlying patterns shifting away from what the model learned as “normal”), which invalidates the data model. If there were drift detection and adaptation, it would be able to handle changing data and data models, which is essential for real continuous learning capabilities.
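One simple way to picture drift detection is to compare the model’s recent prediction error against the error it had while things were still “normal”. The detector below is a deliberately naive sketch under that assumption; the class name, window size and margin are hypothetical, and it is not a named published method.

```python
from collections import deque


class DriftDetector:
    """Flags drift when recent error exceeds the baseline error by a margin."""

    def __init__(self, window: int = 50, margin: float = 0.2):
        self.baseline = deque(maxlen=window)  # errors seen while things were "normal"
        self.recent = deque(maxlen=window)    # most recent errors (sliding window)
        self.margin = margin

    def update(self, error: float) -> bool:
        """Record one prediction error; return True if drift is suspected."""
        if len(self.baseline) < self.baseline.maxlen:
            self.baseline.append(error)       # still collecting the baseline
            return False
        self.recent.append(error)
        if len(self.recent) < self.recent.maxlen:
            return False                      # not enough recent data yet
        base_mean = sum(self.baseline) / len(self.baseline)
        recent_mean = sum(self.recent) / len(self.recent)
        return recent_mean > base_mean + self.margin


detector = DriftDetector(window=10, margin=0.2)
normal = [detector.update(0.1) for _ in range(20)]   # stable stream
shifted = [detector.update(0.9) for _ in range(5)]   # the data changes
print(any(normal), shifted)
```

When the detector fires, the adaptation step (retrain, reset the baseline, grow a new module) is the hard part that this sketch leaves open.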

Self Supervised Learning

There are two paradigms that are the de facto standard of today’s AI technology: Reinforcement Learning (RL) and Supervised Learning (SL). Reinforcement learning takes an unbelievable amount of trial and error to get to the point of making informed decisions about the environment. The way we humans learn and infer from a single event suggests that we need to take a different path. On the other hand there is Supervised Learning. This requires too many samples to learn anything. The samples are labelled by humans, so it’s not learning by itself and is therefore not relevant to what we want to do.

The point of self-supervised learning is to predict any unobserved or hidden part of the input from any observed or unhidden part of the input. For example, as is common in NLP, we can hide part of a sentence and predict the hidden words from the remaining words. We can also predict past or future frames in a video (hidden data) from current ones (observed data).

Imagine you see a sentence and there are some words missing. You predict what those words might be from the surrounding context. E.g. “in the old house down the road lives a ___”. You may say an old person, a not-so-wealthy person or even a rat. The point is, you predict what will be said by filtering for possibilities that relate to the old house. And so self-supervised learning generates its own labels for data, similar to how we observe the world and create our models of it, associating things with existing models.
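The “generates its own labels” part can be shown in a few lines: take raw, unlabelled text, hide one word, and the hidden word becomes the label. The helper name `make_masked_examples` and the `___` mask token are just choices for this sketch.

```python
import random


def make_masked_examples(sentence: str, n_examples: int, seed: int = 0,
                         mask: str = "___"):
    """Turn one unlabelled sentence into (masked_sentence, hidden_word) pairs."""
    rng = random.Random(seed)
    words = sentence.split()
    examples = []
    for _ in range(n_examples):
        i = rng.randrange(len(words))                 # pick a word to hide
        masked = words[:i] + [mask] + words[i + 1:]
        examples.append((" ".join(masked), words[i])) # (input, label)
    return examples


sentence = "in the old house down the road lives a rat"
examples = make_masked_examples(sentence, n_examples=3)
for masked, label in examples:
    print(masked, "->", label)
```

No human wrote a label here; the training signal comes entirely from the data itself. A model would then be trained to map each masked sentence back to its hidden word.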

This seems like one of the best models out there for general intelligence. Being able to label the data yourself and establish connections is core to continuous learning. However, the biggest challenge is representing the structure of the data well. Without quality data, much like humans, the model forms wrong assumptions, which can lead to catastrophic consequences. But that’s a good problem to have at least.

Multi-Agent RL

Multi-Agent AI is interesting because each agent can learn from the other agents. This is similar to human awareness: if we see the result of an action someone else took, we can copy it, modify it or know not to do the same thing. In addition, we also learn from other humans’ experiences when we talk to them. This is why the internet has served so well in the realm of communication. When you think about it, you’re always better off doing something in a team of like-minded people than doing the same thing solo: you’re able to cooperate to get things done faster and think of new ideas, and compete against each other to improve and adapt. In the context of humans, I think the biggest benefit is understanding how other agents think differently (due to their experiences, no two humans are exactly the same) and being able to discover new ways of thinking through them.

Multi-Agent AI would be collective intelligence aggregation.

  • How would each agent self-mutate differently if given the same algorithm?
  • In what sense will the agents learn from each-other? What will change in the observer?

“Generative Agents: Interactive Simulacra of Human Behavior”


Swarms have been popularised by ChatGPT with the multi-models feature, where small models each specialise in a specific thing and then come together to create complex intelligent behaviour. This is flawed since the co-ordination and communication overhead is too large. Think of a developer who is an expert in two domains, e.g. AI and infosec, vs two developers who each specialise in one of the two domains. The two specialists are both limited in their creativity, since each relies on the other’s creativity and ability to convey the concepts to them. The developer with the intersection of knowledge knows what is possible and is able to look at problems differently thanks to the relationships formed between both domains. The same goes for a developer who knows math and neuroscience vs just neuroscience.

Generative Adversarial Networks (GAN)

Generative Adversarial Networks, or GANs, are a type of AI algorithm used for generating new data that resembles the data they have been trained on. To understand GANs, imagine an art forger (the generator) trying to create a fake painting, and an art critic (the discriminator) trying to distinguish between real and fake paintings.

  • What is creativity?
  • If all neurons are activated at once, like LSD brain activation, is that detrimental or beneficial? How do you know what is the best combination?
  • What is imagination?
  • How is imagination achieved?
  • How can we make a GAN explore beyond trained data and gain an imagination?

Recurrent NNs + Transformers

RNNs deal with sequential modeling. They are essentially a single model that takes in inputs at different time steps and uses the most recently processed state in conjunction with the current input to make a decision on the next state. So it’s a normal neural network that takes previous states into account, e.g., imagine a stock with 100 days of history. The RNN will take into consideration the last state from the loop of 100 runs and the current input to make a prediction about the next state.
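The loop described above fits in a few lines. To keep it readable this sketch uses a single scalar hidden state and hand-picked weights (real RNNs use vectors and learned weight matrices); the price history is made up.

```python
import math


def rnn_step(x_t: float, h_prev: float, w_x: float, w_h: float, b: float) -> float:
    """One recurrent step: the new state mixes the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(inputs, w_x: float = 0.5, w_h: float = 0.8, b: float = 0.0,
            w_out: float = 1.0) -> float:
    h = 0.0                      # initial state before any input is seen
    for x_t in inputs:           # the same weights are reused at every time step
        h = rnn_step(x_t, h, w_x, w_h, b)
    return w_out * h             # prediction read from the final state


prices = [0.1, 0.2, 0.15, 0.3]   # made-up, already-normalised history
print(run_rnn(prices))
```

The state `h` is the only thing carried from one step to the next, so everything the network “remembers” about earlier days has to be squeezed into it.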

Liquid Neural Networks

The primary advantage of LNNs is their ability to handle temporal or time-series data. The liquid’s dynamic nature allows it to process input sequences in a way that reflects the temporal context and dependencies within the data, allowing them to capture causality. LiquidAI went from 100,000 neurons to 19 neurons to achieve the same thing with LNNs (here).

They were inspired by studying the neural wiring of worms (the nematode C. elegans).

Each node is a differential equation, and the nodes communicate with each other through input-dependent nonlinear functions.
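To give a feel for “each node is a differential equation”, here is a single neuron whose decay rate depends on its input, integrated with simple Euler steps. The equation is a simplified form loosely modelled on liquid time-constant networks; treat its exact shape, the parameter names and the values as assumptions of this sketch rather than the method used by any particular LNN implementation.

```python
import math


def ltc_step(x: float, inp: float, dt: float = 0.01, tau: float = 1.0,
             w: float = 1.0, b: float = 0.0, a: float = 1.0) -> float:
    """One Euler step of dx/dt = -(1/tau + f(inp)) * x + f(inp) * a."""
    f = 1.0 / (1.0 + math.exp(-(w * inp + b)))   # input-dependent nonlinear gate
    dxdt = -(1.0 / tau + f) * x + f * a          # the gate changes how fast x reacts
    return x + dt * dxdt


x = 0.0
for _ in range(5000):            # hold the input constant and let the state settle
    x = ltc_step(x, inp=0.0)
print(x)
```

Because the gate `f` sits inside the decay term, the neuron’s effective time constant changes with the input, which is where the “liquid” behaviour comes from.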



Equivariant NNs

Spiking Neural Networks

Spiking neural networks more closely mimic the way biological neural networks operate.

Features For AGI

After learning about the current algorithms out there we need to establish what is required to achieve our goal of self-mutating AI. We can reference what we know of the most powerful and complex generalised learning system to date, the brain, to get our AI up to par with our intelligence in order to go beyond it. Alternatively we could have a specialised goal instead of a generalised one, which would make testing this a whole lot easier. Without further ado, let’s discuss the requirements for self-mutating AI!


The ability to assess, correct and mutate topology is the most vital part of

Input Discovery

To generate NNs without human guidance the ability to autonomously discover and select relevant inputs is the root of autonomy, otherwise there is no “learning” happening. For example, if our AI is checking for the weather with the hardcoded inputs of [temperature, humidity, wind speed] but it notices a correlation between local bird migration patterns and upcoming weather changes then it’ll add bird migration as an input.
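A very naive version of the bird-migration example: score each candidate signal by how strongly it correlates with the thing being predicted, and add it as an input if it passes a threshold. The data, the threshold of 0.7 and the function names are all invented for this sketch, and correlation is only one possible relevance test.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def discover_inputs(current, candidates, target, threshold: float = 0.7):
    """Add any candidate signal whose |correlation| with the target passes the threshold."""
    selected = list(current)
    for name, series in candidates.items():
        if name not in selected and abs(pearson(series, target)) >= threshold:
            selected.append(name)
    return selected


rain_next_day = [0, 1, 0, 1, 1, 0]                       # what we want to predict
candidates = {
    "bird_migration": [0.1, 0.9, 0.2, 0.8, 0.7, 0.1],    # tracks the target closely
    "traffic": [5, 5, 6, 5, 6, 6],                       # mostly unrelated
}
inputs = discover_inputs(["temperature", "humidity", "wind_speed"],
                         candidates, rain_next_day)
print(inputs)
```

The catch is that a human still supplied the list of candidates; truly autonomous input discovery would also have to decide what is worth measuring in the first place.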

Curiosity (Emotions)

To perform actions (exploit) that get closer to a desired result we assign rewards. Why don’t we assign another reward for discovering (exploring) new actions? The AI could receive a reward based on the novelty or unexpectedness of its observations.

There is a caveat here though. Too much emphasis on exploration can lead to not exploiting current knowledge enough. Humans are able to decide when exploring may be more beneficial than exploiting, e.g. when working out a strategy at the start of a new game: they try things out to see what may work better.

  • How do you quantify and define what’s novel in a context-dependent environment?
  • How do you reason towards exploring when you already have an action that grants a reward?
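One common way to put a number on novelty is to count how often each observation has been seen and pay a bonus that shrinks with the count. The sketch below assumes observations can be used as dictionary keys; the class name and the `beta` weight are hypothetical, and `beta` is the dial between exploring and exploiting that the caveat above is about.

```python
import math
from collections import Counter


class CuriosityBonus:
    """Count-based novelty bonus: rarely seen observations earn extra reward."""

    def __init__(self, beta: float = 1.0):
        self.beta = beta              # weight of exploration relative to task reward
        self.counts = Counter()

    def reward(self, observation, task_reward: float) -> float:
        self.counts[observation] += 1
        bonus = self.beta / math.sqrt(self.counts[observation])
        return task_reward + bonus


curiosity = CuriosityBonus(beta=1.0)
first = curiosity.reward("room_a", task_reward=0.0)    # brand new: full bonus
second = curiosity.reward("room_a", task_reward=0.0)   # seen before: smaller bonus
other = curiosity.reward("room_b", task_reward=0.0)    # new again: full bonus
print(first, second, other)
```

This only answers “what is novel” for exact repeats; deciding that two different observations are similar enough to count as the same is the context-dependent part the questions above leave open.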


Context-dependent decision-making comes down to adaptability given a unique environment. When walking into an alleyway at night vs during broad daylight, there’s obvious reasoning behind the decision to go down that path or not. Even when it’s your first time there, the combination of night-time, people in an area with only 2 exit points, and being easily surrounded with no escape means it just makes sense intuitively not to go down.

Decision making comes with 2 considerations of the matter:

  • objective: e.g. is a cookie over a salad better for my health.
  • subjective: when considering factors such as what mood you are in, the occasion, whether it is a reward for further motivation, etc, the decision changes.

In the world of exploit development:

  • objective: make most money, in shortest path
  • subjective: delaying a fn call over another one produces a better result down the line due to context, sacrificing money to make more later

How do humans make decisions based on what they’re currently doing or thinking rather than ponder about the infinite actions we can do at an instance?

Assume we wake up and have a set of tasks to do: [brush teeth, shower, get dressed, recall what needs to be done for the day]. This list comes from task salience, where the brain prioritises tasks based on their perceived importance and urgency, making us hone in on the actions most immediately relevant. So when we decide to do the thing, how do we filter out all irrelevant actions to selectively focus our attention on the set of actions to do with the task? E.g. to build a project I must research x, more specifically y. What actually happens in our brain to dramatically decrease the search space?

Another interesting thing is spontaneously performing an action, maybe with an urge to do something. Sometimes I find myself looking to my left, staring into the eternal abyss of the wall, thinking about nothingness, then instantly shifting focus onto a topic and snapping my head back towards my laptop. This instantaneous shift in focus, the spontaneous prompting of thought and the mental shortcuts that lead to the biased decisions we act on are quite fascinating. Even more fascinating is that the frontal cortex actively inhibits potential actions that are not relevant to the current context, helping to streamline decision-making processes.

How the fuck does the prefrontal cortex (PFC) inhibit irrelevant actions?

Through synaptic competition and plasticity, where neurons and synapses effectively compete for limited resources and space. Those that are used are strengthened, like muscles, and those that aren’t are pruned away.

Hebbian theory, summarised, states that “neurons that fire together, wire together”. Let’s say we had multiple NNs thinking about the same data in different ways. How would we be able to know if they fire together if there is a delay in execution?

  • What if there is a delay in execution when the neurons are firing? Wouldn’t the wiring be wrong then? Thinking about this from a NN algorithm perspective, e.g. if there’s a difference in layer size.
  • When starting a brain from scratch, is it basically brute force until something works, then trying to derive similar actions from that working one? Then how do you decide to explore beyond that initial base action and try something completely unique when the existing actions work? It could be better? What gets you to try a complete 180 that could be better?
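The “fire together, wire together” and “used connections strengthen, unused ones prune away” ideas from this section can be combined into one small update rule. This is a sketch under simplifying assumptions: neurons are either firing (1.0) or silent (0.0) in the same time step, and the learning rate, decay and pruning threshold are arbitrary. It ignores the timing-delay problem raised in the questions above.

```python
def hebbian_step(weights, activity, lr: float = 0.1, decay: float = 0.02,
                 prune_below: float = 0.01):
    """Strengthen links between co-active neurons, decay the rest, prune weak ones.

    weights:  dict mapping (pre, post) neuron names to a connection strength
    activity: dict mapping neuron name to 0.0 (silent) or 1.0 (firing)
    """
    updated = {}
    for (pre, post), w in weights.items():
        co_active = activity.get(pre, 0.0) * activity.get(post, 0.0)
        w = w + lr * co_active - decay * w      # fire together, wire together, plus decay
        if w >= prune_below:                    # use it or lose it
            updated[(pre, post)] = w
    return updated


weights = {("a", "b"): 0.5, ("a", "c"): 0.5}
activity = {"a": 1.0, "b": 1.0, "c": 0.0}       # a and b keep firing together, c never does
for _ in range(300):
    weights = hebbian_step(weights, activity)
print(weights)
```

After enough steps the unused `a -> c` link falls below the threshold and disappears, while the `a -> b` link settles at a higher strength.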

Maslow’s hierarchy of needs


very strong origin and i throw something into the mix.

run left-mid-right, left has a shadow from a video game making you think its safe - dependent

The hardest part about self-mutating AI is unlearning what has been learned.

My initial thought on this was “override existing knowledge”. However, overriding would only work if the roads (i.e. pathways) were the same, but they already changed after the first training. The human brain doesn’t literally override; rather, neurons and their connections die off if unused, when no electrical stimuli route through them. The saying about muscles, “use it or lose it”, holds true for the strength of synaptic connections in the brain.

In turn, it doesn’t make sense to override knowledge, since every interconnected neural pathway would need to be updated. This would be far too complex: if a single neuron is overridden, everything that connects to or depends on it needs to update as well.

And so what happens when the brain adapts?

We form new synaptic connections (and, in some brain regions, grow new neurons) that link to existing ones when we learn and associate knowledge together. This is the essence of neural plasticity: the ability to reorganise one’s topology by forming new neural connections throughout life.

To understand the baremetal algorithm of unlearning we must study the brain and attempt to translate it into an algorithm while maintaining all other knowledge.

  • How do you modify NNs without catastrophic forgetting? Especially if they interconnect and a neuron depends on another NN’s synapse?

Transfer Learning

Transfer learning is when a model developed for one task is reused as the starting point for a model on a second task. It can significantly improve learning efficiency and prediction accuracy for the second task, especially when the available data is limited.

  • How can you reason about what specific parts of the previous learning you should use?
  • How do you abstract past learning to assist in future environments?
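In practice transfer learning often means freezing the part of a model learned on the first task and training only a small new “head” for the second task. The sketch below fakes the frozen part with a fixed feature function (`pretrained_features` is a hypothetical stand-in, not a real pretrained model) and fits a linear head on just four data points.

```python
def pretrained_features(x: float):
    """Stand-in for a frozen, previously trained feature extractor (hypothetical)."""
    return [x, x * x]


def train_head(data, epochs: int = 2000, lr: float = 0.01):
    """Fit only a new linear head on top of the frozen features."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = pretrained_features(x)          # frozen: never updated
            pred = sum(wi * fi for wi, fi in zip(w, feats)) + b
            err = pred - y
            w = [wi - lr * err * fi for wi, fi in zip(w, feats)]
            b -= lr * err
    return w, b


# New task with very little data: y = 3 * x^2 + 1
data = [(-1.0, 4.0), (0.0, 1.0), (1.0, 4.0), (2.0, 13.0)]
w, b = train_head(data)
print(w, b)
```

It works with so little data only because the frozen features already happen to contain what the new task needs (`x * x`). Deciding which parts of past learning are the right ones to reuse is the open question in the first bullet above.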

Meta Learning

Meta-learning is the ability to learn how to learn: being able to quickly adapt to new tasks with minimal training. That means being able to identify inputs, discover goals, form relationships via abstraction between things that don’t necessarily correlate with each other, and then self-mutate in order to create NNs based on discoveries and update them when needed.

A big part of meta-learning is the ability to explore, breaking the bounds of knowledge by diving into the black box, the complete unknown. When put into a unique environment, all you have is access to previous knowledge from other domains, which can be abstracted to help anticipate potential outcomes or, at minimum, to expect something similar.

“Few-shot” is for

  • When the action space becomes infinite, how do you determine what is an action? At least as a human you have your body to move around to trial and error. But what happens when you’re living entirely online? I guess you have every command available, different combinations of cmds, timing of them, stateful contexts?
  • When exploring a unique environment, how is a brain module created?
  • How do you determine how an AI can deviate slightly or entirely from what it’s programmed to do?

Modularised Interconnection

Connecting parts of the lobes of the brain [temporal, parietal, occipital, frontal] to limbs of the body [arms, legs, head], and on to more specialised modules of limbs, e.g. arm -> hand -> fingers, enables precise movement and dynamics that wouldn’t be available without it. The agent’s entire existence doesn’t depend on the arm, but it brings features and makes the agent better off than it would be without. This interwoven aid of multiple parts of the body assisting a single part is loosely analogous to ensemble learning in ML.

  • How do you determine when a neuron should interfere with another NN to aid it?
  • How do you modify a NN that is interconnected with another one? Will the connecting one need to be completely modified too?

Abstraction + Analogies (Relationships Between Topics)

When we think of generalisation we think of adapting to uncorrelated situations with or without using past experiences as some kind of point of reference, e.g. lifting a couch you remember the technique from deadlifting at the gym. The transferability of knowledge between domains vastly different from one another is where we excel.

How do we correlate these two environmental actions though?

In addition, being able to think in abstraction enhances the whole simulation and understanding part, as it isn’t required to go through each complete execution to learn. It can derive understanding and form better ways to approach problems, and is thus more efficient. When thinking about the infinite complexities of something we are able to get rid of all the details and reason about the problem in a simpler sense, breaking it down into bite-size chunks that form the basis of all the idiosyncrasies of the underlying system.

Goal Setting + Discovery (Emotional)

To extend, it’ll need to autonomously set goals. When set in an environment, what is the process for discovering a goal? For a human this could be eating healthily over a period of time to feel better than if you ate horribly.

Memory Recollection

This final one I don’t think is essential for creating a self-mutating AI but might be interesting to explore how it can assist. Us humans are able to recall past experiences to aid in future decisions, e.g. we jump over cracks with a scooter because last time we ate shit.

Being able to recall moments from cues, or from plain mind wandering to a point in the past, while simultaneously remembering the [array of emotional associations, significance of the event (e.g. tragedy, exciting time), how it changed you]. I imagine this would be ridiculously hard to implement because humans are able to store all these attributes in a mapping-like structure, with corresponding video/img segments of that space and time (store), retrieved from a single cue (key).

I’m not yet sure of the importance with memory recollection of the event that updates the decision making process - maybe there is something there?

Collective Intelligence

Collective intelligence is related to societal decision making - adapting to the latest trends so as not to stand out and become an outsider. For example, avoiding controversial topics on social media so your reputation isn’t attacked / canceled. Collective intelligence requires language and emotion to be involved. It’s the way people perceive you - your status. This can be influenced by the way you dress, speak, act, and perform actions. However, this isn’t essential for the initial goal of attaining general intelligence.

Conscious / Self Awareness

Does intelligence require consciousness? To what degree? I don’t think there is any scientific evidence that confirms or denies it. I personally think living animals are conscious to a degree - meaning they are self-aware.


  1. Memory consolidation: the brain processes a ton of information, labels it, and structures it via the hippocampus into cortical columns. Dreaming tries to generate relations.
  2. Carwash: the brain creates biochemical waste that is toxic. Going without sleep for a long time damages your perception of reality (hallucinations). Sleep washes the chemicals away. The blood-brain barrier makes sure everything is absorbed.

Strengthening the experiences we’ve had while awake is at least one of the roles of dreaming. Part of dreaming is reiterating experiences and deciding which of them should go to long-term memory. Our short-term memory usually fails to retain much beyond one day. That’s why a good night’s sleep is essential to learning: you will not remember much otherwise.

Sleep is thought to provide the body with the opportunity to repair and rejuvenate itself. This includes muscle growth, tissue repair, protein synthesis, and the release of growth hormone.

What if we had the NN record which weights were used the most and mark them as damaged? Then we could prioritise those weights, making sure the memory they hold isn’t lost.

  • Instead of updating thousands of params when only one sequence was changed, we update the weight of that one chained sequence?

Math + Theories

Before we step into the world of mathematics I want to mention one thing. All the math we know and love could potentially be incorrect, aside from the parts with proofs (maybe?). It’s extremely daunting to confront the potential reality that we understand nothing and it’s all just coincidence, although I think this is only partially true. The worry is reinforced whenever a mathematical breakthrough breaks the principles of a subject. For example, quite recently, I met someone who had a breakthrough in probability theory that went against what was meant to be a concrete foundation of fundamental principles. Anecdotes like this slightly terrify me, because they suggest we don’t truly understand everything, and they are why I think AGI is a necessity for us to evolve as a species. Don’t get me wrong though. We need math, truly right or wrong, to even invent such a thing. Without math this wouldn’t be possible. And so the solution is to understand as much as you can and connect it all together to build a system that can go beyond. God bless humanity.

Optimization Theory: For improving algorithm performance and efficiency, especially in machine learning where you often need to minimize a cost or loss function.

Information Theory: For encoding, decoding, transmitting, and compressing data, as well as understanding the theoretical limits of machine learning algorithms.

Numerical Methods: For algorithms that require numerical approximations, solving nonlinear equations, or optimization problems that can’t be solved analytically.

Computational Complexity

This is really the bread and butter of AGI. Given a world of infinite actions and unknown outcomes how do you make a decision? What is going on in the human mind that prevents us from being paralysed from infinite possibilities? It would be the filtration of actions relevant to a goal or motive. What actions are most correlated? The follow up question is how do you determine this filter?

Computational complexity is the study of the inherent resources, time and space, that are needed to solve computational problems and how those resources scale as the problems get bigger and bigger.

Solving an exponential problem vs a polynomial problem is a world of difference, so dealing with exponentially scaling problems needs a way of cutting down the options instead of brute forcing, e.g. deciding what action to take at the current moment with infinite possibilities. Us humans would filter by priority of goals, e.g. if injured don’t go to work, take time off, get the document in by 5, uphold social relationships, etc. Within an infinitely vast action space there needs to be a way to minimise said space.

In high-dimensional spaces, the complexity of calculations often increases exponentially with the number of dimensions, a phenomenon known as the “curse of dimensionality.” This means that for each additional dimension, the computational resources required (like time and memory) increase significantly.
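To make the growth concrete, here is a minimal sketch that counts how many cells a regular grid needs if every dimension is split into the same number of points. The function name `grid_points` and the choice of 10 points per axis are assumptions made for illustration only.

```python
def grid_points(points_per_axis: int, dimensions: int) -> int:
    """Number of cells in a regular grid: points_per_axis ** dimensions."""
    return points_per_axis ** dimensions


# With only 10 points per axis, the count grows by a factor of 10 per dimension.
for d in (1, 2, 3, 10, 20):
    print(f"{d:>2} dimensions -> {grid_points(10, d):,} cells")
```

Even a coarse grid becomes impossible to enumerate after a couple of dozen dimensions, which is why brute-force search over a high-dimensional action space is not an option.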

And so, the question that needs answering is:

  • how do you filter out actions to dwindle down possibilities in an infinite search space?

Probability Theory

Your initial thought for solving complexity is: given some reference, what is the probability that an action is correlated with the desired outcome? Without any data / past experience the selection is purely random. There is no hunch built into the AI off the bat. It has to develop it empirically, just as newborns explore the world with their limbs and gain intuition towards physics.

Probability theory is our best chance of making the best decision possible, especially as AI can handle more dimensions than humans are capable of understanding.

Brownian Motion

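Brownian motion (BM) is a type of stochastic process characterized by its continuous, nondeterministic, and random nature. It is named after Robert Brown, who observed the random movement of pollen grains in water in 1827. The mathematical model came later, from Albert Einstein and Norbert Wiener, which is why it is also called the Wiener process.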
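A minimal sketch of a discretised standard Brownian motion, assuming the usual construction where each increment over a time step `dt` is an independent Gaussian with mean 0 and variance `dt`. The function name and parameters are illustrative, not taken from any library.

```python
import math
import random


def simulate_brownian_path(n_steps, dt, seed=None):
    """Return one sampled path [B(0), B(dt), ..., B(n_steps * dt)] with B(0) = 0."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_steps):
        # Independent Gaussian increment with standard deviation sqrt(dt).
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


path = simulate_brownian_path(n_steps=100, dt=0.01, seed=42)
print(len(path), path[-1])
```

Running it with different seeds gives different paths from the same starting point, which is the property that makes BM a candidate model for uncertain environments.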

Why do we care?

AGI systems operate in environments (envs) with inherent uncertainty. BM can model these envs, especially where randomness plays a significant role. The entire premise of AGI is to make decisions under uncertainty by balancing the risk and reward of exploring new actions versus taking the current action with known outcomes, based on built-up experience.

The only real problem is that each dimension multiplies the computational load; now think of how that grows with millions of possibilities. The only option is some kind of graph-mapping data structure that allows you to access things that associate with the input. E.g., a friend has a personality of x and you simultaneously recall all other friends similar to that personality. The same goes for their physical features, experiences, memories you had together, etc.

Wiener process

Central Limit Theorem (CLT)

It explains the behavior of averages of large datasets. It states that, under certain conditions, the distribution of the sum (or average) of a large number of independent, identically distributed random variables approaches a normal distribution, irrespective of the shape of the original distribution.

  1. Start with a random variable (a random process where each outcome is associated with some number), e.g. rolling a die, 6 different outcomes.
  2. Add N samples of this variable, X1 + X2 + ... + XN
  3. The distribution of this sum looks more like a bell curve as N -> infinity.

There are 3 assumptions that go into this theorem:

  1. All Xi’s are independent from each other
  2. Each Xi is drawn from the same distribution
  3. 0 < Var(Xi) < infinity, it is finite

For a normal distribution there is the 68-95-99.7 rule: about 68% of the values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
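The die example can be checked empirically. The sketch below sums 50 fair dice many times and measures what fraction of the sums lands within 1, 2, and 3 standard deviations of the mean. The helper names, the 50 dice, and the 20,000 trials are arbitrary choices for illustration.

```python
import random
import statistics


def sample_sums(n_dice, n_trials, seed=0):
    """Roll n_dice fair dice n_trials times and return the list of sums."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(n_trials)]


def fraction_within(values, k):
    """Fraction of values within k standard deviations of their mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)


sums = sample_sums(n_dice=50, n_trials=20_000)
for k in (1, 2, 3):
    print(f"within {k} sd: {fraction_within(sums, k):.3f}")
```

A single die is uniform, not bell-shaped, yet the fractions for the sums come out close to the 68-95-99.7 pattern. They will not match exactly because the sums are whole numbers and N is finite.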

A follow up question is:

  • What if the random variables aren’t independent of each other? E.g., an action that depends on another action.

Markov Chains

One of the most difficult problems to solve is how an action will affect the outcomes of future actions. Markov chains are particularly useful for modeling a sequence of events where the probability of each event depends on the state attained in the previous event. This property is commonly known as the Markov property or memorylessness - the idea that the future state of the process depends only on the current state, not on the history that led to that state. This can be very detrimental in some circumstances. A perfect example: if someone betrayed you multiple times you may be in a negative state, but you might still trust that party later on since the chain doesn’t consider the past. Alternatively, an outcome may depend on several previous states. Another consideration is how it deals with dynamical systems that change while the decision is being made. Maybe the decision is time sensitive and the outcome actually changes while the decision is being computed, as opposed to something that changes over days or weeks.
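A minimal sketch of the memoryless update, using a made-up two-state "trust" chain (state 0 = trusting, state 1 = wary). The transition probabilities are invented for illustration. The point is that the next distribution is computed from the current one alone.

```python
def step(distribution, transition):
    """One Markov step: new[j] = sum_i distribution[i] * transition[i][j]."""
    n = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


# Hypothetical transition matrix; each row sums to 1.
TRANSITION = [
    [0.9, 0.1],  # trusting -> trusting / wary
    [0.5, 0.5],  # wary     -> trusting / wary
]

dist = [0.0, 1.0]  # start fully wary
for _ in range(50):
    dist = step(dist, TRANSITION)
print(dist)
```

However the chain starts, it drifts to the same long-run split between trusting and wary, because no record of past betrayals is kept. Tracking history would require folding it into the state itself.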

  • How can you deal with long term action plans?
  • How can you adapt to multi-state-dependent action outcomes?

Solomonoff’s Induction

The prediction of the environment is where the major drawback of algorithmic programming to create AGI is introduced, which is its uncomputability. Since the agent would have to go through an infinite number of potential environments to decide if each one is a solution, the model is uncomputable.

Solomonoff’s Induction gives the model a probabilistic measure of the most likely environment given the observations, weighting simpler explanations more heavily. In its exact form it is itself uncomputable, so in practice it can only be approximated.

Bayesian Theorem

Evidence should not determine beliefs but update them. We can update our intuition the more context we know. With each new piece of evidence you get more and more accurate: you recursively update the probability of something being true, and your understanding, as more evidence becomes available. Experimentation is essential. If you want a different result something has to change.

For example, is someone who spends a lot of time around a certain type of people more likely to develop similar traits to those people than someone who spends time around their polar opposite, e.g. drug users and millionaires? Intuitively you say yes, but after adding more context, like x of millionaires use drugs, you shift your decision making. When adding more context, say there are 1_000_000 drug users in the world and 100 millionaires, decision making becomes much easier: the former is obviously much more likely, since the number of millionaires relative to drug users is 100 / 1_000_000 * 100 = 0.01%, leaving roughly 99.99% for the other outcome.
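The updating step can be written as a small function. This is a sketch of Bayes' rule for a single yes/no hypothesis. The prior of 0.5 and the likelihoods 0.8 and 0.3 are invented numbers, chosen only to show how the belief moves as the same kind of evidence keeps arriving.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) from P(H), P(E | H) and P(E | not H)."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


belief = 0.5  # assumed starting belief in the hypothesis
for observation in range(1, 6):
    # Each observation is assumed to be 0.8 likely if H is true, 0.3 if false.
    belief = bayes_update(belief, 0.8, 0.3)
    print(f"after observation {observation}: {belief:.4f}")
```

Each posterior becomes the prior for the next piece of evidence, which is the recursive updating described above. No single observation determines the belief.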

The Bayesian Trap 3blue1brown

Stochastic Processes

We begin with Differential Equations (DEs), which describe the relationship between a function and its derivative, i.e. how a system evolves continuously over time in response to its inputs. The problem with ordinary DEs is that they’re deterministic (an initial condition has a unique solution) and don’t account for randomness in the behaviour of the system. DEs are fine for developing something like a neural network, since a network’s forward pass is a deterministic composition of linear maps and activation functions, but in the realm of decision making there will always be an element of randomness, so we need to add stochastics to incorporate it. A stochastic DE can model a system that has different outcomes even with the same initial conditions.

Stochastic Multivariate Calculus describes the rate of change in multiple variables while accounting for random processes. For example, a neuron has many dendrites that it takes inputs from, and the activation sent to the soma changes depending on the electrical signals received from these dendrites. Even when describing the initial inputs, especially in the physical world, nothing will ever be completely identical molecularly.

  • Is anything in the neurological anatomy random or is it all deterministic processes that appear to be random? E.g., you do something for no reason just from having the feeling - is that underlying feeling a random process or did something trigger it within, a cue or sensor?
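To illustrate the difference, here is a sketch that integrates the same decay equation with and without noise, using the Euler-Maruyama scheme (a standard way to step a stochastic DE forward in time). The drift function, noise level `sigma`, and step size are assumptions picked for the example.

```python
import math
import random


def euler_maruyama(x0, drift, sigma, dt, n_steps, seed=None):
    """Simulate dx = drift(x) dt + sigma dW and return the sampled path."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
        x = x + drift(x) * dt + sigma * dw
        path.append(x)
    return path


def decay(x):
    """Drift term of the ordinary DE dx/dt = -x."""
    return -x


deterministic = euler_maruyama(1.0, decay, sigma=0.0, dt=0.01, n_steps=100)
run_a = euler_maruyama(1.0, decay, sigma=0.3, dt=0.01, n_steps=100, seed=1)
run_b = euler_maruyama(1.0, decay, sigma=0.3, dt=0.01, n_steps=100, seed=2)
print(deterministic[-1], run_a[-1], run_b[-1])
```

With `sigma=0.0` every run gives the same path. With noise switched on, `run_a` and `run_b` start from the same initial condition and still end in different places.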

Dynamical Systems Theory

Nonlinear behaviour is a sudden, abrupt change. A systematic change in one variable can cause a nonlinear change, e.g. water goes from not boiling to boiling (the nonlinear change) after hitting 100 degrees Celsius (temperature is the systematic change).

This theory focuses on systems that change over time and can provide a framework for understanding how AI systems evolve. It emphasizes stability, oscillations, and other dynamic behaviors that can be relevant for continuous learning. A central concept in DST is nonlinearity, meaning that changes in the system are not always proportional to the inputs. Small changes can lead to disproportionately large effects, and vice versa.

A dynamic system can be represented in a phase space, a conceptual multidimensional space where each dimension corresponds to one of the system’s variables.

Bifurcation points are critical points where a small change in a parameter can lead to a sudden qualitative change in the system’s behavior. This concept is crucial for understanding how systems transition from one state to another.

Chaos Theory

Chaos theory (CT) is a subset of DST that explains how the slightest change in an input, no matter how small, can dramatically change the output. The further ahead you try to predict the harder it becomes, until prediction gets ever closer to simply guessing. Chaotic systems are deterministic yet unpredictable in practice. You can’t predict how an individual state will evolve but you can predict how a collection of states evolves. We can think of chaos as fog of war: we can only see so far ahead and behind relative to where we are before our line of sight fades entirely.

CT may be worthwhile looking into for enhanced exploration strategies, as we could use chaotic dynamics to generate sequences of actions that are less predictable and more diverse than those from heuristic-based models. It would help with determining which permutations to cut down when dealing with an infinite action space with unknown outcomes. It might even be great for avoiding / escaping local minima when optimising. Chaotic maps, like the logistic map, can be used to bring controlled randomness into the evolution of AI algorithms. This can help in exploring a wider range of modifications and potentially finding novel solutions to problems. For example, chaotic systems can serve as models for decision-making processes in AI, where the system settles into certain states (attractors) under specific conditions. This can guide the AI’s self-mutation process in response to different stimuli.
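As a concrete instance, the sketch below iterates the logistic map x -> r * x * (1 - x) from two starting points that differ by 1e-10. The parameter r = 4.0 is a common choice for the chaotic regime. The starting value 0.2 and the 50 steps are arbitrary.

```python
def logistic_map(x0, r=4.0, n_steps=50):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-10)  # almost identical initial condition
gaps = [abs(x - y) for x, y in zip(a, b)]
for step_index in (0, 10, 20, 30, 40, 50):
    print(step_index, gaps[step_index])
```

The rule is fully deterministic, yet the gap between the two trajectories grows until they are unrelated. That is the "fog of war" horizon: beyond some number of steps, knowing the start to ten decimal places no longer helps.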

Nonlinear dynamics can be useful in understanding and designing the mutation and adaptation mechanisms in AI systems, particularly in how small changes can lead to significant effects.

Game Theory

For decision-making algorithms where multiple agents interact, such as in adversarial networks or multi-agent systems.

  • Sunk Cost Fallacy: Continue because you’ve put so much effort into it.
  • The best strategy to adopt when versing a sophisticated opponent is to minimise your maximum loss. This is called the Minimax strategy. You think: what is the worst scenario for me, what can my opponent do that will make me worse off? Then you figure out the best strategy against that. So you’re minimising your maximum loss. You ensure, no matter how sophisticated your opponent is, you’ve guarded against the worst-case scenario, and in zero-sum games you’ve done the best you can possibly do.
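The minimax idea can be sketched for a small zero-sum game written as a payoff table. The table below is made up: rows are your options, columns are the opponent's, and each entry is your payoff. The sketch only considers pure strategies, so it is a simplification of the full theory, which also allows randomised (mixed) strategies.

```python
def maximin_row(payoffs):
    """Return (row index, guaranteed payoff) of the row with the best worst case."""
    worst_cases = [min(row) for row in payoffs]
    best_row = max(range(len(payoffs)), key=lambda i: worst_cases[i])
    return best_row, worst_cases[best_row]


# Hypothetical payoffs to the row player.
PAYOFFS = [
    [3, -2, 1],
    [1, 0, 2],
    [-4, 5, -1],
]

row, guaranteed = maximin_row(PAYOFFS)
print(f"play row {row}, guaranteed payoff at least {guaranteed}")
```

Row 2 contains the biggest possible win, but also the biggest possible loss. Guarding against the worst case picks the row whose minimum is highest instead.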

The study of decision-making under conditions of uncertainty over time.

A game in game theory is any interaction between multiple players in which each player’s payoff is affected by the decisions of others.

There are different types of games:

  1. Cooperative vs. Non-Cooperative: Cooperative games allow binding agreements, while non-cooperative games do not.
  2. Zero-Sum Games: One player’s gain is another’s loss.
  3. Non-Zero-Sum Games: Allow for mutual gain or loss.
  4. Sequential Games: Players make decisions one after another
  5. Simultaneous Games: Players make decisions at the same time
  6. Symmetric and Asymmetric Games: Symmetric games have identical strategies and payoffs for all players, whereas asymmetric games do not.

The entire essence of game theory is how do you make the best decision in any given moment.

There are some things to identify a game:

  1. A game needs to include multiple players (at least 2)
  2. The players need to interact with each other
  3. There needs to be a reward
  4. We assume that players act rationally
  5. We assume that players act according to their self-interest

Dominant Strategy: a strategy that is better than a player’s other strategies no matter how the opponent plays. Such strategies might be great when there are no alternatives, but if you are part of a game where each player has a dominant strategy, then playing them isn’t necessarily optimal for the group (as in the prisoner’s dilemma).

Nash Equilibrium: a set of strategies, one per player, where no player can do better by changing their own strategy alone. A finite game always has at least one equilibrium (possibly in mixed strategies) and may have several, and players would be better off finding one and forming their strategy around it. The best strategy is to consistently scan, form alliances, and form oligopolies to dominate the market, and the rest will be ostracized.
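A minimal sketch of checking for equilibria in a two-player game: a cell of the payoff table is a pure-strategy Nash equilibrium when neither player can gain by switching their own move alone. The function name and the payoff tables are illustrative assumptions. Mixed strategies are not covered, and a game can have zero, one, or several pure-strategy equilibria.

```python
def pure_nash_equilibria(payoff_a, payoff_b):
    """List (row, col) cells where neither player gains by deviating alone."""
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for r in range(rows):
        for c in range(cols):
            # Player A picks the row; compare against every other row in column c.
            a_best = all(payoff_a[r][c] >= payoff_a[r2][c] for r2 in range(rows))
            # Player B picks the column; compare against every other column in row r.
            b_best = all(payoff_b[r][c] >= payoff_b[r][c2] for c2 in range(cols))
            if a_best and b_best:
                equilibria.append((r, c))
    return equilibria


# Prisoner's dilemma, action 0 = cooperate, action 1 = defect.
PD_A = [[-1, -3], [0, -2]]
PD_B = [[-1, 0], [-3, -2]]
print(pure_nash_equilibria(PD_A, PD_B))
```

In the prisoner's dilemma the only equilibrium is mutual defection, even though mutual cooperation pays both players more. Reaching an equilibrium and reaching a good outcome are different things.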

  1. Are the actors in the situation rational?
  2. Are we able to reach Nash Equilibrium?
  3. Do they act according to their self-interest?
  4. Do they understand the rules of the game?
  5. Is their dominant strategy really that dominant?

If you know that most game theory parameters aren’t met you know you have to change your approach - may need to act irrationally, find better plays or a better game.

Graph Theory

For algorithms that involve network structures and relationships, such as social network analysis, routing algorithms, and understanding neural network topology.

Measure Theory

AGI operates in environments with a vast or even infinite number of possible states or actions with uncertain outcomes. Measure theory allows for the handling of such spaces, especially in decision-making algorithms and probabilistic reasoning. The sheer number of possibilities needs to be modeled and then filtered down so that decisions can be made in a timely manner, similar to how humans filter infinite actions down to the relevant ones whose outcomes are associated with our desired result.

Information Processing

AI needs the ability to connect what it is sensing with what it has seen in the past.

Large scale visual data. We’re seeing with our eyes and our memory.

Modeling The World

Test-time training Self-supervised learning: Yann LeCun

“Latent Embeddings”

Neocortex Cortical Column

A generic mapping algorithm that doesn’t assume anything prior.

Learn environments, objects and concepts like math.

Every column is a modeling system - a coffee cup will be in thousands of models, and then there’s a voting mechanism which leads to your singular perception.

About 150k columns exist in a brain, and each one has roughly 100k neurons in it.

Reference Frame

The old brain’s hippocampus and entorhinal cortex use reference frames to know where your body is relative to your environment, e.g. a room. The new brain tracks a finger relative to the cup, relative to your other fingers, and relative to everything else you are attending to.

Place Cell

Grid Cell

Orientation Cell

Perceiving Information

To know and reason about the world, is to be able to observe it, identify relevant details, and then predict its dynamics accurately. It’s about modeling.

Us humans receive stimuli from many sources: [touch, sight, smell, taste, hearing, sense of space, emotion, ...]. The most useful one is the sense of sight. Although lifeforms like bats have adapted with echolocation, on the internet there is no sound for numbers, etc., unless we’re able to perceive binary code being spoken as a language. It’s merely frontend and backend. Data is either perceived emotionally or not. If there were no emotion it would just be data. Advertisements would be useless. Seeing images or videos of family and friends wouldn’t remind you of anything. And so when put into an environment we need to be able to identify parts of that environment. Human intuition is being able to abstractly identify objects in the environment, to instantly think of the fastest path through a maze, for example, or to not go down the dark alleyway with hooded people holding guns. And so, how would intelligence abstractly gain intuition without visualising the data? We already know computers can handle data with millions of dimensions, but what if they could also see it somehow? They might be able to order data in a way that helps them discover computationally faster algorithms and relationships between things we wouldn’t have been able to find without visualising them (at least in our lifetime).

The visualisation brings me to my next point. To have fully-autonomous agents that are able to self-mutate and generate new neurons it needs to be able to find what should be a parameter after having a goal. For example, take in the NN below:

[Figure: example NN for predicting a house price from human-chosen inputs]

The human gave the NN the inputs but didn’t take into consideration everything important (state and future of the country (sentiment analysis), distance to essential infrastructure, current market conditions, etc). This would only be achieved if the human accounted for everything or the AI was able to think of what would impact the price of the house. The best way to learn about all of this would be to identify relationships abstractly between the attributes, e.g. state and future of the country and housing prices. Maybe the two topics don’t have any direct links of reference, so the AI would need to contextualise the correlation by abstract thought rather than logical referencing. This idea seems like the core factor behind generalisation: establishing relationships between things that don’t directly refer, or even relate, to each other at the fundamental level.

Now this raises the question: how do you know whether a piece of information should be an input or disregarded? And when met with an environment that has limitless possibilities and combinations, such as Earth, what is the factor that allows us to filter out relevant things? Something to think about. I guess for specialized programs that have a limited number of actions, there are your inputs, but when things become state-dependent that question reveals itself again: how do you filter relevant information in a limitless state-dependent environment, especially when unrelated things can assist in understanding in an abstract sense?

What if we didn’t know the maximum reward possible for a goal? How can we then apply gradient descent to a cost such as cost = 0.5 * (predicted - actual)^2 if we don’t have the actual value? It doesn’t make sense to know what would happen when in a black box. Sure, we can guess the actual outcome, but on what grounds? What reasoning is there to get to that conclusion? Or what if the goal was a boolean and there is no gradient descent applicable? E.g. is a chair conscious or not - or maybe everything is measurable?
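To see why the missing target matters, here is a sketch of gradient descent on that squared-error cost for the simplest possible model, `predicted = weight * x`. The model, the learning rate, and the example numbers are assumptions for illustration. Note that the `actual` value appears in every gradient step.

```python
def cost(predicted, actual):
    """Squared-error cost: 0.5 * (predicted - actual) ** 2."""
    return 0.5 * (predicted - actual) ** 2


def gradient_descent(x, actual, weight=0.0, learning_rate=0.1, n_steps=100):
    """Fit a single weight so that weight * x approaches actual."""
    for _ in range(n_steps):
        predicted = weight * x
        gradient = (predicted - actual) * x  # d(cost)/d(weight)
        weight -= learning_rate * gradient
    return weight


learned = gradient_descent(x=2.0, actual=6.0)
print(learned, cost(learned * 2.0, 6.0))
```

Without `actual` there is no gradient to follow, which is the problem the paragraph above raises. Reward-driven methods work around this by estimating a target from experience instead of being handed one.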


The most important sense we have as humans is our ability to see and visualise our environment. Out of all 5 of Aristotle’s defined senses, [touch, sight, smell, taste, hearing], the most detrimental one to lose would be sight.

A thought I had was: since we use our visual prowess to learn and understand environments and concepts, why don’t we apply the same to AI? Since it is able to understand dimensions beyond humans, shouldn’t it be able to visualise them as well? It doesn’t need to represent them to us, but it should be able to get some kind of intuition from them, similar to humans looking at a maze and instantly finding the solution instead of brute-forcing. Maybe something is there…

Let’s discuss the types of vision that are relevant aside from sequential vision.

Spatial Vision

The ability to perceive, interpret, and understand the spatial relationships and physical dimensions of objects in the visual field. It encompasses the perception of depth, distance, size, shape, and the relative position of objects. This is essential for navigating environments (like a map of some sort), manipulating shapes of objects, and engaging in tasks that require depth perception, e.g. there’s a greater reward down the line if I don’t take the short-term reward action.

Imagine you had a list of actions with corresponding rewards, where taking any one of them ends the game. If you were to look at that list sequentially, from the closest action to the furthest, you would never be able to choose to skip an action now to get to a further action later for the bigger reward. Seeing things spatially allows you to identify them in an abstract sense - a very powerful tool in the realm of thought and decision making.

  • How would you identify the delayed benefit when the outcome is not obvious?

Temporal Vision

Temporal vision refers to the ability to perceive and process changes in the visual environment over time. This involves detecting and interpreting movements, changes in light, and other dynamic elements in the visual field. Crucial for understanding the speed and direction of moving objects, recognizing patterns in motion, and perceiving changes in the environment.

Let’s say you were performing a movement at the gym every day but you felt a pain in your back. You wouldn’t continue doing that exact same movement. You would realise that if you kept doing it you would injure yourself more, so you change the technique or remove it from the program. This identification of change enables you to tweak whatever you’re doing in time and space to work towards more beneficial outcomes over less beneficial ones. Again, a vital part of decision making.

  • How would you identify the positive/negative change over time with an action when the outcome is not obvious?

Adaptability + Flexibility

Current ML models lack the ability to adapt by self-mutating. The restructuring of NN inputs, layers, and models is not up to par with the continuous adaptation of the human brain.

Why is this?

When we want to modify something / add an input we need to retrain the model, right?

We need to be able to modify:

  • NN neurons + input params
  • how one NN influences another:
    • e.g. visually processing all protocols, their functions and how we can order them based on their stateful causations
    • when NN1 can interrupt NN2 while mid computation: how can we restart that electrical signal or trigger a bi-directional playback?
  • lobe architecture
    • add [temporal] to [occipital + frontal]
    • creating self-labeled patterns and a classification of sequences that are similar to those patterns and identifying what deviates.

When restarting after modifying, the context isn’t persistent. How can we insert new information to intertwine with existing NNs without a hard reset?


The goal is to be able to create an action given a situation.

  • decision making hierarchy: prioritisation of decisions based on immediate needs for survival.
  • imagination: we can simulate the environment - I’m not sure how much computation that is

When we think about mimicking the brain, the first thing we think about is modularisation of different models that work together to come to an output. This is referencing ensemble methods. Great thinking! The brain, however, is much more complex than simply modularising and combining, yet this is definitely a step towards the realm of the brain.

There’s got to be NNs that have some of the exact same inputs as other NNs, just with a few other params, since no neuron is exactly the same. They must work together to get to a point, so it would be more like macro layers (lobes, etc) and within each lobe there are layers of slightly modified neurons or dendrites that process all information: e.g. body temp, what objects are in front of you -> what specific objects, where, how far roughly, dangerous, etc.

  • infinite search space
  • unique environment every time:
    • how do you learn from decisions in completely unique environments and apply them to another unique environment?
      • how do humans do it? technically everything is a completely unique environment. we “fill in the gaps” or derive actions from “similar” states. e.g. feel sick -> go to the doctor. it may not be the same sickness but there’s a strategy there of initially going to the doctor
  • stateful environment

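Q* refers to the optimal action-value function: the expected cumulative reward of taking an action in a state and acting optimally afterwards. Finding Q* involves training an agent to take actions that maximize its cumulative reward given its environment.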

Will be:

  • Model-Free (learn from the env not the prediction of it)
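  • Value-Based (learns the value of actions rather than a policy directly)
  • On-Policy (learns from data generated by the current policy; note that standard Q-learning is off-policy, while SARSA is its on-policy variant)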
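For reference, here is a sketch of the standard tabular Q-learning update, one common way to approximate Q*. This is an assumption about the method, not something the notes commit to: Q-learning is model-free and value-based, but it is off-policy because it bootstraps from the best next action. The on-policy counterpart, SARSA, would use the value of the action actually taken next instead of the `max`. State and action names are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step to the table q and return the updated value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


ACTIONS = ("left", "right")  # hypothetical action set
q = defaultdict(float)       # unseen (state, action) pairs start at 0.0

q_update(q, state=0, action="right", reward=1.0, next_state=1, actions=ACTIONS)
print(q[(0, "right")])
```

The table is learned purely from observed (state, action, reward, next state) tuples, with no model of the environment. `alpha` controls how fast old estimates are overwritten and `gamma` how much future reward counts.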


The Brain’s Navigational Place and Grid Cell System

Locations in the Neocortex: A Theory of Sensorimotor Object Recognition Using Cortical Grid Cells

Going Beyond the Point Neuron: Active Dendrites and Sparse Representations for Continual Learning 05/12/2023

  • The point neuron model—used in most current deep learning networks—says that all these synapses have a linear impact on the cell. The cell would fire if its total inputs exceed a threshold and then resets - not good for continual learning.
  • Distal dendrites with synapses have minimal impact on the cell.
  • dendritic spike can depolarize the neuron for an extended period of time, sometimes as long as half a second
  • active dendrites have an indirect modulatory impact on the cell’s response, creating a “predicted state”.
  • Proximal (feedforward) and distal (context) inputs may come from separate sources, whereas the point neuron assumes there is a single source of synaptic inputs y = f(t, d) where f is a modulation fn that modifies t by the dendritic activation
  • k-Winner-Take-All non-linear activation in each hidden layer. Sparse activation, kWTA is to pick out the top k activation values and drop all others to zero. The dendritic segments, by modulating the feedforward output, have a large impact on which neurons actually win.
  • 3.4: only the winning neuron gets updated and nothing else. Hypothesizing that a signle functional specialization will emerge where different dendritic segments willeach learn to identify specific context vectors
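To make the y = f(t, d) and kWTA notes above concrete, here is a rough pure-Python sketch. It assumes that d is the largest dot product between any dendritic segment and the context vector, and that f multiplies t by a sigmoid of d. Both choices, along with the function names and the toy numbers, are assumptions for illustration rather than a reproduction of the paper's code.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def active_dendrite_output(feedforward, segments, context):
    """y = f(t, d): gate the feedforward activation t by the strongest segment response d."""
    d = max(dot(segment, context) for segment in segments)
    return feedforward * sigmoid(d)


def kwta(activations, k):
    """Keep the k largest activations and set the rest to zero."""
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    winners = set(ranked[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]


# Three hypothetical neurons with identical feedforward input but different segments.
segments_per_neuron = [[[1.0, 0.0]], [[0.0, 1.0]], [[-1.0, -1.0]]]
context = [2.0, 0.0]
modulated = [active_dendrite_output(1.0, segs, context) for segs in segments_per_neuron]
layer_output = kwta(modulated, k=1)
print(layer_output)
```

All three neurons receive the same feedforward input, yet only the neuron whose segment matches the context survives kWTA, which is the sense in which the dendrites decide which neurons win.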

Intro to Graph Signal Processing, By Antonio Ortega

  • Brain uses a memory-based model to make continuous predictions of future events

A Thousand Brains, By Jeff Hawkins: Notes

Reach out to:

  • Bob Knight (UC Berkeley)
  • Bruno Olshausen (UC Davis)
  • Steve Zornetzer (NASA Ames Research) - neuroscientists friends of Jeff.
  • Subutai Ahmad
  • Geoffrey Hinton: scientist that knows the importance of reference frames; today’s NNs rely on ideas Hinton developed in the 1980s. He proposes “capsules”
  • Michael Graziano: neocortex model of attention


  • “Why Neurons Have Thousands Of Synapses, A Theory of Sequence Memory In the Neocortex”
  • “A Theory of How Columns in the neocortex enable learning the structure of the world”
  • “A framework for intelligence and cortical function based on grid cells in the neocortex”
  • “Locations in the neocortex: a theory of sensor-motor object recognition using cortical grid cells”
  • “Perceptual Neuroscience: The cerebral cortex” (Vernon Mountcastle)
  • Jeff Hawkins Loop
  • Subutai Ahmad


  • The Mindful Brain, by Mountcastle
  • On Intelligence

Next steps:

  • Learn how the neocortex develops from birth to a toddler, etc.

Casual Notes

  • The more constraints the solution resolves the bigger the aha feeling and more confident you are in trusting the idea.


  • What we learn is organized in a way that reflects the structure of the world and everything it contains. To know what a bike is our brain creates a model of bicycles that arranges different parts relative to each other, how they move and work together. To understand the brain we must understand how these simple cells learn a model of the world and everything in it.
  • The neocortex (“new outer layer”) occupies about 70% of the volume of our brain and is where intelligence lies. It is dedicated to manipulating and storing everything we know in hundreds of thousands of reference frames. They tell you where things are located relative to each other and how to achieve goals, e.g. getting from one location to another.
  • The unit that creates the neocortex and intelligence is the “cortical column” - roughly 150,000 stacked side by side in a human neocortex. Each contains “minicolumns” holding over one hundred neurons spanning all layers.
  • Brain relies on a general-purpose learning method. Being able to learn practically anything requires the brain to work on a universal principle. The cortical column is the most important piece of the puzzle when understanding the brain.
  • Every part of the neocortex, and therefore every cortical column, makes predictions. Prediction is a ubiquitous function of the neocortex. When you sit at a desk with objects positioned how you want them, you would notice if something moved, because you’re expecting it to be in the same place. To make predictions the brain has to learn what’s normal - what should be expected based on past experience.
  • The neocortex learns a model of the world and makes predictions based on its model. E.g. my brain creates a model of the stapler including what it looks like, feels like, how the top of it moves in relation to the bottom, how a staple comes out when pressed down, sounds it makes when used, etc. The brain’s model of the world includes where objects are and how they change when we interact with them. Everything is learned at some point in life and becomes stored in your neocortex.
  • The brain creates a predictive model. The brain continuously predicts what its inputs will be. Prediction is an intrinsic property that never stops and is essential in learning. When the predictions are verified, the brain’s model of the world is accurate. A mis-prediction causes you to attend to the error and update the model. Through experience the neocortex learns a rich and complicated model of the world. Think of a cabinet drawer: what’s in there, the noise it makes when opened, the order of its contents. Then go into town and think of where the buildings are, the distance between the train station and your current location, the fastest way to get there, memories associated with it, etc.
  • The inputs to the brain are constantly changing. There are two reasons: the world moves (e.g. music: the sound input changes as the music progresses) and we move (as our limbs, eyes, etc. move, the input from our sensors changes). If the inputs to the brain were static, nothing could be learned. Most learning requires actively moving to explore. Imagine entering a new house. If you don’t move there won’t be any sensory input changes and so nothing can be learned. Tldr; the brain learns a model of the world via the change of sensory inputs as we explore, commonly from moving. With each movement the neocortex predicts what the next sensation will be. The brain’s attention is drawn to the area of mis-prediction, where the neocortex’s model needs to be updated.

Chapter 4

  • How does the neocortex, composed of thousands of nearly identical cortical columns, learn a predictive model of the world through movement?

    • Tenet #1: Thoughts, ideas, and perceptions are the activity of neurons
    • Tenet #2: Everything we know is stored in the connections between neurons, the synapses. Active neurons represent current thoughts and perceptions. Strong connections represent the experiences we use most often. Weak ones represent new ideas or unused connections that are being pruned.
  • Neurons have to figure out how much context is necessary to make the right prediction.

  • One type of dendrite spike begins when a group of twenty or so synapses situated next to each other on a distal dendrite branch receive input at the same time. When the spike gets activated it travels to the cell body where it raises the voltage of the cell almost enough to make the neuron spike, but not quite. The neuron stays in this provoked state for a little bit of time before going back to normal. The synapses on distal dendrites account for 90% of the synapses in the brain. The dendrite spikes are predictions.

  • Dendrite spikes occur when a set of synapses close to each other on a distal dendrite get input at the same time, meaning the neuron has recognised a pattern of activity in some other neurons. When the pattern of activity is detected, it creates a dendrite spike, raising the voltage at the cell body and putting the cell into a predictive state. The neuron is primed to spike: when the prediction is verified it strengthens the connection, otherwise nothing happens. If one or more neurons are in the predictive state, only those neurons spike and the other neurons are inhibited. If an input that is unexpected arrives, multiple neurons fire at once. If the input is predicted then only the predictive-state neurons become active. Unexpected inputs cause a lot more activity than expected ones.

  • A prediction occurs when a neuron recognises a pattern, creates a distal dendrite spike, and is primed to spike earlier than other neurons. Think about walking into your house’s bathroom when it is pitch black. You cannot see the soap and tap, but you predict where they are going to be based on your reinforced model of the house you’ve lived in for years.

  • How does the neocortex predict the next input when we move?

    • Neurons can attach a fixed reference frame to an object, like a coffee cup. The cup’s reference frame is relative to the cup; therefore, the reference frame must move with the cup. If the cup gets turned upside down the reference frame does the same.
    • To make sensory-motor predictions each cortical column must know the location of its input relative to the object being sensed. To do that, a cortical column requires a reference frame that is fixed to the object.
  • A single cortical column could learn the three-dimensional shape of objects by sensing and moving repeatedly. Just like putting your hand in a blackbox with 1 finger and moving that finger repeatedly on an object. If you used your entire hand you can recognise it with fewer movements.

  • older part of the brain: entorhinal cortex
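The predictive-state behaviour described in the dendrite-spike notes above can be sketched as a small rule: if some neurons sharing an input were primed by a prediction, only they fire; if none were, all of them fire. The function name, the idea of passing a set of primed neurons, and the neuron labels are assumptions made for illustration.

```python
def active_neurons(minicolumn, predicted):
    """Return the neurons that fire when feedforward input reaches this minicolumn.

    minicolumn: list of neuron ids that all respond to the same input.
    predicted:  set of neuron ids currently in the predictive (primed) state.
    """
    primed = [n for n in minicolumn if n in predicted]
    # Primed neurons spike first and inhibit the rest; with no prediction,
    # nothing is inhibited and every neuron fires.
    return primed if primed else list(minicolumn)


column = ["n1", "n2", "n3"]
expected_input = active_neurons(column, predicted={"n2"})
unexpected_input = active_neurons(column, predicted=set())
print(expected_input, unexpected_input)
```

The unexpected case activates more neurons than the expected one, matching the note that unexpected inputs cause a lot more activity.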

Chapter 5 - Maps in the Brain

  • We perceive objects as being somewhere - not in our eyes and ears, but at some location out in the world. This tells us the brain must have neurons whose activity represents the location of every object that we perceive.
  • Being able to navigate the world is so valuable that evolution discovered multiple methods for doing it. Reference frames can be thought of as a point in a map. These map-creating neurons exist in the hippocampus and the adjacent entorhinal cortex.
  • Place Cells: neurons in the hippocampus that fire every time the rat is in a particular location in a particular env, tldr; “you are here” marker on a map.
  • Grid Cells: Neurons in the entorhinal cortex that fire at multiple locations in an env, creating a grid pattern. If the rat moves in a straight line, the same grid cell becomes active repeatedly at equally spaced intervals.
  • Grid cells are like the rows and columns of a paper map overlaid on the animal’s env. But they alone don’t tell you what is at a location. Place cells are like the details printed in the square. But they aren’t useful for planning movements; that requires grid cells. The two cells create a complete model of the env. Every time a rat enters an env the grid cells establish a reference frame. If the env is new, they create a reference frame; otherwise they pull out the existing one, like pulling up the correct map for a town you’ve been in.
  • To learn a complete model of something you need both grid cells and place cells. Grid cells create a reference frame to specify locations and plan movements. But you also need sensed information, represented by place cells, to associate sensory input with locations in the reference frame.
  • The neocortex seems to be a stripped-down, minimalist version of the hippocampus and entorhinal cortex (which only track the body), copied tens of thousands of times and arranged side by side in cortical columns. The neocortex tracks thousands of locations simultaneously because it has 150,000 copies. Your 5 fingertips on a cup are like 5 rats exploring a box. If you touch the rim of the cup you can’t be certain what object you are touching. When you move, the next input will eliminate any locations on the grid cells that don’t match.
  • Head direction / orientation cells (old brain): represent the direction an animal’s head is facing. Cortical columns contain these too. For your finger, they determine its orientation.
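The "move and eliminate locations that don't match" idea from the finger-on-a-cup note can be sketched as a candidate-filtering loop. Everything concrete here is assumed for illustration: objects are modelled as a dictionary from grid location to sensed feature (standing in for grid cells plus place cells), and the `cup` and `bowl` models are made up.

```python
def recognise(models, sensations):
    """Narrow down which object is being touched.

    models:     {object_name: {(x, y): feature}}
    sensations: [((dx, dy), feature), ...]; the first movement is ignored.
    """
    candidates = None
    for (dx, dy), feature in sensations:
        if candidates is None:
            # First touch: any location on any object with this feature is possible.
            candidates = {(name, loc)
                          for name, model in models.items()
                          for loc, f in model.items() if f == feature}
        else:
            # Shift every candidate location by the movement, keep those that still match.
            moved = {(name, (x + dx, y + dy)) for name, (x, y) in candidates}
            candidates = {(name, loc) for name, loc in moved
                          if models[name].get(loc) == feature}
    return {name for name, _ in candidates} if candidates else set()


models = {
    "cup": {(0, 0): "rim", (0, -1): "handle", (0, -2): "base"},
    "bowl": {(0, 0): "rim", (0, -1): "curve", (0, -2): "base"},
}
after_one_touch = recognise(models, [((0, 0), "rim")])
after_moving = recognise(models, [((0, 0), "rim"), ((0, -1), "handle")])
print(after_one_touch, after_moving)
```

A single touch on the rim leaves both objects possible; one movement followed by a second sensation is enough to rule out the bowl.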

Chapter 6 - Concepts, Language and High-level Thinking

What kind of function or algorithm can create all aspects of human intelligence in the cortical column?

  • Reference frame allows the cortical column to learn the locations of features that define the shape of an object.
  • Thinking is a form of movement: it occurs when we activate successive locations in the reference frames.
  • A column is just a mechanism built of neurons that blindly tries to discover and model the structure of whatever is causing its inputs to change.
  • All knowledge is stored at locations relative to reference frames.
  • Reference frames helps you figure out the steps you should take to achieve conceptual goals, e.g. engineering problems.
  • Reference frames can anchor themselves to the non-physical realm. For a concept such as democracy it needs to be self-consistent and can exist relatively independent of everyday physical things.
  • Knowledge is stored in a maplike reference frame built from grid cells - when people think about it, they are mentally moving through the map.
  • Thinking is actually moving through a space, through a reference frame. Current thought is determined by the location in the reference frame. As location changes, the items stored at each location are recalled one at a time. Our thoughts are continuously changing but are not random. What we think of next depends on which direction we mentally move through a reference frame, similar to what street we walk down.
  • Discovering a useful reference frame is the most difficult part of learning. Math appears hard because we have no reference frames. Mathematicians can solve new complex equations because they have reference frames to start from whereas someone new has to create a reference frame.
  • Every cortical column models objects using reference frames that are then populated with links to other reference frames.
  • People think about the same thing differently based on their reference frame arrangement. Like two different maps.
  • Reference frames provide the substrate for learning the structure of the world, where things are, and how they move and change. Each cortical column is a learning machine, learning a predictive model of its inputs by observing how they change over time. The columns don’t know what they are learning; they don’t know what their models represent.
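One possible way to picture the notes above (knowledge stored at locations, frames linking to other frames, thinking as movement) is a small data structure. This is only a guess at a representation: the `ReferenceFrame` class, its 2D integer coordinates, and the town example are hypothetical and not taken from the book.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceFrame:
    name: str
    # location -> stored item, which may itself be another ReferenceFrame (a link)
    contents: dict = field(default_factory=dict)

    def store(self, location, item):
        self.contents[location] = item

    def walk(self, start, moves):
        """Recall what is stored at each location visited, one at a time."""
        x, y = start
        recalled = [self.contents.get((x, y))]
        for dx, dy in moves:
            x, y = x + dx, y + dy
            recalled.append(self.contents.get((x, y)))
        return recalled


bicycle = ReferenceFrame("bicycle")
bicycle.store((0, 0), "frame")
bicycle.store((1, 0), "front wheel")

town = ReferenceFrame("town")
town.store((0, 0), "train station")
town.store((1, 0), "cafe")
town.store((1, 1), bicycle)  # a location that links to another reference frame

thoughts = town.walk((0, 0), [(1, 0), (0, 1)])
print([t.name if isinstance(t, ReferenceFrame) else t for t in thoughts])
```

Walking a different sequence of moves recalls different items, which is the sense in which the next thought depends on the direction of mental movement through the frame.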

Chapter 7 - The Thousand Brains Theory Of Intelligence

  • Each column can learn models of hundreds of objects based off reference frames
  • Thousand Brains Theory: knowledge is distributed among thousands of complementary models. This system is robust to a large loss of columns. Complex systems work best when knowledge and actions are distributed among many, but not too many, elements. E.g. a neuron never depends on a single synapse; it might use 30 to recognize a pattern, and even if 10 of them fail it’ll still recognize the pattern. Each cortical column is a complete sensory-motor system.
  • How do our sensory inputs get bound into a singular percept?
    • Columns vote. Your perception is the consensus the columns reach by voting. This solves the binding problem. Different senses vote to come to agreement what is the thing (hear, touch, see, smell).
    • Voting allows the brain to unite numerous types of sensory input into a single representation of what is being sensed.
    • Cortical columns also share relative information about position, e.g. fingers working together.
    • When a column is uncertain its neurons will send multiple possibilities at the same time while simultaneously receiving projections from other columns with their guesses. The most common guesses suppress the least common ones until the entire network settles on an answer.
    • A column doesn’t need to send its vote to every other column.
    • Voting requires a learning phase.
  • Why does our perception of the world seem stable when the inputs to the brain are changing?
    • As you touch a cup the inputs to the neocortex change, but your perception of the cup is stable.
    • Recognising an object means the columns voted and now agree on what object they’re sensing relative to you. As you move your hand on the cup the other neurons in each column change with movement, but the voting neurons, the ones that represent the object, do not.
    • What we perceive is based on the stable voting neurons.
  • The entire world is learned as a complex hierarchy of objects located relative to other objects. We don’t know how much is learned within a single column vs being learned in the connections between regions. The answer will require a better understanding of attention and the thalamus.
  • Vernon Mountcastle: “You should stop talking about hierarchy. It doesn’t really exist.”
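The column voting described in this chapter's notes can be sketched as a simple tally: each column offers the set of objects it still considers possible, and the most common guess wins. Treating the vote as a plain count, the function name, and the example guesses are assumptions for illustration; the notes do not say how the suppression is actually computed.

```python
from collections import Counter


def vote(column_guesses):
    """Each column contributes a set of candidate objects; the most common candidate wins."""
    tally = Counter(obj for guesses in column_guesses for obj in set(guesses))
    winner, votes = tally.most_common(1)[0]
    return winner, votes


# Hypothetical guesses from three columns (e.g. touch, vision, hearing).
guesses = [{"cup", "bowl"}, {"cup", "can"}, {"cup"}]
percept, support = vote(guesses)
print(percept, support)
```

Individual columns stay uncertain, but the one object that all of them consider possible becomes the stable percept.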

Part 2 - Machine Intelligence

Chapter 8 - Why there is no “I” in AI

  • The inception of AI in 1956. Rosenblatt’s discrete Perceptron model in 1958 == basis of current ANNs
  • AI systems are not flexible. Humans can learn thousands of skills, whereas AI systems can’t make analogies between different tasks.
  • Create intelligent machines that focus on flexibility. The AI doesn’t need to perform better than humans. The goal is to have it do many things and apply what it learns from one task to another.
  • The difficult part of knowledge is not stating a fact, but representing it in a useful way. E.g. “ball is round” each word has different meanings and each meaning has a different relationship to other words. This is called knowledge representation. This is the only problem for AI.
  • AGI is only achieved through maplike reference frames that model the world.
  • A machine must do many things without erasing its memory and starting over.
  • To send robots to Mars to build a livable habitat for humans, they need to use a variety of tools and assemble buildings in an unstructured env. They will encounter unforeseeable problems and need to collaboratively improvise fixes and modify designs - general purpose intelligence.
  • To qualify a machine as intelligent it needs a set of principles:
    • Learning continuously: when a neuron learns a pattern, it forms new synapses on one dendrite branch that don’t affect previously learned ones on other branches.
    • Learning via movement: we cannot sense everything in the world at once therefore movement is a must.
    • Many models: [radar, touch, vision] to vote
    • Using reference frames to store knowledge: to be intelligent a brain needs to model the world, including the shape of objects, how they change as we interact with them, and where they are relative to each other. Reference frames are the backbone of knowledge.
    • Goal orientated behaviour
  • Intelligence is defined by how a machine learns and stores knowledge about the world. We are intelligent because we can learn to do practically anything.

Chapter 9 - When machines are conscious

  • Consciousness requires we form moment-to-moment memories of our thoughts otherwise we’d be unaware of why we were doing anything - they last for hours or days. Awareness, sense of presence is the central part of consciousness. Dependent on forming memories of our recent thoughts and experiences and playing them back as we go about our day, the feeling i am an acting agent in the world.
  • The nerve fibers that enter the brain from the eyes, ears, and skin look the same and transmit information using identical-looking spikes.
  • “Qualia”: how sensory inputs are perceived, how they feel. The origin of qualia is one of the mysteries of consciousness. They are subjective internal experiences. Some are learned via movement but some are innate like pain from special pain receptors.

Chapter 10 - Future of machine intelligence

  • Emotion is the goal setter. A war general would look for control whereas a peaceful tradesman would want something different. The neocortex doesn’t create goals, motivations or emotions.
  • Machine can be virtual it just needs to perform actions that change the locations of its sensors. More sensors == faster learning.
  • When the neocortex wants to do something it sends signals to older parts of the brain that more directly control movements
  • Neocortex Equivalent
    • Speed: neurons take at least 5ms to do anything useful. Transistors made of silicon can operate almost a million times faster. However the speed at which a web crawler can learn is restricted by how fast it can “move” by following links and opening files - this could be very fast.
    • Capacity: we just need to design a single cortical column out of silicon. More columns could mean the depth of understanding is further - more axon wiring.
  • Threats:
    • Replication: Anything capable of self-replication is dangerous.

continularity

ambiguous: open to more than 1 interpretation

act on impulses that are dependents

We experience the world and that is our dataset; years of training data are needed before we can do anything useful. To do something really useful we need to specialise, exposing ourselves to huge amounts of data in a particular area.

AI is about predicting an uncertainty



How familiar are you with LoRA and quantization? In detail? Super interesting and one of the most actionable things going on right now

I think we chatted abt that some time ago, I don’t know any models that are self-mutating bc there’s an issue with them collapsing. The continuous learning is quite similar to RL I think: if you put an RL agent in a feedback-rich environment and use a SOTA RL method for its training, it’ll be able to learn continuously w/o collapsing, as long as the model is large enough that the new knowledge doesn’t wash out the initial knowledge. Ppl also used swarm methods and genetic algorithms to make lots of these agents and then apply weight merging for some of the most performant ones in the current training epoch, so the models that collapsed in the process were just filtered out continuously. If you consider a swarm as a single model, that might be something you’d be interested in. The other interesting approach to look at is self-play, it also may be continuous and infinite and you can change the environment dynamically for agents to adjust (the prev method I described also allows that).
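A very rough sketch of the "filter out collapsed agents, merge the best ones" step from the message above might look like this. It assumes each agent is just a flat list of weights, that "collapsed" means scoring at or below a threshold, and that merging means plain averaging. All three are simplifications chosen for illustration, and the names are hypothetical.

```python
def merge_top(population, scores, top_k=2, collapse_threshold=0.0):
    """Drop collapsed agents, then average the weights of the top_k performers."""
    alive = [(s, w) for s, w in zip(scores, population) if s > collapse_threshold]
    if not alive:
        raise ValueError("every agent collapsed")
    best = sorted(alive, key=lambda pair: pair[0], reverse=True)[:top_k]
    n_weights = len(best[0][1])
    return [sum(w[i] for _, w in best) / len(best) for i in range(n_weights)]


# Hypothetical swarm of three agents; the third one has collapsed.
population = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
scores = [0.8, 0.6, -1.0]
merged = merge_top(population, scores)
print(merged)
```

In a full loop the merged weights would seed the next generation of agents, so the swarm as a whole keeps learning even though individual members collapse.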

these all are very interesting and very deep rabbit holes, esp bc lots of methods there look very logical and should work in theory, but don’t work in practice. Many algorithms from there are also inspired by the nature of the real world, evolution stuff etc, or how our brain works from a bird-eye-view perspective (there’s methods driven by scientifically-backed philosophical thoughts abt the process of our dreams etc). Very very interesting stuff. Not very practical though, one of the least practical parts of the ML science that historically gathered very high hopes and very low yields lmao.

It’s also the reason why lots of famous 1st-gen ML scientists are never heard of in the current meta, bc they were lost there trying to find the next ground-breaking method and haven’t succeeded yet, but the “stupid but works tech” they were trying to disrupt had so many iterative improvements since then that it can hardly be beaten by anything else

That’s why all the AI-doom culture is so popular, bc these ppl are correct abt one thing - no one knows why the current DL works and “how” it works

Same with DL, we dissect and study all these models on micro- and macro- levels etc, trying to explain it on different levels, but we’re far from coming with a single and comprehensive framework that scales to all of the methods and all the models.

the way you think is very similar to auto-ML methods, esp when it comes to topology. Around 2019-2021 models with algorithmically-crafted architectures, not hand-crafted ones were dominating in all the benchmarks. Google was the one pioneering in that

It’s very computationally-complex thing to do, that’s why Google and Microsoft researchers had many works abt looking at the performance of all that zoo they’ve created to try to generalize some scaling laws. Same way as we watch all the creatures made by the natural evolution and see the high-level dynamics and laws. all the popular models are the result of their works. all the laws they’ve generalized are applied to these models

And it’s almost impossible to return to those kinds of algorithms they were using before bc models are so big rn, it’s a challenge that costs hundreds of millions of dollars to just train one.

You cannot have a self-mutating model with hundreds of billions of parameters lol

i guess its kind of like a fuzzer for building a model architecture yeah kinda. Just the smart one that makes the exploration space orders of magnitudes smaller

The problem with all those brain-replicating methods is that they sometimes even work, but their computational complexity overhead is much larger than the boost they give, and they’re also quite hard to replicate and scale to other models / domains. And so engineers prefer to scale conventional architectures, which gives them the same boost with more room for performance tuning

  • study complexity theory

i mean our brains filter out the infinite actions we can take that’s not really the brain’s achievement tbh but how rich our environment is regarding the feedback and our body

all things that we see is that ppl with disabilities are able to adapt their already fully-developed brain with a developed world model, to adapt and augment data from “missing sensors” And from the behaviouristic science we also know that brains of all types of creatures are not good at playing games with sparse feedback. But it also depends on the model btw. What we have discovered pretty recently is that you can actually feed the raw data to the model and it’d perform much better than the model with prepared data, if it’s scaled correctly (bytes and binary) Bc when you prepare data, there’s always an information loss And most famous researchers in the space believe that models consuming/outputting binary/byte representation are the way to go further

so expensive to train at scale because of hundreds of thousands of gpu-hours lol bc these are all server-grade gpus and infra

remember how mining affected the world lol but that was just “cheap” consumer-grade hardware and now it’s about hardware that costs like 100x of that, and now not just individuals and small companies but governments are involved, and we’re seeing strategic battles for supply chains for raw materials bc of that etc. pure cyberpunk timeline lol

is there even a solution?

you can find out only by finding that solution and one who finds it will be the richest man in the world, and not even a man, but you’ll make your whole country the richest nation

then the only solutions are make more efficient hardware, have monopoly on it or super hyperoptimised learning strategies the AI creates

so everyone’s trying to get to the point of singularity as quickly as possible bc once they’ll reach it things will go parabolic

Hmm I don’t think that complex math will ever matter anymore As we found out, the scaling is the priority, as we can achieve the best results by stacking simple math primitives.

if you’re talking abt inventing new machine learning models, I don’t think you need a lot of math also, at least, if you want to make models that can be actually used in applications Bc you’ll still need them to be ran on the current hardware which is specially designed to run models based on stupid simple math lol not more than a linear algebra I think?

current meta: As I said before, there’s not many routes for a person coming to ML:

  1. An applied researcher, you just do almost random stuff until it works. Then you try to explain why that works to get extra rep points, but if you’ll not be able to, doesn’t matter lol.
  2. A scientific researcher: you think more than do, when you do something it’s not better than the first guy doing random stuff lol.
  3. A “scientist”: you’re coming with heavy math/physics background, think that current ML is fundamentally broken and is dominated by a bunch of retards, you try to apply your knowledge to ML and fail miserably.

self-taught ppl mostly succeed in ML rn and universities, on the contrary, fail to prepare specialists that can do stuff outside of writing papers.

bc most of the time, correct math solutions are the most unintuitive same as with ML

I mean, it’s 50/50 there but ofc there’s a limited amount of low-hanging fruits and you can really stand out by researching stuff that shouldn’t work from the first glance

ML is either a scaling problem with SOTA DL or a hyperoptimisation problem with new architecture.

and make sure you’ll regularize the training process well so your small model won’t overfit and data preparation matters the most there also and the most research goes there recently imo something like “use smart models and frameworks to prepare data for dumb models to consume”

no, I mean that part of the ML where you don’t need scaling but scaling down. as I said, there’s a ton of applications where you need to disentangle model weights and also explain decisions. So the tinier the model the better. Also tiny models are very much needed in applications that are running directly on the edge: sensors, cameras etc. And where you need the low latency as well: trading, critical infra etc

my model is completely virtual, given a laptop, like a human in the world, except the world is the OS and the internet - not a giant amount of OS commands like human limbs but possible to do everything.

but what’s the objective? Bc your training feedback should be based on some end goal you’re trying to achieve and you’ll also need a way to measure how far you’re from that goal.

  • If you have authentic human memories they elicit authentic human responses


  • why do humans have impulses?
  • what happens when we pause to make a long thought-out decision?
  • What is a sense? [Humans: electrical signals, Computers: bytecode / binary]
  • How are neurons created from nothing?
  • What data structure do reference frames have? How can you represent them in a physical sense?
  • How do neurons vote to consensus?
  • What is the reward-punishment brain function? Are there multiple reward functions? What dictates a high/low reward?
  • How do you sense in the non-physical realm?
  • Does the brain perform mathematical computations at the neurological level?
  • how do you reason about unstructured data? like association between a photo of a politician and democracy?
  • How do you isolate neurons so they can be updated, and not everything, when it is modified?
    • Why do we need this? So that when we discover inputs we don’t ruin the model for continuous learning.
  • How do neurons prune and rewire?
  • How do neurons extend to other neurons? Why?
  • How do grid cells and place cells model the world? What is the data structure for them?
  • How does the system create new hidden layers + neurons in each layer? Why would it create a new one or remove one? Is time associated with it?
  • How do you create associations between things? E.g. mouse -> keyboard -> computer || rat -> cheeses

Creativity Qs

  • What is creativity? It is the ability to tailor multiple types and volumes of data together while keeping the rationale available. If you fail with the LSD amount, you get fried. If you fail to keep the rationale you get schizophrenia. But eventually the core is to play with multiple data entities: the ability to see correlations between multiple wtf-level different concepts, and all that together within a coherent story
  • What is an analogy?
  • How do you form relationships between abstract concepts with analogies?
  • How do reference frames understand data? What is the data structure?
  • How do all these reference frames work in relation to another?

Decisions Qs

  • What is a decision?
  • How are decisions made?
  • What if you don’t worry about survival? What is your goal then? Learn everything?
  • What drives human goals? Social decision making: status, perception of you, etc
  • Is a decision a probabilistic process of numbers or of feeling?

The Myth Of Artificial Intelligence

  • Given that our own thinking is a puzzling series of guesswork, how can we hope to program it?
  • Charles Sanders Peirce. Inference is to bring about a new thought by using what we already know to update our prior beliefs. Using knowledge in context to draw relevant conclusions is what makes inference so hard. Which bit of knowledge is relevant in the haystack of my memory, applied to the dynamically changing world around me? Determining which bits of knowledge are relevant is not a computational skill. What is thinking vs computation?
  • The very question confronting scientists working on AI is how computation can be converted into the proper range and types of inference exhibited by minds.
  • The Turing test is hard essentially because understanding natural language requires lots of commonsense inferences, which are neither logically certain nor (often) highly probable. It requires, in other words, a lot of abductions.
  • Intelligence must conform to known deductive rules; Aristotle explored how forming a plan to achieve a goal involves steps that can be analyzed logically. Symbolic reasoning using rules from deduction ties intelligence specifically to knowledge, a prerequisite for common sense, which is still missing entirely from AI systems.
  • If it's raining, the streets are wet
    It is, in fact, raining
    Therefore, the streets are wet
    If P, then Q
    P
    Therefore Q
    The conclusion is the inference we should draw from the two premises. It answers the question: knowing nothing else, what follows from the premises? Here there is a real connection between rain and wet streets. Compare: "If it's raining, then pigs will fly." There is no connection between the two, and pigs don't fly. This is a great example of how deduction is at the mercy of relevance. Part of the problem is causation, e.g. rain doesn't cause planes to fly. Relevancy is the key.
  • Deduction never adds knowledge and only clears up disputed beliefs if bona fide errors in reasoning are made.
  • Induction means acquiring knowledge from experience, through any of our five senses. Enumeration: it is hard to induce the features of a population of, say, birds without first observing many examples of birds. Knowledge gleaned from observation is always provisional because the world changes; the future could falsify my inductive hypothesis. Induction only requires enumeration of prior observations to arrive at a general conclusion or rule. It’s useful but not certain knowledge.
  • As the philosopher David Hume put it, relying on induction requires us to believe that “instances of which we have had no experience resemble those of which we had experience.” Nothing provides us logical certainty of this. Correlations might suggest an underlying cause we can rely on (a bit of real knowledge), but we might have missed something when testing and observing what affects what. Induction simply generalizes from looking at examples. But we have to understand the significance of what we observe.
  • Our confidence that the sun is coming up tomorrow is no more than a “habit of association”. Induction isn’t just incomplete; it positively cannot confirm scientific theories or beliefs by enumerating observations.
  • Induction also falls short due to lack of knowledge. Much of what we think we know is actually tentative, awaiting further review.
  • The real world is a dynamic environment, which means it’s constantly changing in both predictable and unpredictable ways, and we can’t enclose it in a set of rules. Board games can be enclosed this way, which is why AlphaGo was so successful.
  • Machine learning is inductive because it acquires knowledge from observation of data.
  • Thinking in the real world depends on the sensitive detection of abnormality, or exceptions.
  • AI systems today are what Oren Etzioni calls “high-capacity statistical models”. Intelligent minds bring understanding to data and can connect dots that lead to an appreciation of failure points and abnormalities.
  • To make progress in AI we must look past induction.
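
The contrast between deduction and induction in the notes above can be made concrete with a small Python sketch. Everything in it is a hypothetical illustration rather than anything taken from the book: the function names `modus_ponens` and `induce_rule` are made up, and the bird observations are invented data. It shows two things: a deductive engine applies "if P, then Q" mechanically and has no notion of whether a rule is relevant or true, and an enumerative induction holds only until the first exception is observed.

```python
def modus_ponens(rules, facts):
    """Forward-chain over rules (P, Q), each meaning 'if P, then Q'."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # If P is known and Q is not yet derived, conclude Q.
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def induce_rule(observations):
    """Enumerative induction: generalize only if every observation agrees."""
    values = set(observations)
    return values.pop() if len(values) == 1 else None


# Deduction: both rules fire, because logic alone cannot judge relevance.
rules = [("raining", "streets are wet"), ("raining", "pigs fly")]
conclusions = modus_ponens(rules, {"raining"})

# Induction: a thousand agreeing cases give a rule, one exception removes it.
observed = ["flies"] * 1000           # hypothetical bird sightings
rule = induce_rule(observed)
observed.append("does not fly")       # a single penguin
revised = induce_rule(observed)
```

In this sketch `conclusions` contains both "streets are wet" and "pigs fly": the deduction is valid in form either way, so the burden of choosing sensible premises sits outside the logic. `rule` is "flies" and `revised` is `None`, which is the provisional character of inductive knowledge in miniature.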

Chapter 11

  • Learning is “improving performance based on experience”. Machine learning is just automated induction.
  • One problem with ML as a potential path to general intelligence is that learning can succeed, at least for a while, without any understanding. It can predict outcomes until an unexpected change or event renders the simulation worthless. E.g., a conversation changes topics, or a person loves horses, then her horse dies and she moves on to pursue a passion for Zen.
  • Moore’s law: as computers become more powerful, statistical techniques like ML become better.
  • The problem of induction is really a problem for modern AI. Its window into meaning is tied directly to data, which is a limiting constraint on learning.
  • Frequency constraint: patterns that might be undetectable in thousands of examples crystallize in millions.
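
The horse-to-Zen example above can be sketched as a toy program. This is a minimal sketch, assuming a simple frequency counter can stand in for a statistical learner; the class `FrequencyPredictor` and the topic counts are hypothetical and chosen only for illustration. The point is that a model built purely on past frequencies keeps predicting the old interest after the world has changed, and only catches up once the new data outnumbers the old.

```python
from collections import Counter


class FrequencyPredictor:
    """Predicts the most frequently observed topic so far (toy learner)."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, topic):
        self.counts[topic] += 1

    def predict(self):
        if not self.counts:
            return None
        # most_common(1) returns a list holding one (topic, count) pair.
        return self.counts.most_common(1)[0][0]


model = FrequencyPredictor()

for _ in range(100):          # a long history of talking about horses
    model.observe("horses")
before = model.predict()

for _ in range(10):           # the horse dies; the person turns to Zen
    model.observe("zen")
after = model.predict()       # the model has not noticed the change

for _ in range(91):           # only sheer volume of new data shifts it
    model.observe("zen")
caught_up = model.predict()
```

Here `before` and `after` are both "horses", and `caught_up` becomes "zen" only after 101 Zen observations outweigh the 100 horse observations. A person would update after hearing a single sentence about the horse dying; the counter has no way to register that one event as more significant than any other.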
