AI and large language models

When people talk about AI today, they usually mean large language models: machine learning models that predict the next part of a text based on probabilities. This short text, designed to be read in a few minutes, is intended for anyone interested in the topic. It also includes a practical code example at the end to delight software developers who happen to find their way here.

🚀 Astonishing performance

Considering how primitive the underlying technology of modern artificial intelligence is, it is astonishing how well large language models perform in practice on many tasks that would require multi-step reasoning from a human. Current technology makes it possible to automate tasks that would previously have been impossible or extremely expensive to automate.

How large language models work

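All text used by the model is encoded as numbers: it is split into small parts (tokens), and each part is mapped to a numeric ID. During training, that is, while the model is being fitted to its source material, the model's parameters are adjusted. These parameters are numerical weights inside the model that together define a probability distribution over the possible next parts of the text. When generating text, the model calculates a probability for each possible next part, selects one at random according to those probabilities, appends it to the text, and repeats the process.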
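The generation step can be sketched in PHP, the language of the example at the end. This is a minimal sketch, not how any real model is implemented: the four-token vocabulary and the raw scores are made up, and the common softmax function plus inverse transform sampling are assumed as the way scores become probabilities and one token is picked. A real model computes the scores from its parameters and runs this step once for every new token.

```php
<?php
declare(strict_types=1);

// Hypothetical toy vocabulary: each possible next token has a numeric ID.
$vocabulary = [0 => 'dark', 1 => 'bright', 2 => 'quiet', 3 => 'empty'];

// Hypothetical raw scores (logits) a model might output for each token ID
// after reading "If the light is off, the room is".
$scores = [0 => 3.0, 1 => 0.5, 2 => 1.0, 3 => 1.5];

// Softmax: turns raw scores into probabilities that are positive and sum to 1.
function softmax(array $scores): array {
    $max = max($scores); // subtracting the max avoids overflow in exp()
    $exponentials = array_map(fn(float $s): float => exp($s - $max), $scores);
    $total = array_sum($exponentials);
    return array_map(fn(float $e): float => $e / $total, $exponentials);
}

// Picks one token ID at random, weighted by its probability, by walking the
// cumulative distribution. $random must be in [0, 1).
function sample_token(array $probabilities, float $random): int {
    $cumulative = 0.0;
    foreach ($probabilities as $token_id => $probability) {
        $cumulative += $probability;
        if ($random < $cumulative) {
            return $token_id;
        }
    }
    // Guard against rounding: fall back to the last token ID.
    return array_key_last($probabilities);
}

$probabilities = softmax($scores);
$random = mt_rand() / (mt_getrandmax() + 1); // uniform number in [0, 1)
$next_id = sample_token($probabilities, $random);

echo "Next token: {$vocabulary[$next_id]}\n";
```

Because the choice is random, running the script several times gives different tokens, with "dark" appearing most often since it has the highest score.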

💡 Large language models do not reason

Despite their name, artificial neural networks are not intended to model how the brain works. They are mathematical function approximators that fit statistical patterns in their source material. What appears to be reasoning is only the result of the model reproducing reasoning-like structures found in its source material, not of following logical rules.

Example of reasoning-like structures:

Question: If the light is off, is the room dark?
Fact: The light is off.
Reasoning: Given that the light is off, the room is dark.
Answer: The room is dark.

Question: If the device is unplugged, is it powered?
Fact: The device is unplugged.
Reasoning: Given that the device is unplugged, it is not powered.
Answer: It is not powered.

Question: If the door is locked, can it be opened?
Fact: The door is locked.
Reasoning: Given that the door is locked, it cannot be opened.
Answer: It cannot be opened.
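A far simpler statistical model than a large language model makes this concrete. The sketch is a toy bigram counter, chosen here only for illustration (large language models are not built this way), trained on the three "Reasoning:" lines above. It never learns any rule about lights, devices or doors; it only records which word follows which, for example that "given" is always followed by "that". The function names are hypothetical.

```php
<?php
declare(strict_types=1);

// Toy source material: the three "Reasoning:" lines from the examples.
$source_text = [
    'Given that the light is off, the room is dark.',
    'Given that the device is unplugged, it is not powered.',
    'Given that the door is locked, it cannot be opened.',
];

// Splits a sentence into lowercase words, dropping punctuation.
function words(string $sentence): array {
    $letters_only = preg_replace('/[^a-z ]/', '', strtolower($sentence));
    return preg_split('/\s+/', trim($letters_only), -1, PREG_SPLIT_NO_EMPTY);
}

// Counts how often each word is directly followed by each other word.
function count_bigrams(array $sentences): array {
    $counts = [];
    foreach ($sentences as $sentence) {
        $w = words($sentence);
        for ($i = 0; $i < count($w) - 1; $i++) {
            $counts[$w[$i]][$w[$i + 1]] = ($counts[$w[$i]][$w[$i + 1]] ?? 0) + 1;
        }
    }
    return $counts;
}

// Turns the counts of the words that follow $word into probabilities.
function next_word_probabilities(array $counts, string $word): array {
    if (!isset($counts[$word])) {
        return []; // The word was never followed by anything in the source.
    }
    $total = (float) array_sum($counts[$word]);
    $probabilities = [];
    foreach ($counts[$word] as $next => $count) {
        $probabilities[$next] = $count / $total;
    }
    return $probabilities;
}

$counts = count_bigrams($source_text);
print_r(next_word_probabilities($counts, 'given')); // [that] => 1
print_r(next_word_probabilities($counts, 'it'));    // [is] => 0.5, [cannot] => 0.5
```

The counter "knows" that "given" leads to "that" only because every example in its source material says so.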

Large language models are not black boxes

They may be difficult to interpret, but their internal workings are not unknown. Their behavior is driven by statistical patterns rather than hidden logic, so they can be understood at a statistical level.

📉 Model collapse

Current models are fundamentally limited by their source material rather than by compute. Their performance depends on high-quality, diverse, real-world source material, which is finite and increasingly contaminated with model-generated text. Model collapse, a degradation of the learned probability distribution that occurs when models are trained on the output of other models, is already observable.
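The mechanism behind model collapse can be illustrated with a toy simulation, assuming a made-up four-token distribution and a very small amount of generated text per generation. Each generation is fitted only to tokens sampled from the previous one, so a rare token that happens not to be sampled gets probability zero and can never come back; the rare tail tends to vanish first. The function names and numbers are hypothetical, and real training pipelines are far more complex.

```php
<?php
declare(strict_types=1);

// Draws one token at random from a distribution [token => probability].
function sample(array $distribution): string {
    $random = mt_rand() / (mt_getrandmax() + 1); // uniform number in [0, 1)
    $cumulative = 0.0;
    foreach ($distribution as $token => $probability) {
        $cumulative += $probability;
        if ($random < $cumulative) {
            return (string) $token;
        }
    }
    return (string) array_key_last($distribution); // guard against rounding
}

// Hypothetical training step: estimates token frequencies only from
// $sample_size tokens generated by the previous model.
function fit_to_generated_text(array $distribution, int $sample_size): array {
    $counts = [];
    for ($i = 0; $i < $sample_size; $i++) {
        $token = sample($distribution);
        $counts[$token] = ($counts[$token] ?? 0) + 1;
    }
    $fitted = [];
    foreach ($counts as $token => $count) {
        $fitted[$token] = $count / (float) $sample_size;
    }
    return $fitted;
}

mt_srand(42); // fixed seed so a run is reproducible on the same PHP version

// Hypothetical "real-world" distribution with one rare token ('dim').
$distribution = ['dark' => 0.5, 'black' => 0.3, 'gloomy' => 0.15, 'dim' => 0.05];
$history = [$distribution];

for ($generation = 1; $generation <= 20; $generation++) {
    $distribution = fit_to_generated_text($distribution, 20);
    $history[] = $distribution;
    echo "Generation $generation: " . implode(', ', array_keys($distribution)) . "\n";
}
```

The list of surviving tokens can only shrink from one generation to the next, which is the loss of diversity the paragraph describes.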

How a function approximator works in practice

Example source material:

<?php

// The key represents the x-coordinate, and the value represents the y-coordinate.

$source_material = [
    1 => 3,
    2 => 5.1,
    3 => 6.9,
    4 => 9.2,
    5 => 10.8
];

Example of how to fit the model to the source material:

/*

This example uses linear regression because it is one of the simplest machine
learning models: it learns the relationship between inputs and outputs from
the source material rather than applying fixed rules.

*/

function fitter(array $source_material): array {

    $x = array_keys($source_material);
    $y = array_values($source_material);

    $n = count($x);
    $average_x = array_sum($x) / $n;
    $average_y = array_sum($y) / $n;

    $numerator = 0;
    $denominator = 0;

    for($i = 0; $i < $n; $i++){

        // Accumulate how x and y vary together
        // (sum of products of deviations, proportional to the covariance)
        $numerator += ($x[$i] - $average_x) * ($y[$i] - $average_y);

        // Accumulate how much x varies from its average
        // (sum of squared deviations, proportional to the variance)
        $denominator += ($x[$i] - $average_x) ** 2;
    }

    $slope = $numerator / $denominator;
    $offset = $average_y - $slope * $average_x;

    return [$slope, $offset];
}
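The fitter above uses the closed-form least-squares formula, which exists for linear regression. Neural networks, including large language models, are instead usually fitted iteratively with gradient descent: the parameters are repeatedly nudged in the direction that reduces the error. A minimal sketch of that approach on the same source material follows; the function name, the learning rate and the number of steps are assumptions chosen for this tiny data set.

```php
<?php
declare(strict_types=1);

/*
 * Fits slope and offset by gradient descent on the mean squared error.
 * The default learning rate and step count are hypothetical values that
 * work for this small data set; other data may need different ones.
 */
function gradient_descent_fitter(array $source_material, float $learning_rate = 0.05, int $steps = 5000): array {
    $x = array_keys($source_material);
    $y = array_values($source_material);
    $n = count($x);

    $slope = 0.0;
    $offset = 0.0;

    for ($step = 0; $step < $steps; $step++) {
        $slope_gradient = 0.0;
        $offset_gradient = 0.0;

        for ($i = 0; $i < $n; $i++) {
            // Prediction error for one point under the current parameters
            $error = ($slope * $x[$i] + $offset) - $y[$i];

            // Partial derivatives of the mean squared error
            $slope_gradient += 2 * $error * $x[$i] / $n;
            $offset_gradient += 2 * $error / $n;
        }

        // Move both parameters a small step against the gradient
        $slope -= $learning_rate * $slope_gradient;
        $offset -= $learning_rate * $offset_gradient;
    }

    return [$slope, $offset];
}

$source_material = [1 => 3, 2 => 5.1, 3 => 6.9, 4 => 9.2, 5 => 10.8];

[$slope, $offset] = gradient_descent_fitter($source_material);
printf("slope %.2f, offset %.2f\n", $slope, $offset); // slope 1.97, offset 1.09
```

Both approaches arrive at the same line here. Gradient descent is slower for this problem, but it also works for models that have no closed-form solution.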

Example of how to run the model:

/*

This binds model parameters to a model, creating a function that
behaves like hard-coded logic but is derived from source material.

*/

function function_approximator(callable $model, array $parameters): callable {

    return function(float $input) use ($model, $parameters){

        [$slope, $offset] = $parameters;
    
        return $model($input, $slope, $offset);
    };
}

function model(float $input, float $slope, float $offset): float {

    return $slope * $input + $offset;
}

$parameters = fitter($source_material);
$fitted_function = function_approximator('model', $parameters);

// Make a prediction

$input = 4;
$output = $fitted_function($input);

/*

Because the source material does not follow an exact rule, the model finds the
best-fitting (least-squares) approximation. The prediction for x = 4 is therefore
8.97, which differs slightly from the observed value of 9.2.

*/

echo $output; // 8.97
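To see how far the fitted line is from each point in the source material, the residuals (observed value minus prediction) can be printed. This sketch hard-codes the parameters that fitter() returns for this data (slope 1.97, offset 1.09) so that it runs on its own. The mean squared error it prints is the average of the squared residuals, which least-squares fitting minimizes.

```php
<?php
declare(strict_types=1);

$source_material = [1 => 3, 2 => 5.1, 3 => 6.9, 4 => 9.2, 5 => 10.8];

// Parameters that fitter() returns for this source material
[$slope, $offset] = [1.97, 1.09];

$residuals = [];
foreach ($source_material as $x => $y) {
    $prediction = $slope * $x + $offset;

    // Residual: how far the observed value lies from the fitted line
    $residuals[$x] = $y - $prediction;

    printf("x = %d: observed %.2f, predicted %.2f, residual %+.2f\n",
        $x, $y, $prediction, $residuals[$x]);
}

// Mean squared error: the average of the squared residuals
$mse = array_sum(array_map(fn(float $r): float => $r ** 2, $residuals)) / count($residuals);
printf("Mean squared error: %.4f\n", $mse);
```

The largest residual is at x = 4, where the observed value is 9.2 but the line predicts 8.97. The residuals also sum to zero, as they always do for a least-squares line with an offset.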

Paste the link to this page into your AI service and ask how the code works; large language models are very good at that.