
AI and large language models (LLMs)


When talking about AI, people often mean large language models based on machine learning, which are also the focus of this page. This very concise text is intended for anyone interested in the topic, and it also includes a practical code example to entertain myself and other ordinary software developers who happen to find their way here.


How LLMs Work


During training, the model adjusts parameters, which are numerical weights inside the model, to identify statistical patterns in data, forming a probability distribution over how "tokens" tend to appear and relate to each other. A token is a small piece of text such as a word, part of a word, or punctuation. When generating text, the model processes the current text and computes probabilities for possible next tokens, selects one, then repeats the process.
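The generation loop above can be sketched in a few lines of code. The token probabilities below are invented for illustration; a real model computes them from billions of learned parameters, and here the selection step simply picks the most probable token (greedy decoding) instead of sampling.

```php
<?php

// Toy next-token table: the probabilities are made up for illustration;
// a real LLM computes them from its learned parameters.
$next_token_probabilities = [
    'The cat'        => ['sat' => 0.6, 'ran' => 0.3, 'meowed' => 0.1],
    'The cat sat'    => ['on' => 0.8, 'down' => 0.2],
    'The cat sat on' => ['the' => 0.9, 'a' => 0.1],
];

// Greedy decoding: always pick the most probable next token.
function next_token(array $probabilities, string $context): ?string {

    if (!isset($probabilities[$context])) {
        return null; // unknown context, stop generating
    }

    $candidates = $probabilities[$context];
    arsort($candidates); // sort by probability, highest first

    return array_key_first($candidates);
}

$text = 'The cat';

// Process the current text, pick a next token, repeat.
while (($token = next_token($next_token_probabilities, $text)) !== null) {
    $text .= ' ' . $token;
}

echo $text; // The cat sat on the
```

Real models also sample from the distribution instead of always taking the top token, which is why the same prompt can produce different answers.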


LLMs Do Not Reason


Despite their name, artificial neural networks are not intended to model the brain, and they do not reason. They are mathematical function approximators that fit functions to data rather than follow explicit logical rules.


How a function approximator works in practice


Training data (input => expected output):

$training_data = [

    1 => 3,
    2 => 5,
    3 => 7,
    4 => 9,
    5 => 11
];

Example of how to train the model:

// Fit a line y = slope * x + offset by ordinary least squares
function trainer(array $training_data): array {

    $x = array_keys($training_data);
    $y = array_values($training_data);

    $n = count($x);
    $average_x = array_sum($x) / $n;
    $average_y = array_sum($y) / $n;

    $numerator = 0;
    $denominator = 0;

    for($i = 0; $i < $n; $i++){

        // Add how changes in x relate to changes in y (covariance)
        $numerator += ($x[$i] - $average_x) * ($y[$i] - $average_y);

        // Measure how much x varies from its average (variance)
        $denominator += ($x[$i] - $average_x) ** 2;
    }

    $slope = $numerator / $denominator;
    $offset = $average_y - $slope * $average_x;

    return [$slope, $offset];
}

Example of how to run the model:

/*

This binds model parameters to a model, creating a function
that behaves like hard-coded logic but is derived from data.

*/

function function_approximator(callable $model, array $parameters): callable {

    return function(float $input) use ($model, $parameters){

        [$slope, $offset] = $parameters;

        return $model($input, $slope, $offset);
    };
}

function model(float $input, float $slope, float $offset): float {

    return $slope * $input + $offset;
}

$parameters = trainer($training_data);
$fitted_function = function_approximator('model', $parameters);

// Make a prediction
$input = 6;
$output = $fitted_function($input);

echo $output; // 13

LLMs Are Not Black Boxes


They may be difficult to interpret, but their internal workings are not unknown: every weight can be inspected, and their behavior is driven by learned statistical regularities rather than hidden logic, so it can be studied and explained at a statistical level.
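The toy model above makes the point concrete: its learned parameters can simply be read out. This sketch refits the same training data (repeating the arithmetic of the earlier trainer in compact form) and prints the weights.

```php
<?php

// Refit the same data as the earlier example and inspect the weights.
$training_data = [1 => 3, 2 => 5, 3 => 7, 4 => 9, 5 => 11];

$x = array_keys($training_data);
$y = array_values($training_data);
$n = count($x);

$average_x = array_sum($x) / $n;
$average_y = array_sum($y) / $n;

$numerator = 0;
$denominator = 0;

for ($i = 0; $i < $n; $i++) {
    $numerator += ($x[$i] - $average_x) * ($y[$i] - $average_y);
    $denominator += ($x[$i] - $average_x) ** 2;
}

$slope = $numerator / $denominator;
$offset = $average_y - $slope * $average_x;

// Nothing is hidden: the entire "model" is these two numbers.
echo "slope: $slope, offset: $offset"; // slope: 2, offset: 1
```

An LLM has billions of such numbers instead of two, which makes inspection laborious, not impossible.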


Model Collapse


Current models are fundamentally limited by data rather than compute. Their performance depends on high-quality, diverse, real-world data, which is both finite and increasingly contaminated by model-generated output. Early-stage model collapse is already observable, and it reflects a degradation of the underlying data distribution.
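This degradation can be simulated with a toy stand-in for a model: a Gaussian distribution fitted to samples. Each "generation" below is trained only on synthetic data drawn from the previous generation, and the estimated spread drifts toward zero: a crude analogue of a model losing the diversity of the original distribution. The setup (seed, sample size, generation count) is chosen purely for illustration.

```php
<?php

mt_srand(7); // fixed seed so the run is repeatable

// Draw one sample from a normal distribution (Box-Muller transform)
function gaussian(float $mean, float $std_dev): float {

    $u1 = mt_rand(1, mt_getrandmax()) / mt_getrandmax();
    $u2 = mt_rand(1, mt_getrandmax()) / mt_getrandmax();

    return $mean + $std_dev * sqrt(-2 * log($u1)) * cos(2 * M_PI * $u2);
}

// "Train": estimate mean and standard deviation from samples
function fit(array $samples): array {

    $n = count($samples);
    $mean = array_sum($samples) / $n;

    $variance = 0;
    foreach ($samples as $sample) {
        $variance += ($sample - $mean) ** 2;
    }

    return [$mean, sqrt($variance / $n)];
}

[$mean, $std_dev] = [0.0, 1.0]; // the original "real-world" distribution
$initial_std_dev = $std_dev;

for ($generation = 0; $generation < 500; $generation++) {

    // Generate synthetic data from the current model...
    $samples = [];
    for ($i = 0; $i < 20; $i++) {
        $samples[] = gaussian($mean, $std_dev);
    }

    // ...and fit the next generation to that data alone.
    [$mean, $std_dev] = fit($samples);
}

// The spread has shrunk compared to the original distribution.
echo $std_dev < $initial_std_dev ? "collapsed" : "survived"; // collapsed
```

Each fit from a finite sample loses a little of the tails, and because later generations never see the original data again, the loss compounds instead of averaging out.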
