**Introduction**

Large Language Models (LLMs) are versatile generative models suited for a wide array of tasks. They can produce consistent, repeatable outputs or generate creative content by placing unlikely words together. The “temperature” setting controls this trade-off: it rescales the model’s next-token probabilities, making the output more or less predictable.

Let’s take a hypothetical example to understand the impact of temperature on the next token prediction.

Suppose we ask an LLM to complete the sentence **“This is a wonderful _____.”** Let’s assume the candidate tokens and their logits (raw, unnormalized scores) are:

| token      | logit |
|------------|-------|
| day        | 5.0   |
| space      | 2.2   |
| furniture  | 2.0   |
| experience | 4.5   |
| problem    | 3.0   |
| challenge  | 2.7   |

The logits are passed through a softmax function, which exponentiates each logit and divides it by the sum of all the exponentials. The resulting values are non-negative and sum to one, so they can be read as probability estimates for each token.
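For reference, the standard softmax described above is usually written as follows. The symbol $p_i$ (the probability assigned to the $i$-th token) is notation introduced here for illustration; $x_i$ is the $i$-th logit and $n$ is the number of candidate tokens.

$$
p_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}
$$

Because every $e^{x_j}$ is positive and the denominator is the sum of all numerators, each $p_i$ lies between 0 and 1 and the $p_i$ sum to one.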

Let’s calculate the probability estimates in Python.

```python
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider


def softmax(logits):
    # Exponentiate each logit, then normalize so the outputs sum to 1
    exps = np.exp(logits)
    return exps / np.sum(exps)


data = {
    "tokens": ["day", "space", "furniture", "experience", "problem", "challenge"],
    "logits": [5, 2.2, 2.0, 4.5, 3.0, 2.7],
}
df = pd.DataFrame(data)
df['probabilities'] = softmax(df['logits'].values)
df
```

| No. | tokens     | logits | probabilities |
|-----|------------|--------|---------------|
| 0   | day        | 5.0    | 0.512106      |
| 1   | space      | 2.2    | 0.031141      |
| 2   | furniture  | 2.0    | 0.025496      |
| 3   | experience | 4.5    | 0.310608      |
| 4   | problem    | 3.0    | 0.069306      |
| 5   | challenge  | 2.7    | 0.051343      |

```python
ax = sns.barplot(x="tokens", y="probabilities", data=df)
ax.set_title('Softmax Probability Estimates')
ax.set_ylabel('Probability')
ax.set_xlabel('Tokens')
plt.xticks(rotation=45)

for bar in ax.patches:
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(), f'{bar.get_height():.2f}',
            ha='center', va='bottom', fontsize=10, rotation=0)
plt.show()
```

The **softmax function with temperature** is defined as follows:

$$
\text{softmax}(x_i; T) = \frac{e^{x_i / T}}{\sum_{j=1}^{n} e^{x_j / T}}
$$

where $T$ is the temperature, $x_i$ is the $i$-th component of the input vector (logits), and $n$ is the number of components in the vector.

```python
def softmax_with_temperature(logits, temperature):
    if temperature <= 0:
        temperature = 1e-10  # Prevent division by zero or negative temperatures
    scaled_logits = logits / temperature
    exps = np.exp(scaled_logits - np.max(scaled_logits))  # Numerical stability improvement
    return exps / np.sum(exps)


def plot_interactive_softmax(temperature):
    probabilities = softmax_with_temperature(df['logits'], temperature)

    plt.figure(figsize=(10, 5))
    bars = plt.bar(df['tokens'], probabilities, color='blue')
    plt.ylim(0, 1)
    plt.title(f'Softmax Probabilities at Temperature = {temperature:.2f}')
    plt.ylabel('Probability')
    plt.xlabel('Tokens')

    # Add text annotations
    for bar, probability in zip(bars, probabilities):
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2, yval, f"{probability:.2f}", ha='center', va='bottom', fontsize=10)

    plt.show()


interactive_plot = interactive(plot_interactive_softmax, temperature=FloatSlider(value=1, min=0, max=2, step=0.01, description='Temperature'))
interactive_plot
```

At T = 1, the probability values are the same as those derived from the standard softmax function, because dividing each logit by 1 leaves it unchanged.
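The T = 1 case can be checked numerically. The sketch below is a minimal, standalone version that assumes the six example logits used earlier in this article; the variable names `p_plain` and `p_t1` are chosen here for illustration only.

```python
import numpy as np


def softmax(logits):
    # Standard softmax, shifted by the max logit for numerical stability
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()


def softmax_with_temperature(logits, temperature):
    # Divide the logits by the temperature before applying softmax
    scaled = logits / temperature
    exps = np.exp(scaled - np.max(scaled))
    return exps / exps.sum()


# Example logits from this article (day, space, furniture, experience, problem, challenge)
logits = np.array([5.0, 2.2, 2.0, 4.5, 3.0, 2.7])

p_plain = softmax(logits)
p_t1 = softmax_with_temperature(logits, 1.0)

# True when both distributions agree to within floating-point tolerance
print(np.allclose(p_plain, p_t1))
```

Subtracting the maximum logit does not change the result, since the same constant factor cancels between numerator and denominator.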

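At T > 1, each logit is divided by a number greater than one, which shrinks the gaps between the logits. This flattens the distribution: less likely tokens gain probability at the expense of the most likely ones, broadening the range of potential candidates (the diversity) for the model’s next token prediction.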
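One way to make this flattening concrete is to track the top probability and the Shannon entropy of the distribution as the temperature rises. The sketch below assumes the example logits from this article; the `entropy` helper and the `results` dictionary are illustrative names, not part of any library.

```python
import numpy as np


def softmax_with_temperature(logits, temperature):
    # Scale the logits by 1/T, then apply a numerically stable softmax
    scaled = np.asarray(logits, dtype=float) / temperature
    exps = np.exp(scaled - scaled.max())
    return exps / exps.sum()


def entropy(p):
    # Shannon entropy in nats; higher means a flatter, more uncertain distribution
    return float(-np.sum(p * np.log(p)))


logits = [5.0, 2.2, 2.0, 4.5, 3.0, 2.7]

# Probabilities at a baseline, a moderately high, and a very high temperature
results = {t: softmax_with_temperature(logits, t) for t in (1.0, 2.0, 10.0)}

for t, p in results.items():
    print(f"T={t:>4}: top probability={p.max():.3f}, entropy={entropy(p):.3f}")
```

As the temperature grows, the entropy should approach its maximum of `log(6)` for six tokens, which corresponds to a uniform distribution.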

At T < 1, the gaps between the logits widen, so the distribution sharpens and the probability of the most likely token approaches 1.0 as T approaches 0. The output becomes close to deterministic. Note that this reduces the randomness of sampling; it does not make the model’s underlying prediction any more accurate.
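The effect on actual sampling can be sketched by drawing tokens at different temperatures and counting how often the top token appears. This is an illustrative simulation on the example logits from this article, not how any particular LLM library implements sampling; `sample_tokens` and the fixed seed are assumptions made for the demo.

```python
import numpy as np


def softmax_with_temperature(logits, temperature):
    # Scale the logits by 1/T, then apply a numerically stable softmax
    scaled = np.asarray(logits, dtype=float) / temperature
    exps = np.exp(scaled - scaled.max())
    return exps / exps.sum()


tokens = np.array(["day", "space", "furniture", "experience", "problem", "challenge"])
logits = np.array([5.0, 2.2, 2.0, 4.5, 3.0, 2.7])

# Fixed seed so repeated runs draw the same samples
rng = np.random.default_rng(seed=0)


def sample_tokens(temperature, n=1000):
    # Draw n next-token samples from the temperature-scaled distribution
    p = softmax_with_temperature(logits, temperature)
    return rng.choice(tokens, size=n, p=p)


for t in (0.1, 1.0, 2.0):
    draws = sample_tokens(t)
    share_day = np.mean(draws == "day")
    print(f"T={t}: share of 'day' in {len(draws)} samples = {share_day:.3f}")
```

At the lowest temperature, nearly every draw should be "day", while at the highest the samples spread across more of the candidate tokens.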

**Conclusion**

LLMs expose the temperature parameter to give users control over how predictions are sampled. At a temperature of 1, the model samples from the standard softmax distribution. Increasing the temperature flattens the distribution and gives less likely tokens a larger share, which introduces greater diversity. Conversely, decreasing the temperature concentrates probability on the most probable token, making the predictions more focused and closer to deterministic. This adaptability allows users to tailor LLM outputs to a wide array of tasks, striking a balance between creative exploration and repeatable output.