LLaMEA AutoML example¶
This notebook shows a simple usage of LLaMEA to automatically generate and refine Python-based machine learning pipelines for a given task and dataset.
[ ]:
# This dependency is sometimes missed by poetry
!pip install swig
!pip install llamea==1.0.5
[33]:
# Cleaning up previous experiment folders
!rm -R exp-*
[26]:
# Imports
import os
import numpy as np
from llamea import LLaMEA, Gemini_LLM
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import math
import random
import sklearn
Cell 1: Set up the LLM¶
If you haven’t already, set your OpenAI or other API key in your environment variables, e.g., export OPENAI_API_KEY="..." or export GEMINI_API_KEY="...".
In most countries you can also use Gemini for free.
[28]:
from google.colab import userdata
import os
# Read the API key from Colab's secrets manager and expose it as an environment variable
api_key = userdata.get('GOOGLE_API_KEY_1')
os.environ['GOOGLE_API_KEY'] = api_key
# api_key = os.getenv("GEMINI_API_KEY")
llm = Gemini_LLM(api_key, "gemini-2.0-flash")
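If you are not running in Colab, you can skip the secrets manager and read the key directly from your environment. A minimal sketch, assuming you exported GEMINI_API_KEY in your shell:
```python
import os
from llamea import Gemini_LLM

# Sketch for a non-Colab setup: read the key set via `export GEMINI_API_KEY="..."`
api_key = os.getenv("GEMINI_API_KEY")
llm = Gemini_LLM(api_key, "gemini-2.0-flash")
```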
Cell 2: Define an evaluation function for LLaMEA¶
The function must accept a “solution” argument, which contains code, a name, etc.
You parse solution.code (the raw code), dynamically load it, and run it on your problem(s).
You then call set_scores() to record how well it did.
We’ll define a simple example on the breast cancer dataset. We’ll ask the solution code to build a machine learning model that can predict the test set. We’ll then return a score based on the accuracy of the predictions.
[29]:
# Load the data set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)


def evaluate(solution, explogger=None):
    """
    Evaluates a solution on the breast cancer dataset.
    """
    code = solution.code
    algorithm_name = solution.name
    # Dynamically load the generated code so the pipeline class is available by name
    exec(code, globals())

    # Final validation: train on the training split and predict the held-out test split
    algorithm = globals()[algorithm_name](X_train, y_train)
    y_pred = algorithm(X_test)
    score = accuracy_score(y_test, y_pred)

    solution.set_scores(
        score,
        f"The algorithm {algorithm_name} scored {score:.3f} on accuracy (higher is better, 1.0 is the best).",
    )
    return solution
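To sanity-check evaluate before handing it to LLaMEA, you can call it with a hand-written stand-in for the solution object. The DummySolution stub and BaselineLR pipeline below are hypothetical helpers used only for this check; the real solution objects are created by LLaMEA itself.
```python
# Hypothetical stub that mimics only the attributes `evaluate` relies on
# (code, name, set_scores); this is not the real llamea solution class.
class DummySolution:
    def __init__(self, code, name):
        self.code = code
        self.name = name
        self.fitness = None
        self.feedback = ""

    def set_scores(self, score, feedback):
        self.fitness = score
        self.feedback = feedback


# A simple hand-written baseline pipeline to exercise the evaluation path
baseline_code = """
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

class BaselineLR:
    def __init__(self, X, y):
        self.model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        self.model.fit(X, y)

    def __call__(self, X):
        return self.model.predict(X)
"""

result = evaluate(DummySolution(baseline_code, "BaselineLR"))
print(result.feedback)
```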
Cell 3: Define the instructions¶
Now we define the instructions that LLaMEA will provide to the LLM. The instructions are split into the following parts:
task_prompt: the main task description with a general overview of the task.
example_prompt: one or more code examples to guide the search in the beginning.
output_format_prompt: how the LLM should generate the output.
[30]:
task_prompt = f"""
You are a highly skilled computer scientist in the field of machine learning. Your task is to design novel machine learning pipelines for a given dataset and task.
In this case the pipeline should handle a breast cancer classification task. Write the Python code for it. The code should contain an `__init__(self, X, y)` function that trains a machine learning model and a `def __call__(self, X)` function, which should predict the samples in X and return the predictions.
The training data X has shape {X_train.shape} and y has shape {y_train.shape}.
"""
example_prompt = """
An example code structure is as follows:
```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


class AlgorithmName:
    "Template for a ML pipeline"

    def __init__(self, X, y):
        self.train(X, y)

    def train(self, X, y):
        # Standardize the feature data and keep the scaler for prediction time
        self.scaler = StandardScaler()
        X_scaled = self.scaler.fit_transform(X)
        # Create and train a logistic regression model
        self.model = LogisticRegression()
        self.model.fit(X_scaled, y)

    def __call__(self, X):
        # Predict using the trained model, applying the same scaling
        return self.model.predict(self.scaler.transform(X))
```
"""
output_format_prompt = """
Give an excellent and novel ML pipeline to solve this task (within a 120-second time limit) and also give it a one-line description, describing the main idea. Give the response in the format:
# Description: <short-description>
# Code:
```python
<code>
```
"""
Cell 4: Create and run the LLaMEA search¶
Now simply run LLaMEA and see the results.
[34]:
# We'll use a small number of iterations for demonstration
es = LLaMEA(
    evaluate,
    n_parents=1,
    n_offspring=1,
    llm=llm,
    eval_timeout=120,
    task_prompt=task_prompt,
    example_prompt=example_prompt,
    output_format_prompt=output_format_prompt,
    experiment_name="AutoML-example",
    elitism=True,
    HPO=False,
    budget=3,
)
best_solution = es.run()
print(f"Best found solution: {best_solution.name}, Score={best_solution.fitness:.4f}")
print(f"Generated code:\n{best_solution.code}")
print(f"Additional feedback: {best_solution.feedback}")
/usr/local/lib/python3.11/dist-packages/joblib/parallel.py:1383: UserWarning: The backend class 'SequentialBackend' does not support timeout. You have set 'timeout=135' in Parallel but the 'timeout' parameter will not be used.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/joblib/parallel.py:1383: UserWarning: The backend class 'SequentialBackend' does not support timeout. You have set 'timeout=135' in Parallel but the 'timeout' parameter will not be used.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/joblib/parallel.py:1383: UserWarning: The backend class 'SequentialBackend' does not support timeout. You have set 'timeout=135' in Parallel but the 'timeout' parameter will not be used.
warnings.warn(
Best found solution: StackingEnsemble, Score=0.9790
Generated code:
import numpy as np
import sklearn
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import Pipeline
class StackingEnsemble:
    """
    A stacking ensemble of Logistic Regression, SVM, and Decision Tree classifiers,
    using cross-validation for robust performance.
    """
    def __init__(self, X, y):
        self.train(X, y)

    def train(self, X, y):
        # Define the base models
        lr = LogisticRegression(solver='liblinear', random_state=42, C=0.1)
        svm = SVC(probability=True, random_state=42, kernel='rbf', C=1)  # Enable probability estimates
        dt = DecisionTreeClassifier(random_state=42, max_depth=5)

        # Create a voting classifier (soft voting)
        estimators = [('lr', lr), ('svm', svm), ('dt', dt)]
        self.ensemble = VotingClassifier(estimators=estimators, voting='soft')

        # Create a pipeline with scaling and the ensemble
        self.pipeline = Pipeline([
            ('scaler', StandardScaler()),
            ('ensemble', self.ensemble)
        ])

        # Train the ensemble using cross-validation
        self.pipeline.fit(X, y)

    def __call__(self, X):
        # Predict using the trained pipeline
        return self.pipeline.predict(X)
Additional feedback: The algorithm StackingEnsemble scored 0.979 on accuracy (higher is better, 1.0 is the best).
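Finally, you can reuse the best pipeline outside the search loop, for example to re-check its accuracy on the held-out test split. A minimal sketch, assuming the run above completed successfully:
```python
# Re-instantiate the best generated pipeline and score it on the held-out test split
exec(best_solution.code, globals())
best_model = globals()[best_solution.name](X_train, y_train)
test_accuracy = accuracy_score(y_test, best_model(X_test))
print(f"{best_solution.name} test accuracy: {test_accuracy:.4f}")
```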