Open In Colab

LLaMEA AutoML example

This notebook shows a simple usage of LLaMEA to automatically generate and refine a Python-based machine learning pipelines for a given task and dataset.

[ ]:
#this dependency is sometimes missed by poetry
!pip install swig
!pip install llamea==1.0.5

[33]:
#cleaing up
!rm -R exp-*
[26]:
# Cell 1: Imports
import os
import numpy as np
from llamea import LLaMEA, Gemini_LLM
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import math
import random
import sklearn

Cell 1: Set up the LLM

If you haven’t already, set your OpenAI or other API key in your environment variables, e.g., export OPENAI_API_KEY="..." or export GEMINI_API_KEY="...."

You can also use Gemini in most countries for free.

[28]:
from google.colab import userdata
import os

# Set environment variables
api_key = userdata.get('GOOGLE_API_KEY_1')
os.environ['GOOGLE_API_KEY'] = api_key

#api_key = os.getenv("GEMINI_API_KEY")
llm = Gemini_LLM(api_key, "gemini-2.0-flash")

Cell 2: Define an evaluation function for LLaMEA

  • The function must accept a “solution” argument, which contains code, a name, etc.

  • You parse solution.code (the raw code), dynamically load it, and run it on your problem(s).

  • You then set_scores() to record how well it did.

We’ll define a simple example on the breast cancer dataset. We’ll ask the solution code to build a machine learning model that can predict the test set. We’ll then return a score based on the accuracy of the predictions.

[29]:

# Load the data set X, y = load_breast_cancer(return_X_y=True) ( X_train, X_test, y_train, y_test, ) = train_test_split(X, y, random_state=1) def evaluate(solution, explogger=None): """ Evaluates a solution on the breast cancer dataset. """ code = solution.code algorithm_name = solution.name exec(code, globals()) algorithm = None # Final validation algorithm = globals()[algorithm_name](X_train, y_train) y_pred = algorithm(X_test) score = accuracy_score(y_test, y_pred) solution.set_scores( score, f"The algorithm {algorithm_name} scored {score:.3f} on accuracy (higher is better, 1.0 is the best).", ) return solution

Cell 3 - define the instructions

Now we define the instructions that LLamEA will provide to the LLM. The instructions are split into the following parts:

  • task_prompt: the main task description with a general overview of the task.

  • example_prompt: one or more code examples to guide the search in the beginning.

  • output_format_prompt: how the LLM should generate the output.

[30]:
task_prompt = f"""
You are a highly skilled computer scientist in the field machine learning. Your task is to design novel machine learning pipelines for a given dataset and task.
The pipeline in this case should handle a breast cancer classification task. Your task is to write the Python code. The code should contain an `__init__(self, X, y)` function that trains a machine learning model and the function `def __call__(self, X)`, which should predict the samples in X and return the predictions.
The training data X has shape {X_train.shape} and y has shape {y_train.shape}.
"""

example_prompt = """
An example code structure is as follows:
```python
import numpy as np
import sklearn

class AlgorithmName:
    "Template for a ML pipeline"

    def __init__(self, X, y):
        self.train(X, y)

    def train(self, X, y):
        # Standardize the feature data
        scaler = sklearn.preprocessing.StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)

        # Let's create and train a logistic regression model
        lr_model = sklearn.linear_model.LogisticRegression()
        lr_model.fit(X_train, y_train)
        self.model = lr_model

    def __call__(self, X):
        # predict using the trained model
        return self.model.predict(X)
```
"""

output_format_prompt = """
Give an excellent and novel ML pipeline to solve this task (within a 120 second time-limit) and also give it a one-line description, describing the main idea. Give the response in the format:
# Description: <short-description>
# Code:
```python
<code>
```
"""