LLaMEA AutoML example¶
This notebook shows a simple usage of LLaMEA to automatically generate and refine Python-based machine learning pipelines for a given task and dataset.
[ ]:
# This dependency is sometimes missed by poetry
!pip install swig
!pip install llamea==1.0.5
[33]:
# Cleaning up previous experiment folders
!rm -R exp-*
[26]:
# Imports
import os
import numpy as np
from llamea import LLaMEA, Gemini_LLM
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import math
import random
import sklearn
Cell 1: Set up the LLM¶
If you haven’t already, set your OpenAI or other API key in your environment variables, e.g., export OPENAI_API_KEY="..." or export GEMINI_API_KEY="...".
In most countries you can also use Gemini for free.
[28]:
from google.colab import userdata
import os
# Read the API key from Colab's secrets manager and expose it as an environment variable
api_key = userdata.get('GOOGLE_API_KEY_1')
os.environ['GOOGLE_API_KEY'] = api_key
# api_key = os.getenv("GEMINI_API_KEY")
llm = Gemini_LLM(api_key, "gemini-2.0-flash")
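If you are not running in Colab, you can skip the secrets manager and read the key directly from your environment. A minimal sketch, assuming you exported GEMINI_API_KEY in your shell:
```python
import os
from llamea import Gemini_LLM

# Sketch for a non-Colab setup: read the key set via `export GEMINI_API_KEY="..."`
api_key = os.getenv("GEMINI_API_KEY")
llm = Gemini_LLM(api_key, "gemini-2.0-flash")
```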
Cell 2: Define an evaluation function for LLaMEA¶
The function must accept a “solution” argument, which contains code, a name, etc.
You parse solution.code (the raw code), dynamically load it, and run it on your problem(s).
You then call set_scores() to record how well it did.
We’ll define a simple example on the breast cancer dataset. We’ll ask the solution code to build a machine learning model that can predict the test set. We’ll then return a score based on the accuracy of the predictions.
[29]:
# Load the data set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)


def evaluate(solution, explogger=None):
    """
    Evaluates a solution on the breast cancer dataset.
    """
    code = solution.code
    algorithm_name = solution.name
    # Dynamically load the generated code so the pipeline class is available by name
    exec(code, globals())

    # Final validation: train on the training split and predict the held-out test split
    algorithm = globals()[algorithm_name](X_train, y_train)
    y_pred = algorithm(X_test)
    score = accuracy_score(y_test, y_pred)

    solution.set_scores(
        score,
        f"The algorithm {algorithm_name} scored {score:.3f} on accuracy (higher is better, 1.0 is the best).",
    )
    return solution
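To sanity-check evaluate before handing it to LLaMEA, you can call it with a hand-written stand-in for the solution object. The DummySolution stub and BaselineLR pipeline below are hypothetical helpers used only for this check; the real solution objects are created by LLaMEA itself.
```python
# Hypothetical stub that mimics only the attributes `evaluate` relies on
# (code, name, set_scores); this is not the real llamea solution class.
class DummySolution:
    def __init__(self, code, name):
        self.code = code
        self.name = name
        self.fitness = None
        self.feedback = ""

    def set_scores(self, score, feedback):
        self.fitness = score
        self.feedback = feedback


# A simple hand-written baseline pipeline to exercise the evaluation path
baseline_code = """
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

class BaselineLR:
    def __init__(self, X, y):
        self.model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        self.model.fit(X, y)

    def __call__(self, X):
        return self.model.predict(X)
"""

result = evaluate(DummySolution(baseline_code, "BaselineLR"))
print(result.feedback)
```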
Cell 3: Define the instructions¶
Now we define the instructions that LLaMEA will provide to the LLM. The instructions are split into the following parts:
task_prompt: the main task description with a general overview of the task.
example_prompt: one or more code examples to guide the search in the beginning.
output_format_prompt: how the LLM should generate the output.
[30]:
task_prompt = f"""
You are a highly skilled computer scientist in the field of machine learning. Your task is to design novel machine learning pipelines for a given dataset and task.
In this case the pipeline should handle a breast cancer classification task. Write the Python code for it. The code should contain an `__init__(self, X, y)` function that trains a machine learning model and a `def __call__(self, X)` function, which should predict the samples in X and return the predictions.
The training data X has shape {X_train.shape} and y has shape {y_train.shape}.
"""
example_prompt = """
An example code structure is as follows:
```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


class AlgorithmName:
    "Template for a ML pipeline"

    def __init__(self, X, y):
        self.train(X, y)

    def train(self, X, y):
        # Standardize the feature data and keep the scaler for prediction time
        self.scaler = StandardScaler()
        X_scaled = self.scaler.fit_transform(X)
        # Create and train a logistic regression model
        self.model = LogisticRegression()
        self.model.fit(X_scaled, y)

    def __call__(self, X):
        # Predict using the trained model, applying the same scaling
        return self.model.predict(self.scaler.transform(X))
```
"""
output_format_prompt = """
Give an excellent and novel ML pipeline to solve this task (within a 120-second time limit) and also give it a one-line description, describing the main idea. Give the response in the format:
# Description: <short-description>
# Code:
```python
<code>
```
"""
Cell 4: Create and run the LLaMEA search¶
Now simply run LLaMEA and see the results.
[34]:
# We'll use a small number of iterations for demonstration
es = LLaMEA(
    evaluate,
    n_parents=1,
    n_offspring=1,
    llm=llm,
    eval_timeout=120,
    task_prompt=task_prompt,
    example_prompt=example_prompt,
    output_format_prompt=output_format_prompt,
    experiment_name="AutoML-example",
    elitism=True,
    HPO=False,
    budget=3,
)
best_solution = es.run()
print(f"Best found solution: {best_solution.name}, Score={best_solution.fitness:.4f}")
print(f"Generated code:\n{best_solution.code}")
print(f"Additional feedback: {best_solution.feedback}")
/usr/local/lib/python3.11/dist-packages/joblib/parallel.py:1383: UserWarning: The backend class 'SequentialBackend' does not support timeout. You have set 'timeout=135' in Parallel but the 'timeout' parameter will not be used.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/joblib/parallel.py:1383: UserWarning: The backend class 'SequentialBackend' does not support timeout. You have set 'timeout=135' in Parallel but the 'timeout' parameter will not be used.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/joblib/parallel.py:1383: UserWarning: The backend class 'SequentialBackend' does not support timeout. You have set 'timeout=135' in Parallel but the 'timeout' parameter will not be used.
warnings.warn(
Best found solution: StackingEnsemble, Score=0.9790
Generated code:
import numpy as np
import sklearn
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import Pipeline
class StackingEnsemble:
    """
    A stacking ensemble of Logistic Regression, SVM, and Decision Tree classifiers,
    using cross-validation for robust performance.
    """
    def __init__(self, X, y):
        self.train(X, y)

    def train(self, X, y):
        # Define the base models
        lr = LogisticRegression(solver='liblinear', random_state=42, C=0.1)
        svm = SVC(probability=True, random_state=42, kernel='rbf', C=1)  # Enable probability estimates
        dt = DecisionTreeClassifier(random_state=42, max_depth=5)

        # Create a voting classifier (soft voting)
        estimators = [('lr', lr), ('svm', svm), ('dt', dt)]
        self.ensemble = VotingClassifier(estimators=estimators, voting='soft')

        # Create a pipeline with scaling and the ensemble
        self.pipeline = Pipeline([
            ('scaler', StandardScaler()),
            ('ensemble', self.ensemble)
        ])

        # Train the ensemble using cross-validation
        self.pipeline.fit(X, y)

    def __call__(self, X):
        # Predict using the trained pipeline
        return self.pipeline.predict(X)
Additional feedback: The algorithm StackingEnsemble scored 0.979 on accuracy (higher is better, 1.0 is the best).
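Finally, you can reuse the best pipeline outside the search loop, for example to re-check its accuracy on the held-out test split. A minimal sketch, assuming the run above completed successfully:
```python
# Re-instantiate the best generated pipeline and score it on the held-out test split
exec(best_solution.code, globals())
best_model = globals()[best_solution.name](X_train, y_train)
test_accuracy = accuracy_score(y_test, best_model(X_test))
print(f"{best_solution.name} test accuracy: {test_accuracy:.4f}")
```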