Partition Trees model the conditional distribution \(p(y \mid x)\) as a piecewise-constant density over the outcome space. Depending on the dataset and hyperparameters, this can yield not only better probabilistic classification but also improved accuracy.
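To make "piecewise-constant" concrete, here is a minimal sketch of the idea on a toy 1-D problem. It is not the library's algorithm: the partition here is fixed by hand (the `edges` argument), whereas a Partition Tree learns its partition from data. The function names and the Laplace smoothing are assumptions chosen for illustration. Within each cell, the estimate of \(p(y \mid x)\) is a single probability vector that does not change with \(x\).

```python
import numpy as np


def fit_piecewise_constant(x, y, edges, n_classes):
    """Estimate p(y | x) with one class-probability vector per cell.

    `edges` is a hand-picked 1-D partition (hypothetical; a real tree
    would learn the splits).
    """
    cells = np.digitize(x, edges)  # cell index in 0..len(edges)
    probs = np.zeros((len(edges) + 1, n_classes))
    for c in range(len(edges) + 1):
        counts = np.bincount(y[cells == c], minlength=n_classes)
        # Laplace smoothing (an illustrative choice) keeps every
        # probability strictly positive, so the log loss stays finite.
        probs[c] = (counts + 1) / (counts.sum() + n_classes)
    return probs


def predict_proba(x, edges, probs):
    """Look up the constant probability vector of the cell containing x."""
    return probs[np.digitize(x, edges)]


x = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])
y = np.array([0, 0, 1, 1, 1, 1])
edges = np.array([0.5])

probs = fit_piecewise_constant(x, y, edges, n_classes=2)
print(probs)
print(predict_proba(np.array([0.40, 0.45]), edges, probs))
```

Both query points fall in the same cell, so they receive the same predicted distribution. That is the sense in which the estimate is piecewise constant.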
Setup
We use the UCI letter recognition dataset, a well-known classification benchmark with 26 classes. We start with a simple train-test split for clarity, but the same code works with cross-validation and pipelines.
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, log_loss
import pandas as pd

# Fetch the UCI letter recognition dataset (id=59)
letter_recognition = fetch_ucirepo(id=59)

# Features and targets (as pandas DataFrames)
X = letter_recognition.data.features
y = letter_recognition.data.targets

# Hold out 30% of the rows for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_train.shape, X_test.shape
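The remark about cross-validation and pipelines can be sketched as follows. This is a hedged illustration only: `DecisionTreeClassifier` is a stand-in for any scikit-learn-compatible classifier that implements `predict_proba` (the Partition Tree estimator's class name and parameters are not shown in this section, so none are assumed here), and `load_digits` is used as a stand-in dataset so the example runs without a download. The names `X_demo`, `y_demo`, and `pipeline` are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (bundled with scikit-learn, no network access needed)
X_demo, y_demo = load_digits(return_X_y=True)

# Stand-in estimator: replace the last step with any classifier that
# exposes fit / predict / predict_proba.
pipeline = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier(min_samples_leaf=5, random_state=0),
)

# Score both hard predictions (accuracy) and probabilities (log loss).
cv_results = cross_validate(
    pipeline,
    X_demo,
    y_demo,
    cv=5,
    scoring=["accuracy", "neg_log_loss"],
)

mean_accuracy = cv_results["test_accuracy"].mean()
# scikit-learn reports the negated log loss so that higher is better
mean_log_loss = -cv_results["test_neg_log_loss"].mean()

print(f"accuracy: {mean_accuracy:.3f}, log loss: {mean_log_loss:.3f}")
```

Reporting log loss alongside accuracy matters here because the claim under test is about the quality of the predicted probabilities, not only the predicted labels.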