Model Training
This script handles the entire training pipeline for detecting wash trades using an XGBoost classifier. It combines real trade data from Bitquery, rule-based labeling, feature preprocessing, model training, evaluation, and model serialization.
Code Breakdown
Imports
import pandas as pd
from get_data import get_trades
from label import label_trades
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import pickle
import json
Data Processing
The code snippet given below fetch live DEX trade data from Bitquery, apply rule-based labeling and modify the labelled dataframe to prepare datasets for training the model. Here:
X
= All the features.Y
= Binary label indicating whether the trade is suspicious.
Finally the model features are stored in a JSON
list for consistent preprocessing during inference.
trade_data = get_trades()
df = label_trades(trade_data)
for col in df.columns:
if df[col].dtype == 'object' and col != 'is_wash_trade':
df[col] = df[col].astype('category')
X = df.drop(columns=["is_wash_trade"])
y = df["is_wash_trade"]
with open("model_features.json", "w") as f:
json.dump(X.columns.tolist(), f)
Split Dataset for Training and Testing
This splits the data into 80%
for training and 20%
for testing, using a fixed random seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
Train XGBoost Classifier
The model is trained on Bitquery DEX trades data in the code given below.
model = XGBClassifier(enable_categorical=True, tree_method='hist')
model.fit(X_train, y_train)
Notes:
enable_categorical
=True allows XGBoost to natively handle categorical features.tree_method
='hist' improves training speed.
Save the Trained Model
The trained model is stored in a pickle file with .pkl
extension and will be later loaded in app.py for inference.
with open("xgb_wash_model.pkl", "wb") as f:
pickle.dump(model, f)
Evaluate Model
The code snippet below, prints performance metrics like precision, recall, F1-score for both wash and non-wash trades.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))