model

Use a classifier to detect differences between two datasets.


prep_data

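prep_data (f1='file_a.jsonl', f2='file_b.jsonl')

Load the rows of both JSONL files and combine them into one dataset. With the default files, that is 2 × 2284 = 4568 rows.
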
_df = prep_data()
assert len(_df) == 4568

Loaded 2284 rows from file_a.jsonl
Loaded 2284 rows from file_b.jsonl
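The body of prep_data isn't shown. As a rough sketch of what its signature and log lines suggest, the stdlib function below reads each JSONL file, prints the same kind of row count, and tags every row with a 0/1 label for its source file so a classifier has a target to learn. The name prep_data_sketch and the label field are hypothetical. Each line is assumed to be a JSON object, and the function returns a list of dicts rather than whatever structure the real prep_data returns.

```python
import json


def prep_data_sketch(f1="file_a.jsonl", f2="file_b.jsonl"):
    """Hypothetical stand-in for prep_data: load two JSONL files, tag rows by source."""
    rows = []
    # label 0 for rows from f1, label 1 for rows from f2
    for label, path in enumerate((f1, f2)):
        with open(path, encoding="utf-8") as fh:
            # One JSON object per non-blank line (assumption about the file layout).
            records = [json.loads(line) for line in fh if line.strip()]
        print(f"Loaded {len(records)} rows from {path}")
        for rec in records:
            # Copy the record and add the source label (overwrites any existing "label" key).
            rows.append({**rec, "label": label})
    return rows
```

With pandas, calling pd.read_json(path, lines=True) on each file and joining the results with pd.concat would give an equivalent DataFrame.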


model

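model (df)

Fit a classifier on df and calculate diagnostics. The returned object exposes roc_auc (the ROC AUC score) and top_features (features ranked by importance).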
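This kind of check is sometimes called a classifier two-sample test, or adversarial validation. If a model can tell which file a row came from, the files differ, and the features it relies on show how. The code behind model isn't shown, so what follows is a minimal stdlib sketch that rests on several assumptions. Rows are plain strings, and labels mark the source file (0 or 1). Features are word 1- to 3-grams, matching how the top features in the table below look (up to three whitespace-separated tokens). A Laplace-smoothed, naive-Bayes-style log-odds scorer stands in for whatever classifier model actually fits. The names ngrams, auc, and classifier_two_sample_test are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n_max=3):
    """Word n-grams (n = 1..n_max) from a whitespace-split string."""
    words = text.split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def auc(scores_pos, scores_neg):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2."""
    wins = sum((p > q) + 0.5 * (p == q) for p in scores_pos for q in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


def classifier_two_sample_test(texts, labels, n_max=3, alpha=1.0):
    """Fit a smoothed log-odds scorer on even-indexed rows, evaluate on odd rows.

    labels are 0 (first file) or 1 (second file). Returns the held-out AUC and
    a dict of per-n-gram log-odds (positive = more typical of label 1).
    """
    pairs = list(zip(texts, labels))
    # Alternating split keeps both files in both halves even if rows are grouped by file.
    train = pairs[0::2]
    test = pairs[1::2]

    counts = {0: Counter(), 1: Counter()}
    for t, y in train:
        counts[y].update(ngrams(t, n_max))
    vocab = set(counts[0]) | set(counts[1])
    # Laplace-smoothed denominators for each class's n-gram distribution.
    totals = {y: sum(counts[y].values()) + alpha * len(vocab) for y in (0, 1)}
    log_odds = {
        g: math.log((counts[1][g] + alpha) / totals[1])
        - math.log((counts[0][g] + alpha) / totals[0])
        for g in vocab
    }

    def score(t):
        # N-grams never seen in training contribute 0.
        return sum(log_odds.get(g, 0.0) for g in ngrams(t, n_max))

    pos = [score(t) for t, y in test if y == 1]
    neg = [score(t) for t, y in test if y == 0]
    if not pos or not neg:
        raise ValueError("held-out half needs rows from both files")
    return auc(pos, neg), log_odds
```

Sorting the returned log-odds by absolute value gives a rough analogue of a top-features table, with the sign showing which file a feature favors. A more conventional build would use scikit-learn, for example CountVectorizer(ngram_range=(1, 3)) feeding any classifier and roc_auc_score on held-out predictions. The stdlib version just keeps the mechanics visible.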

clf = model(_df)
clf.roc_auc

0.9652849641638879

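To read the 0.965 figure: ROC AUC is the probability that the classifier scores a randomly chosen row from one file higher than a randomly chosen row from the other, with ties counting half. With score function s, positive-class rows x⁺ (n₊ of them) and negative-class rows x⁻ (n₋ of them), the empirical estimate averages over all pairs:

$$
\widehat{\mathrm{AUC}} = \frac{1}{n_{+}\,n_{-}} \sum_{i=1}^{n_{+}} \sum_{j=1}^{n_{-}} \Big( \mathbf{1}\big[s(x_i^{+}) > s(x_j^{-})\big] + \tfrac{1}{2}\,\mathbf{1}\big[s(x_i^{+}) = s(x_j^{-})\big] \Big)
$$

An AUC of 0.5 means the classifier ranks rows no better than chance, so the files look alike to it. An AUC of 1.0 means every row from one file outranks every row from the other. A value of 0.965 therefore says the two files are easy to tell apart using these features. Two things aren't shown here: which file is the positive class, and whether the score is computed on held-out rows or on the training data. An in-sample AUC would overstate how separable the files are.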
clf.top_features.head(15)

    Feature                             Importance
0   <|END-UI-FORMAT|> Role              0.071926
1   <|UI-FORMAT|> id                    0.051172
2   Role function <|JSON-FORMAT|>       0.051035
3   <|END-UI-FORMAT|> Role assistant    0.050680
4   <|UI-FORMAT|>                       0.050151
5   <|END-JSON-FORMAT|> Role assistant  0.048927
6   <|END-JSON-FORMAT|> Role            0.047124
7   <|JSON-FORMAT|>                     0.046406
8   ```json id                          0.042374
9   assistant ```json                   0.039353
10  function <|UI-FORMAT|>              0.037131
11  function <|JSON-FORMAT|> id         0.028249
12  <|JSON-FORMAT|> id                  0.028228
13  <|END-JSON-FORMAT|>                 0.027623
14  <|END-UI-FORMAT|>                   0.026620
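Most of the top features involve the format tags (<|UI-FORMAT|>, <|JSON-FORMAT|> and their END variants) or ```json fences. As displayed, though, the Importance column carries no sign, so it doesn't say which file a feature is more common in. A simple follow-up is to measure, for each top feature, the share of rows from each file that contain it. The sketch below is hypothetical throughout. It assumes the texts and their 0/1 source labels are available as two parallel lists, named texts and labels here, because the real column names in _df aren't shown. It matches a feature as a contiguous run of whitespace-separated words, which may not exactly mirror how model tokenizes text.

```python
def contains_ngram(text, ngram):
    """True if the words of `ngram` appear contiguously in whitespace-split `text`."""
    words, target = text.split(), ngram.split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def feature_direction(texts, labels, feature):
    """Fraction of rows from each file (label 0 or 1) that contain `feature`."""
    seen = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for text, label in zip(texts, labels):
        total[label] += 1
        seen[label] += contains_ngram(text, feature)  # bool adds as 0 or 1
    # Report 0.0 for a file with no rows instead of dividing by zero.
    return {y: seen[y] / total[y] if total[y] else 0.0 for y in (0, 1)}
```

Looping this over the Feature column of clf.top_features would show, for each highly ranked n-gram, whether it mostly comes from file_a.jsonl or file_b.jsonl.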