import R_audience as data_enthusiasts
%pylab inline
import pandas as pd
data1 = np.loadtxt('/storage/wheel_trajectory_data.csv', delimiter=',')  # wheel trajectory data (comma-separated)
data2 = pd.read_csv('/storage/eeg_data.csv')                             # EEG recordings
data3 = pd.read_csv('/storage/driven_data_challenge.csv')                # DrivenData challenge table
*Ágoston Török, Krisztián Varga, Jean-Marie Pergandi, Pierre Mallet, Ferenc Honbolygó, Valéria Csépe, Daniel Mestre
$k(x,x')=\exp\left(-\frac{\|x-x'\|^{2}}{\sigma^{2}}\right)$
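As a concrete reading of the kernel above, here is a minimal NumPy sketch (the helper name and test vectors are illustrative only); note that scikit-learn parameterizes the same kernel as $\exp(-\gamma\|x-x'\|^{2})$, so gamma corresponds to $1/\sigma^{2}$:

import numpy as np

def rbf_kernel(x, x_prime, sigma):
    # k(x, x') = exp(-||x - x'||^2 / sigma^2)
    return np.exp(-np.sum((x - x_prime) ** 2) / sigma ** 2)

rbf_kernel(np.array([1.0, 0.0]), np.array([0.0, 1.0]), sigma=2.0)  # -> exp(-0.5) ~ 0.61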
from sklearn.svm import OneClassSVM
# Since this is a one-class SVM, grid search does not work: we can
# try to minimize the number of points flagged as positives, but without
# seeing any labeled positives we cannot be certain that those points
# should be excluded.
# These appeared to be the best parameters for the model, with < 5% false alarms
# (a quick sanity check on the flag rate follows this cell).
model = OneClassSVM(nu=0.05,          # upper bound on the training-error fraction (plays the role of C in other SVMs)
                    gamma=0.25,       # kernel width; in sklearn, gamma = 1/sigma^2 for the formula above
                    kernel='rbf',     # Gaussian kernel
                    shrinking=True,   # shrinking heuristic: temporarily drops unlikely support-vector candidates
                    random_state=21)  # for reproducibility
model.fit(train_xw)                          # fit on (assumed normal) training windows only
prediction = model.predict(validation_xw)    # +1 = inlier, -1 = outlier
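Because nu is an upper bound on the fraction of training points treated as outliers, the observed flag rate on the training set should come out near (at most) nu. A minimal sanity-check sketch on synthetic stand-in data (train_xw below is not the original dataset):

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(21)
train_xw = rng.normal(size=(1000, 2))               # synthetic stand-in for the real windows

check = OneClassSVM(nu=0.05, gamma=0.25, kernel='rbf').fit(train_xw)
flag_rate = (check.predict(train_xw) == -1).mean()  # fraction flagged as outliers
print('flag rate: %.3f (nu = 0.05)' % flag_rate)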
*David Tellez, (Ágoston Török)
from keras.models import Model
from keras.layers import Dense, Convolution1D, MaxPooling1D, Input, Flatten
from IPython.display import SVG
from keras.utils.visualize_util import model_to_dot
input_signal = Input(shape=(256, 1))                                               # 256-sample, single-channel window
cl1 = Convolution1D(32, 10, border_mode='same', activation='relu')(input_signal)   # 32 filters of length 10
cl2 = Convolution1D(32, 6, border_mode='same', activation='relu')(cl1)             # 32 filters of length 6
mp1 = MaxPooling1D(2, border_mode='same')(cl2)                                     # downsample the time axis by 2
flat = Flatten()(mp1)                                                              # to a single feature vector
decision = Dense(1, activation='sigmoid')(flat)                                    # binary decision unit
model = Model(input=input_signal, output=decision)
SVG(model_to_dot(model, show_shapes=True).create(prog='dot', format='svg'))
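The cell above only builds and draws the graph; to train the network, the model still has to be compiled. A minimal sketch using the same Keras 1.x API as the imports above (X_train and y_train are hypothetical arrays of shape (n, 256, 1) and (n,)):

model.compile(optimizer='adam',
              loss='binary_crossentropy',  # matches the single sigmoid output
              metrics=['accuracy'])
model.fit(X_train, y_train, nb_epoch=10, batch_size=32)  # nb_epoch is the Keras 1.x argument name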
*Ágoston Török, Ádám Csapó, Krisztián Varga, Ádám Divák
from sklearn.ensemble import ExtraTreesClassifier
# Meta-estimator that fits a number of randomized decision trees (a.k.a. extra-trees)
# on various sub-samples of the dataset and uses averaging to improve the predictive
# accuracy and control over-fitting.
# XGBoost - trees are built sequentially, each correcting the errors of the previous ones;
#           splits are chosen to best separate the classes
# Random forest - a random subset of features is considered at each node;
#                 the best split among them is chosen
# Extremely randomized trees - both the candidate features and the split thresholds are drawn at random
model = ExtraTreesClassifier(n_estimators=500,      # a large number of trees
                             max_depth=10,          # cap tree depth as regularization
                             n_jobs=-1,             # grow the trees on all available cores
                             criterion='entropy',   # information gain (gini impurity is the alternative)
                             max_features=80)       # features considered per split; we had a huge number of features
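For completeness, a hedged sketch of how this classifier would typically be used (X_train, y_train, and X_valid are hypothetical arrays, not from the original notebook):

model.fit(X_train, y_train)
probs = model.predict_proba(X_valid)[:, 1]               # class-1 probabilities
top10 = model.feature_importances_.argsort()[::-1][:10]  # indices of the ten most informative features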
@torokagoston @SynetiqLab