I love Open Source software. I have been using it since the beginning of my Ph.D., so I suppose I was an “early adopter”.
I even open-sourced the code for my Ph.D. thesis, well before it was fashionable to do so. My only (current) frustration is that the server I put it on was decommissioned some time ago and I haven’t been able to source a copy (as yet!). Research should be reproducible, and if there are errors in what I did, I would like others to be able to find them.
Data science lives and breathes on open source software. But how do we get comfort around the validity of the open source code we rely on? In my opinion, being open source makes it more likely that errors are discovered “early and often” compared to commercial software. However, that doesn’t mean it is free of material errors.
So how do we validate the open source software that we use?
Increasingly there are platforms such as Snyk Advisor (https://snyk.io/advisor) and Tidelift that help assess the health and maintenance of open source packages. TODO: Complete post.
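As a minimal sketch of the kind of signal these platforms aggregate, the snippet below queries the public PyPI JSON API for a package’s most recent release date and its project URLs, which is one rough proxy for whether a dependency is still maintained. The package_health function name and the choice of signals are my own illustration, not how Snyk or Tidelift actually score packages.

```python
# Rough health check for a PyPI package: when was it last released,
# and does it publish project links (repo, docs, issue tracker)?
# Uses only the public PyPI JSON API and the standard library.
import json
import sys
from datetime import datetime, timezone
from urllib.request import urlopen


def package_health(name: str) -> None:
    with urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
        data = json.load(resp)

    info = data["info"]

    # Most recent upload time across all released files of all versions.
    upload_times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data["releases"].values()
        for f in files
    ]
    last_release = max(upload_times) if upload_times else None

    print(f"package:      {info['name']} {info['version']}")
    if last_release:
        age_days = (datetime.now(timezone.utc) - last_release).days
        print(f"last release: {last_release.date()} ({age_days} days ago)")
    else:
        print("last release: no uploads found")
    print(f"project urls: {info.get('project_urls')}")


if __name__ == "__main__":
    package_health(sys.argv[1] if len(sys.argv) > 1 else "requests")
```

For known vulnerabilities specifically, command-line tools such as pip-audit (for Python requirements) or snyk test can be run against a project, though release recency and advisories are only part of the validation story.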
Also another post on validation of AI (e.g. machine learning algorithms): the domains where it matters, and others (marketing, perhaps?) where it is not so important.
e.g. https://www.dominodatalab.com/model-risk-management/
Perhaps some comments about attention to detail, with references to operational risk and data science; this could be a separate blog post.
Another potential article: Data Science in Operational Risk, e.g. fraud detection.
e.g. http://www.garp.org/#!/risk-intelligence/technology/data/a1Z40000003PMBtEAO