Articles on ML and related math

Henry Bigelow CV

Articles

Entropy Scaling
The Mutual Information of Quantization vs. Noise
How minimizing KL-divergence maximizes Mutual Information
Jensen's Inequality
Predictive Coding + Autoencoder
Kernel Regression

About me

Computational Biologist with 7 years of industry experience developing and applying machine learning models to genome-scale problems in drug target discovery. Software engineer with experience as Lead Software Architect at a SaaS startup developing a cloud-based, multi-user collaboration tool. Machine learning researcher using TensorFlow and PyTorch, interested in autoregressive models, autoencoders, information-theoretic training, and predictive coding.

I believe the potential of Machine Learning, even as it exists today, has largely been untapped in industry. Development and deployment are difficult, and the advances are so recent that their applications are still being discovered. In my experience in biotech, there was no shortage of rich datasets, but we lacked flexibility in deploying large-scale compute and a brain trust of in-house software engineers and data scientists, and we were held back by slow procedures for clearing access to patient data.

The result was that many rich genomics datasets would go unprocessed or be only superficially analyzed. Data was siloed in different systems, which made aggregate analysis difficult. Security and privacy concerns prevented us from freely using cloud services, and it was difficult to find compute that was appropriately colocated with the large datasets (hundreds of terabytes in scale).

My hope is that the ML and Biotech communities gradually increase their presence in each other's fields. It is impossible to apply modern ML techniques effectively without ML expertise. Likewise, it is quite difficult for ML experts with little biological domain knowledge to apply those techniques effectively to the data. Finally, the specific datasets in need of analysis, along with their particular details, are most often proprietary and confidential, and can only be seen by ML researchers working in-house or in a well-established collaboration.

For all these reasons, the true potential of Machine Learning in biotech has largely been untapped.