## Posts

Freely available online resources that I’ve found useful for learning Statistics, Machine Learning, Distributed Computing, Database Systems, and other CS and SWE topics.
Statistics and Machine Learning, ML: freely accessible expositions on machine learning (like the ones on Coursera) tend to sweep theoretical foundations and mathematical rigor under the carpet; these are resources that don't skimp on the hard stuff.
Stanford CS229 Machine Learning
One-line summary: the course teaches you how to set up a cost function based on a model and data, and figure out how to optimize it.
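
As a small illustration of that idea (not taken from CS229 itself), the sketch below sets up the mean squared error cost for the model y ≈ a + b·x and minimizes it with batch gradient descent. The function names, learning rate, and iteration count are illustrative assumptions.

```javascript
// Minimal sketch: fit y ≈ a + b*x by minimizing the mean squared error
// cost J(a, b) = (1/n) * Σ (a + b*x_i - y_i)^2 with batch gradient descent.
// The learning rate and iteration count are illustrative assumptions.
function fitLinearGD(xs, ys, learningRate = 0.05, iterations = 5000) {
  const n = xs.length;
  let a = 0;
  let b = 0;
  for (let iter = 0; iter < iterations; iter++) {
    // Partial derivatives of J with respect to a and b.
    let gradA = 0;
    let gradB = 0;
    for (let i = 0; i < n; i++) {
      const residual = a + b * xs[i] - ys[i];
      gradA += (2 / n) * residual;
      gradB += (2 / n) * residual * xs[i];
    }
    // Step against the gradient.
    a -= learningRate * gradA;
    b -= learningRate * gradB;
  }
  return { a, b };
}

// The cost function itself: mean squared error of the model on the data.
function meanSquaredError(xs, ys, { a, b }) {
  let total = 0;
  for (let i = 0; i < xs.length; i++) {
    const residual = a + b * xs[i] - ys[i];
    total += residual * residual;
  }
  return total / xs.length;
}
```

For data generated by y = 1 + 2x, e.g. `fitLinearGD([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])`, the estimates should approach a = 1 and b = 2, driving the cost toward zero.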

This is the final part of the tutorial. We furnish the app with a UI and deploy it to Heroku.
5. Build UI. 5.1 `index.html` and `main.js`. We first create a dropdown list. The selected category is stored in `selected` and posted to `/start` when `submit` is called; afterwards, the component polls `/results`.
`main.js` - `categoryDropdown`

```javascript
var categoryDropdown = new Vue({
  el: '#category-dropdown',
  data: { selected: '' },
  methods: {
    submit: function() {
      keywordsResult.
```
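
The "post, then poll" flow described above can be sketched in plain JavaScript, outside Vue. This is a minimal sketch under hypothetical assumptions, not the app's actual code: `/start` is assumed to accept a JSON body `{ category }`, and `/results` is assumed to answer with HTTP 202 while the job is still running and 200 plus a JSON payload once keywords are ready. The names `submitAndPoll`, `intervalMs`, and `maxAttempts` are illustrative.

```javascript
// Hypothetical sketch of the submit-then-poll pattern. Assumed server
// contract (not from the original app): POST /start with {category},
// then GET /results returns 202 while pending and 200 + JSON when done.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function submitAndPoll(category, fetchFn = fetch, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  // Kick off the job for the selected category.
  const started = await fetchFn('/start', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ category }),
  });
  if (!started.ok) {
    throw new Error('Failed to start job');
  }
  // Poll until the server reports that results are ready.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetchFn('/results');
    if (response.status === 200) {
      return response.json();
    }
    await sleep(intervalMs);
  }
  throw new Error('Timed out waiting for /results');
}
```

Passing `fetchFn` as a parameter keeps the polling logic testable without a running server; in the browser the global `fetch` is used by default.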

In Part 1, we developed a keyword extraction algorithm. The next step is to modify the algorithm to use a database. Configuring Postgres is more involved than in Flask by Example, since we need models to store article data. The following diagram shows what our finished product will look like.
We use the end product of the Flask by Example tutorial as a boilerplate. Complete Parts 1-4 of Flask by Example, or clone the repo of and configure Postgres by following these steps:

This is a tutorial on web development written for people with a background in statistical analysis, scientific computing, or machine learning. We start with an algorithm whose data fits comfortably into memory, and modify it to accept a large input. We then set up infrastructure to serve the resulting algorithm. The tutorial focuses on the infrastructure rather than the algorithm, which remains rudimentary. The end product is a Heroku deployment of a text summarization algorithm that analyzes arXiv articles to extract keywords for each research category within mathematics.
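
The tutorial does not say here which keyword extraction method it uses, so the following is only an assumed stand-in: a minimal in-memory sketch that scores terms with TF-IDF, treating all abstracts in one category as a single document. The stop-word list, tokenizer, category labels, and function names are illustrative assumptions.

```javascript
// Assumed stand-in for a rudimentary keyword extractor: TF-IDF where each
// category (all of its abstracts concatenated) is one "document".
const STOP_WORDS = new Set(['the', 'and', 'for', 'with', 'that', 'this', 'are', 'from']);

function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((word) => word.length > 2 && !STOP_WORDS.has(word));
}

// articlesByCategory: { [category]: string[] } mapping a category to abstracts.
function extractKeywords(articlesByCategory, topK = 5) {
  const categories = Object.keys(articlesByCategory);
  const termCounts = {};          // category -> Map(term -> count)
  const docFrequency = new Map(); // term -> number of categories containing it
  for (const category of categories) {
    const counts = new Map();
    for (const text of articlesByCategory[category]) {
      for (const term of tokenize(text)) {
        counts.set(term, (counts.get(term) || 0) + 1);
      }
    }
    termCounts[category] = counts;
    for (const term of counts.keys()) {
      docFrequency.set(term, (docFrequency.get(term) || 0) + 1);
    }
  }
  const result = {};
  for (const category of categories) {
    const counts = termCounts[category];
    let total = 0;
    for (const c of counts.values()) total += c;
    const scored = [];
    for (const [term, count] of counts) {
      const tf = count / total;
      // Smoothed IDF: terms shared by many categories are down-weighted.
      const idf = Math.log((1 + categories.length) / (1 + docFrequency.get(term))) + 1;
      scored.push([term, tf * idf]);
    }
    // Highest score first; ties broken alphabetically for stable output.
    scored.sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]));
    result[category] = scored.slice(0, topK).map(([term]) => term);
  }
  return result;
}
```

Because the counts are built one article at a time, the same loop could later consume articles streamed from a database instead of an in-memory array, which is the kind of change the tutorial makes when moving to a large input.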

Introduction. The model we consider is \(Y_i = \alpha + \beta x_i + \epsilon_i\), where the \( \epsilon_i \) are uncorrelated and \( \mathbb{V}(\epsilon_i) \) depends on \( i \). We discuss two approaches to estimating \( \alpha \) and \( \beta \). Weighted least squares regression yields best linear unbiased estimators (BLUE), and under stronger assumptions on \( \epsilon_i \), maximum likelihood estimators (MLE) can also be found. We begin with the homoskedastic case, emphasizing how the statistical properties of the least squares estimators depend on the assumptions on \( \epsilon_i \); this prepares the ground for the heteroskedastic case.
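
A minimal sketch of the weighted least squares estimator, assuming the variances \( \mathbb{V}(\epsilon_i) \) are known (or known up to a common constant), in which case weighting each observation by \( w_i = 1/\mathbb{V}(\epsilon_i) \) gives the BLUE. With weighted means \( \bar{x}_w = \sum_i w_i x_i / \sum_i w_i \) and \( \bar{Y}_w \) defined analogously, the estimators are \( \hat\beta = \sum_i w_i (x_i - \bar{x}_w)(Y_i - \bar{Y}_w) / \sum_i w_i (x_i - \bar{x}_w)^2 \) and \( \hat\alpha = \bar{Y}_w - \hat\beta \bar{x}_w \). The function name is illustrative.

```javascript
// Minimal sketch: weighted least squares for Y_i = alpha + beta * x_i + eps_i,
// assuming V(eps_i) is known (up to a common constant), with w_i = 1 / V(eps_i).
function weightedLeastSquares(xs, ys, variances) {
  const weights = variances.map((v) => 1 / v);
  const sumW = weights.reduce((s, w) => s + w, 0);
  // Weighted means of x and Y.
  let xBar = 0;
  let yBar = 0;
  for (let i = 0; i < xs.length; i++) {
    xBar += weights[i] * xs[i];
    yBar += weights[i] * ys[i];
  }
  xBar /= sumW;
  yBar /= sumW;
  // beta-hat = Σ w_i (x_i - xBar)(Y_i - yBar) / Σ w_i (x_i - xBar)^2
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - xBar;
    sxy += weights[i] * dx * (ys[i] - yBar);
    sxx += weights[i] * dx * dx;
  }
  const beta = sxy / sxx;
  const alpha = yBar - beta * xBar;
  return { alpha, beta };
}
```

When all variances are equal, the weights cancel and the estimator reduces to ordinary least squares; observations with large variance contribute little to the fit.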

I ran linear classifiers on a credit card fraud dataset, covering parallelization, lasso and ridge regularization, and grid search. Published on Kaggle.
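
To make the lasso/ridge/grid-search combination concrete, here is a minimal sketch that is not the Kaggle notebook itself: logistic regression with an L1 (lasso) or L2 (ridge) penalty, fitted by proximal gradient descent, plus a grid search over penalty type and strength scored on a held-out validation set. The step size, epoch count, the shape of the hypothetical `grid` object, and all function names are assumptions; accuracy is used only to keep the sketch short.

```javascript
// Assumed sketch: penalized logistic regression + grid search.
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

function fitPenalizedLogistic(X, y, { penalty = 'l2', lambda = 0.1, stepSize = 0.1, epochs = 500 } = {}) {
  const n = X.length;
  const d = X[0].length;
  let w = new Array(d).fill(0);
  let b = 0;
  for (let epoch = 0; epoch < epochs; epoch++) {
    // Gradient of the average log-loss.
    const gradW = new Array(d).fill(0);
    let gradB = 0;
    for (let i = 0; i < n; i++) {
      let z = b;
      for (let j = 0; j < d; j++) z += w[j] * X[i][j];
      const err = sigmoid(z) - y[i]; // d(log-loss)/dz
      for (let j = 0; j < d; j++) gradW[j] += (err * X[i][j]) / n;
      gradB += err / n;
    }
    if (penalty === 'l2') {
      // Ridge: penalty (lambda / 2) * ||w||^2 adds lambda * w to the gradient.
      w = w.map((wj, j) => wj - stepSize * (gradW[j] + lambda * wj));
    } else {
      // Lasso: gradient step on the log-loss, then soft-thresholding (proximal step).
      const t = stepSize * lambda;
      w = w.map((wj, j) => {
        const v = wj - stepSize * gradW[j];
        return Math.sign(v) * Math.max(Math.abs(v) - t, 0);
      });
    }
    b -= stepSize * gradB; // intercept is not penalized
  }
  return { w, b };
}

function accuracy(model, X, y) {
  let correct = 0;
  for (let i = 0; i < X.length; i++) {
    let z = model.b;
    for (let j = 0; j < model.w.length; j++) z += model.w[j] * X[i][j];
    if ((z >= 0 ? 1 : 0) === y[i]) correct++;
  }
  return correct / X.length;
}

// grid: { penalties: ['l1', 'l2'], lambdas: [...] } (hypothetical shape).
function gridSearch(XTrain, yTrain, XVal, yVal, grid) {
  let best = null;
  for (const penalty of grid.penalties) {
    for (const lambda of grid.lambdas) {
      const model = fitPenalizedLogistic(XTrain, yTrain, { penalty, lambda });
      const score = accuracy(model, XVal, yVal);
      if (best === null || score > best.score) {
        best = { penalty, lambda, score, model };
      }
    }
  }
  return best;
}
```

Each grid cell is fitted independently of the others, which is what makes grid search straightforward to parallelize. A large lasso penalty drives weights exactly to zero, while ridge only shrinks them; for heavily imbalanced data such as fraud detection, a metric other than plain accuracy is usually a better selection criterion.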