These notes are based on a Udemy course on recommender systems.

1. Hacker News Formula

Hacker News intro

Because the time has a larger exponent than the votes, an article’s score will eventually drop to zero, so nothing stays on the front page too long. This exponent is known as gravity.
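The notes do not write the formula out, so here is a minimal sketch of the commonly cited form, score = (votes - 1)^0.8 / (age_hours + 2)^gravity * penalty. The function name `hn_score`, the default gravity of 1.8, the 0.8 vote exponent, and the 2-hour offset are assumptions taken from public write-ups, not from the course notes, and the live site may differ.

```python
def hn_score(votes: int, age_hours: float,
             gravity: float = 1.8, penalty: float = 1.0) -> float:
    """Commonly cited Hacker News score (assumed constants, see lead-in)."""
    # The submitter's own vote is not counted, hence votes - 1.
    effective_votes = max(votes - 1, 0)
    # The age term has the larger exponent (gravity), so it eventually
    # dominates and pulls the score toward zero.
    return (effective_votes ** 0.8) / ((age_hours + 2) ** gravity) * penalty


# The same story with the same votes scores lower as it ages.
fresh = hn_score(votes=101, age_hours=1.0)
stale = hn_score(votes=101, age_hours=10.0)
```

Because gravity is larger than the vote exponent, a story needs an ever-growing number of votes just to hold its position.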

But for efficiency, stories are individually reranked only occasionally. When a story is upvoted, it is reranked and moved up or down the list to its appropriate spot, leaving the other stories unchanged. Thus, the amount of reranking is significantly reduced.

There is, however, the possibility that a story stops getting votes and ends up stuck in a high position. To avoid this, every 30 seconds one of the top 50 stories is randomly selected and reranked. The consequence is that a story may be “wrongly” ranked for many minutes if it isn’t getting votes. In addition, pages can be cached for 90 seconds.

The score for an article shoots up rapidly and then slowly drops over many hours.

The scoring formula accounts for much of this: an article getting a constant rate of votes will peak quickly and then gradually descend. But the observed peak is even faster - this is because articles tend to get a lot of votes in the first hour or two, and then the voting rate drops off.

Combining these two factors yields the steep curves shown.

The green triangles and text show where “controversy” penalties were applied. The blue triangles and text show where articles were penalized into oblivion, dropping off the top 60. Milder penalties are not shown here.

2. Reddit Ranking

reddit algorithms intro

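Use log: the idea of diminishing returns, so each additional vote counts for less than the one before. The time term keeps growing, so newer posts get bigger scores as time passes.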
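A minimal sketch of the "hot" ranking as it is usually quoted from Reddit's open-sourced code: log10 of the net votes plus the submission time divided by 45000 seconds. The function name `hot`, the epoch offset 1134028003, and the 45000 divisor are assumptions based on that public code, not on these notes.

```python
from datetime import datetime, timedelta, timezone
from math import log10

EPOCH_OFFSET = 1134028003  # seconds; assumed constant from the public code
TIME_SCALE = 45000         # seconds worth one order of magnitude of votes


def hot(ups: int, downs: int, posted: datetime) -> float:
    """Commonly quoted Reddit 'hot' score (assumed constants)."""
    net = ups - downs
    # log10 gives diminishing returns: 10 -> 100 votes adds as much
    # as 1 -> 10 votes.
    order = log10(max(abs(net), 1))
    sign = 1 if net > 0 else -1 if net < 0 else 0
    # The time term depends only on submission time, so a newer post
    # starts with a larger score than an older one.
    seconds = posted.timestamp() - EPOCH_OFFSET
    return sign * order + seconds / TIME_SCALE


now = datetime(2020, 1, 1, tzinfo=timezone.utc)
older_popular = hot(100, 0, now)
# Posted 45000 seconds (12.5 hours) later with ten times fewer votes.
newer_modest = hot(10, 0, now + timedelta(seconds=TIME_SCALE))
```

Under these constants, a post needs ten times as many net votes to match an otherwise identical post submitted 12.5 hours later.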

3. Average Rating

Confidence intervals: normal, binomial

Wilson Score Interval

Binomial proportion confidence interval

How not to sort by average rating

Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter
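The score above can be sketched as follows, using the standard Wilson score interval for a binomial proportion. The function name `wilson_lower_bound`, the default z = 1.96 (roughly 95% confidence), and returning 0 for items with no ratings are illustrative assumptions.

```python
from math import sqrt


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a Bernoulli parameter."""
    if total == 0:
        return 0.0  # assumption: unrated items sort last
    p_hat = positive / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# One perfect rating ranks below 90 positive ratings out of 100,
# even though its plain average (1.0) is higher than 0.9.
single = wilson_lower_bound(1, 1)
many = wilson_lower_bound(90, 100)
```

Sorting by this lower bound penalizes items with few ratings, which is the problem with sorting by the plain average.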

Smoothing:

Explore-exploit dilemma

Bayesian: Conjugate prior
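A minimal sketch of how a conjugate prior can be used here, assuming each rating is a like/dislike (Bernoulli) outcome with a Beta prior. The class name `BetaBernoulli`, the uniform Beta(1, 1) prior, and the use of Thompson sampling to handle the explore-exploit dilemma are assumptions for illustration; the notes do not say which method the course uses.

```python
import random


class BetaBernoulli:
    """Beta prior over an item's like-probability (hypothetical helper)."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        self.alpha = alpha  # prior pseudo-count of likes
        self.beta = beta    # prior pseudo-count of dislikes

    def update(self, liked: bool) -> None:
        # Conjugacy: the posterior is again a Beta, so updating is
        # just adding one to the matching count.
        if liked:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self) -> float:
        # Posterior mean; with Beta(1, 1) this is (likes + 1) / (n + 2),
        # which also acts as a smoothed average.
        return self.alpha / (self.alpha + self.beta)

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.alpha, self.beta)


def thompson_pick(items: dict, rng: random.Random) -> str:
    """Pick the item whose sampled like-probability is largest."""
    return max(items, key=lambda name: items[name].sample(rng))


rng = random.Random(0)
items = {"a": BetaBernoulli(), "b": BetaBernoulli()}
for liked in (True, True, False):
    items["a"].update(liked)
choice = thompson_pick(items, rng)
```

Items with few ratings have wide posteriors, so they are sometimes sampled high and get shown (explore), while well-rated items win most of the time (exploit).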