Here is a common interview question regarding Moving Averages.

The problem statement: An analytics library receives lots of numbers, and can be queried to return the average of the latest N numbers. Implement a Stream function or class which is given a window size, and exposes methods to add a new number and query the moving average.

Example usage:

``````var stream = new Stream(3)
// Window size = 3

stream.add(12) // [51, 69, 36, 12]

stream.getAverage() // (69+36+12)/3 = 39

stream.add(60) // [51, 69, 36, 12, 60]
stream.add(9)  // [51, 69, 36, 12, 60, 9]

stream.getAverage() // (12+60+9)/3 = 27
``````

### The easy solution

When interviewing, I like to quickly scribble a strawman solution, an answer that has a lot of room for improvement and buys some time to think about the next iteration of solutions. It usually does not make use of clever data structures and may fail some edge cases, but at least you can tell the interviewer that you’re aware of the pitfalls. This is an important part of our work: to be able to identify gaps in code.

``````class Stream {
constructor (size) {
this.size = size
this.array = []
}
this.array.push(n)
}
getAverage () {
return this.array
.slice(-this.size)
.reduce((s, n) => s + n, 0) / this.size
}
}
``````

Pitfalls:

• It is slow. Slice is a bottleneck. If `getAverage` is executed several times, lots of rework.
• It uses unbounded memory. The array keeps growing and growing.
• It doesn’t show the interviewer that you know data structures aside from plain arrays.

(To avoid memory running away from us, we could trim the array like this)

``````  add (n) {
this.array.push(n)
if (this.array.length > 100) {
this.array.splice(0, this.array.length - this.size)
}
}
``````

(To avoid rework, we could cache the result and invalidate the cache like this)

``````class Stream {
constructor (size) {
this.size = size
this.array = []
this.average = null
}
this.array.push(n)
this.average = null
}
getAverage () {
if (this.average === null) {
this.average = this.array
.slice(-this.size)
.reduce((s, n) => s + n, 0) / this.size
}
return this.average
}
}
``````

### The usual solution

Here’s the “usual” solution which shows that you know what a queue is. If you encounter this question in an interview, here is usually the answer that is wanted. Its theoretical memory is `O(window size)`, but not in practice, since constant allocation and deallocation uses a lot more memory than you would expect.

``````class Stream {
constructor (size) {
this.size = size
this.array = Array(size).fill(0)
this.movingSum = 0
}
this.movingSum += n - this.array.shift()
this.array.push(n)
}
getAverage () {
return this.movingSum / this.size
}
}
``````

This is the state of the art in most languages which have a decently managed Queue in its standard library. But remember we are working with Javascript. Javascript has an Array which changes behavior when it grows.

### Let’s build a benchmark

These numbers are obtained with benchmark.js with microtime. The setup code is separated from the code we are measuring.

``````Benchmark({
setup: function () {
class Stream { ... }
var stream = new Stream(process.env.WINDOWSIZE)
var inputs = Array.apply(null, Array(1000))
.map(_ => Math.floor(Math.random() * 1000))
},
fn: function () {
inputs.forEach(n => {
stream.getAverage()
})
},
onComplete: function () {
console.log(this.stats)
}
})
``````

The results show that once the window size grows past 16 or so, the performance drops substantially:

``````Window size  Mean time
12   704 μs
13   676 μs
14   800 μs
15   769 μs
16  2515 μs
17  2595 μs
18  2598 μs
19  2544 μs
20  2735 μs
``````

The v8 engine performs optimizations for small arrays, and arrays which contain only integers. In some javascript implementations, there are huge performance penalties for pushing and shifting large arrays. The definition of “large” is implementation-specific. In node v7.0.0 which uses V8 v5.4, the benchmarks seem to indicate that performance degrades with arrays greater than length of 16.

### There’s a better way to solve this problem

What if we didn’t have to constantly enqueue and dequeue? These are the operations that take so much cpu time, and cause the array to be reallocated in memory, and keeps the garbage collector busy.

The solution here uses an array as before, but rather than enqueueing and dequeueing, it cycles through the elements in a round-robin fashion.

``````class Stream {
constructor (size) {
this.size = size
this.array = Array(size).fill(0)
this.index = 0
this.movingSum = 0
}
this.movingSum += n - this.array[this.index]
this.array[this.index] = n
this.index = (this.index + 1) % this.size
}
getAverage () {
return this.movingSum / this.size
}
}
``````

The benchmark results are now consistent. There is no memory reallocation happening, and the garbage collector has chilled out:

``````Window size  Mean time
12   603 μs
13   592 μs
14   597 μs
15   620 μs
16   636 μs
17   576 μs
18   645 μs
19   614 μs
20   583 μs
``````

### The Bieber Problem

There’s another important aspect to consider. Will this moving-average service be read-heavy or write-heavy? “The Bieber Problem”, named after pop star Justin Bieber who has nearly 90 million followers on Twitter, is frequently presented like this:

When the Biebs tweets, a few things happen:

2. 90 million people bots respond to this development (nobody reads those responses).

Describe the difficulties in handling this phenomenon, and describe how you would architect a system that is capable of handling it.

Let’s run some numbers for our two implementations with two usage profiles:

• “Write-heavy” is a benchmark that writes 1000 numbers then does one read.
• “Read-heavy is a benchmark that writes one number then does 1000 reads.
``````Window size    Write-heavy     Read-heavy
R-Robin  Easy  R-Robin  Easy
12      451   441      188  5548
13      471   418      189  5573
14      461   413      194  5571
15      467   416      187  5696
16      454   439      204  6207
17      459   414      191  7114
18      471   424      190  6420
19      453   417      198  6546
20      460   491      211  6741
``````

In the write-heavy profile, where a firehose of numbers is attached to our moving-average service, but the output is infrequently queried, then we don’t want to do so much upfront work. The “easy” solution actually does quite well!

In the read-heavy profile, where numbers come in slowly but a neurotic PM has a close eye on metrics so they can pull the A/B test after a few hours, then our round-robin array solution is best.

Back to “The Bieber Problem”, how do we decide on an implementation that is best for both write-heavy and read-heavy scenarios? This contrived moving-average service is lucky that it doesn’t really do much. Think about Twitter’s scale problems for a few minutes.

• Write a `getMaximum` method, which returns the largest number from the window. For example, with a window size of `5` and inputs `[4, 5, 8, 2, 3, 5, 7, 1]`, the latest window is `[2, 3, 5, 7, 1]` and the maximum there is `7`. Think about performance in both the write-heavy and read-heavy profiles. (Hint)
• Write a `getWeightedAverage` method. This is commonly used in stock markets, where today’s values are very important, yesterday’s values are somewhat important, and last week’s values are of little importance. Example calculation: Given the most recent values of `[12, 16, 14, 17, 9]`, the weighted average could be `(1*12 + 2*16 + 3*14 + 4*17 + 5*9) / (1 + 2 + 3 + 4 + 5)`.
• Explore Typed Arrays. By hinting the engine that your array is a fixed size and will only contain integers, the engine can perform some very beneficial optimizations. `DataView`s can be used to quickly select a subset of your array.
• Use RxJS so that the Stream exposes a `push` method rather than `add`, and other listeners can subscribe to updates.
• Use ReadableStreams. `ReadableStream` is in early draft but has a reference implementation that can be used as a polyfill. I haven’t ventured into this territory yet, but it sounds promising.