CS Graduate Student Presents Solo-authored Paper at Top Theory Conference

Streaming algorithms process a sequence of high-speed inputs and have limited memory space with which to do so. A commonly cited example is the detection of heavy hitters in networks. Popular websites receive millions of hits. Hidden among the visitors are bots that often use the same IP address to access the website with unusually high frequencies. Webmasters need to devise efficient ways to detect and block them without having to store and sort through a list of all IP addresses.
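One classic way to detect heavy hitters in a single pass with small memory is a counter-based summary such as Misra–Gries. The sketch below is illustrative, not from the paper; the IP addresses and the parameter `k` are made-up examples. It keeps at most `k` counters, so any item occurring more than a 1/(k+1) fraction of the time is guaranteed to survive as a candidate.

```python
def misra_gries(stream, k):
    """One-pass heavy-hitter summary using at most k counters (O(k) memory).

    Any item appearing more than len(stream)/(k+1) times is guaranteed
    to remain as a key in the returned dictionary of candidates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Stream item not tracked and no free slot:
            # decrement every counter, dropping those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Hypothetical traffic log: "10.0.0.1" hits 3 of 5 times (> 5/3),
# so it must survive among the k=2 candidates.
stream = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]
print(misra_gries(stream, k=2))
```

A second pass over the stream (or log) can then confirm the exact counts of the few surviving candidates, without ever storing the full list of addresses.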

In a recent paper presented at the ACM-SIAM Symposium on Discrete Algorithms (SODA) in January, Manuel Stoeckl, Guarini '24, compared several recently introduced models for streaming algorithms.

"It is a considerable achievement for a student to have a solo-authored paper at a top conference such as SODA," says Stoeckl's PhD advisor, Amit Chakrabarti.

It has long been known that streaming algorithms that are allowed to make random decisions can outperform deterministic streaming algorithms, says Stoeckl. However, streaming algorithms have typically been studied under the assumption that their input is fixed; that is, every element of the input stream is chosen before the algorithm starts to process the stream.

Streaming algorithms are sometimes applied to input data that is generated in real time, where the source adjusts its behavior based on properties of the data seen so far. In such cases, the data produced in the future may depend on the output of the streaming algorithm, he says.

With a randomized streaming algorithm, this can correlate the future input with the algorithm's random choices. To prove that algorithms give correct outputs in this scenario, a new analysis is needed to account for this correlation. A recently popular framework is to design streaming algorithms that work even in the "adversarial setting"—when the input is produced by an adversary that looks at the output of the algorithm so far and picks the worst possible next input.

In his paper, Stoeckl studies a simple streaming task for which the minimum space needed to solve the problem depends significantly on the model in which it is studied. In this task, known as Missing Item Finding, one receives a sequence of numbers from a set and must output a number that has not been seen in the sequence so far. Depending on the size of the set and length of the sequence, this task can be solved while using significantly less space than is needed to remember the sequence of numbers.
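To make the task concrete, here is a minimal sketch of the standard sampling idea for Missing Item Finding in the static (fixed-input) setting. This is an illustration, not the paper's algorithm; the parameter `num_samples` and the assumption that the stream covers at most half the universe are choices made for this example.

```python
import random

def missing_item_finder(universe_size, stream, num_samples=32):
    """One-pass randomized sketch for Missing Item Finding, static setting.

    Sample a few candidate numbers up front, cross off any candidate
    that appears in the stream, and output a survivor. Assuming the
    fixed stream contains at most universe_size // 2 distinct values,
    each candidate survives with probability >= 1/2, so all candidates
    fail only with probability <= 2**-num_samples. Memory used is
    O(num_samples), far less than storing the whole stream.
    """
    candidates = {random.randint(1, universe_size) for _ in range(num_samples)}
    for x in stream:           # process the stream one item at a time
        candidates.discard(x)  # a seen candidate is no longer "missing"
        if not candidates:
            return None        # unlucky run: every candidate appeared
    return next(iter(candidates))

# Example: stream covers 1..500 out of a universe of 1..1000,
# so any answer returned must be a number above 500.
missing = missing_item_finder(1000, range(1, 501))
```

Note that this sampling trick relies on the input being fixed in advance: an adaptive adversary who sees each output learns which candidates the algorithm holds and can feed them back into the stream, which is one way to see why the adversarial setting can demand more space.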

The paper's contribution is to find the approximate space requirements for Missing Item Finding in a number of models. Stoeckl shows that randomized algorithms, when the input is independent of the algorithm's output, can solve this problem very efficiently. In the adversarial setting, on the other hand, significantly more space is needed, and if the algorithm must be deterministic, even more space is required. Yet, Stoeckl finds, even with deterministic algorithms, one can do noticeably better than just remembering the input.

Stoeckl also raises a question about how the type of randomness available to the algorithm affects space usage in the adversarial setting: does it matter whether the algorithm can look at a list of random numbers before it receives any input, whether it can ask for new random numbers on demand, or whether it may at any time look at a large list of random numbers without having to remember any of them? Stoeckl rules out specific answers in the paper, but the question remains open.