Mergeable Summaries and the Data Sketches Library

Edo Liberty, principal scientist at AWS and head of Amazon AI Labs, USA.

Mergeable summaries (formalized by Agarwal et al. in PODS 2012) allow one to process many different streams of data independently, and then the summaries computed from each stream can be quickly combined to obtain an accurate summary of various combinations of the datasets (union, intersection, etc.). Among other major benefits, mergeable summaries allow data to be automatically processed in a fully distributed and parallel manner, by partitioning the data arbitrarily across many machines, summarizing each partition, and seamlessly combining the results.

This talk will describe a line of research that has grown out of the development of Data Sketches, an open source library of production-quality implementations of mergeable summaries for basic problems including unique counts, quantiles, frequent items, sampling, and matrix analysis. The library is currently used by several companies and government agencies (Yahoo/Oath, Amazon, Splice Machine, GCHQ, etc.) and enables real-time processing of massive datasets.
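To make the mergeability property in the abstract concrete, here is a minimal sketch of one well-known mergeable summary for unique counts, a k-minimum-values (KMV) sketch. It is an illustration only and is not the Data Sketches library's API: the class name `KMVSketch`, its methods, and the parameter `k` are hypothetical names chosen for this example.

```python
import hashlib


class KMVSketch:
    """Toy k-minimum-values summary for estimating distinct counts.

    Hypothetical illustration; not the Data Sketches library's implementation.
    """

    def __init__(self, k=256):
        self.k = k
        self.mins = set()  # the k smallest hash values seen so far

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _trim(self):
        # Keep only the k smallest hash values.
        if len(self.mins) > self.k:
            self.mins = set(sorted(self.mins)[: self.k])

    def update(self, item):
        # Duplicates hash to the same value, so they are absorbed by the set.
        self.mins.add(self._hash(item))
        self._trim()

    def merge(self, other):
        # The k smallest hashes of the union of two streams are always among
        # the k smallest hashes of each stream, so merging loses nothing.
        merged = KMVSketch(self.k)
        merged.mins = self.mins | other.mins
        merged._trim()
        return merged

    def estimate(self):
        # With fewer than k distinct hashes the count is exact
        # (ignoring hash collisions).
        if len(self.mins) < self.k:
            return float(len(self.mins))
        # Otherwise use the standard KMV estimator (k - 1) / (k-th smallest hash).
        return (self.k - 1) / max(self.mins)
```

Under these assumptions, one could summarize each partition of a dataset on a separate machine and then call `merge` on the partial sketches; the merged sketch is identical to the sketch that a single pass over all the data would have produced.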

Bio: Edo Liberty is a Principal Scientist at AWS and the head of Amazon AI Labs. Amazon AI Labs is a unique organization of world-class scientists and prodigious engineers. Together, they build the core algorithmic and machine learning components for many AWS services, including SageMaker. Before joining Amazon, Edo led Yahoo Research in New York and Yahoo’s Scalable Machine Learning group. He was a postdoctoral fellow in Applied Mathematics at Yale, where he also received his PhD in Computer Science. His research interests include dimensionality reduction, clustering, optimization, streaming and online algorithms, machine learning, and large-scale numerical linear algebra. He is the author of more than thirty academic papers on these topics, including award-winning works on streaming matrix approximation and fast random projections. Edo is a frequent keynote speaker, tutorial presenter, and committee member at international conferences.

Network Research and Falling Trees

Fabián Bustamante, professor at Northwestern University, USA.

Whenever we make the case for the relevance of our field, we point to the societal impact of the Internet - that research experiment that escaped the lab to become the global communication infrastructure, critical to nearly every part of modern society. There seems to be an endless pool of technology innovation around and over this network and an equally infinite wealth of challenging research problems. The research agendas we build from them, however, seem at times light years from having any societal impact. In this talk, I will discuss the promises and perils of crafting a networked systems research agenda focused on end users, from the opportunity to run a research program without having all the right “connections” to the constant struggle of turning user problems into interesting research questions your community would appreciate.

Bio: Fabián E. Bustamante (PhD/MS Georgia Tech '97/01) is a Professor of Computer Science at Northwestern University. His research interests span several areas of networking and distributed systems, with a focus on characterizing networked systems from the perspective of end users and designing new and improved systems based on the insights gained. As part of this work, he and his research group have released tens of open-source systems that together have gained over 1.5 million users worldwide. Fabián currently serves as Lead Scientist at Phenix Inc., a Chicago-based startup focused on scalable, real-time broadcasting. He is an ACM and IEEE senior member, and a recipient of the National Science Foundation CAREER award and the Science Foundation of Ireland E.T.S. Walton Visitor Award.

Seeing Things: Measuring IoT, IPv6, and Privacy

David Plonka, senior research scientist at Akamai Technologies, USA.

In this talk, we'll consider challenges and approaches to measurements in three key areas: the Internet of Things, Internet Protocol version 6, and end-user privacy. I'll share results and thoughts on how and where new approaches help us understand these critical, yet often unmeasured, aspects of the Internet today.

Bio: David Plonka (Dave) is a Sr. Research Scientist at Akamai Technologies. Dave's research and development interests include Internet measurement, traffic classification, analytics, and anomaly detection; Internet Protocol version 6 (IPv6); global Internet application performance; managing the Internet of Things (IoT); and security vulnerabilities. He has worked as a programmer and network engineer, authored free software, and developed Internet best practices and standards. He holds a Ph.D. (Computer Sciences) from the University of Wisconsin-Madison.