Turning Data into Computation Power

06/11/2015

Over the last several years, datasets have grown at an unprecedented scale across numerous scientific disciplines. The sheer volume of these datasets presents new computational challenges, because diverse, feature-rich, and usually high-resolution data does not readily allow for effective, data-intensive inference.

From a practical point of view, an increase in data size is a source of computational complexity, which typically translates into longer running times for algorithms. From this perspective, large data is considered a nuisance rather than a resource for achieving more accurate results.

In an award-winning paper from the International Conference on Artificial Intelligence and Statistics (AISTATS) entitled "Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning," Amin Karbasi, assistant professor of electrical engineering and computer science, and co-authors showed how one of the four fundamental resources in data analytics (space, time, data, and risk) can be traded for another. For example, the researchers demonstrated theoretically and empirically that more data can be transformed into faster algorithms. Karbasi's approach is based on novel computational geometric techniques called coresets: a small amount of the most relevant data is extracted from the dataset, and performing the computation on this extracted data alone is guaranteed to yield a good solution to the original problem.
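To make the coreset idea concrete, here is a minimal sketch in Python. It is not the construction from the AISTATS paper; it assumes a simple importance-sampling scheme for k-means, sometimes called a lightweight coreset, in which each point is sampled with probability mixing a uniform term and its squared distance to the data mean. The function names `lightweight_coreset` and `kmeans_cost`, the synthetic data, and the fixed centers are all hypothetical choices made for illustration.

```python
import random


def lightweight_coreset(points, m, seed=0):
    """Sample m weighted points; their weighted k-means cost is an
    unbiased estimate of the cost on the full dataset.
    Illustrative sketch only, not the paper's construction."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Mean of the data and each point's squared distance to it.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    sq_dists = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dists)
    if total == 0:
        # All points coincide, so fall back to uniform sampling.
        probs = [1.0 / n] * n
    else:
        # Half uniform mass, half proportional to distance from the mean.
        probs = [0.5 / n + 0.5 * d / total for d in sq_dists]
    idx = rng.choices(range(n), weights=probs, k=m)
    sample = [points[i] for i in idx]
    # Inverse-probability weights keep the cost estimate unbiased.
    weights = [1.0 / (m * probs[i]) for i in idx]
    return sample, weights


def kmeans_cost(points, weights, centers):
    """Weighted sum of squared distances to the nearest center."""
    cost = 0.0
    for p, w in zip(points, weights):
        cost += w * min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return cost


# Synthetic demo: two Gaussian clusters, 5,000 points in total.
demo_rng = random.Random(1)
data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(2500)]
data += [(demo_rng.gauss(6, 1), demo_rng.gauss(6, 1)) for _ in range(2500)]
centers = [(0.0, 0.0), (6.0, 6.0)]

coreset, weights = lightweight_coreset(data, m=200)
full_cost = kmeans_cost(data, [1.0] * len(data), centers)
approx_cost = kmeans_cost(coreset, weights, centers)
print(f"full cost: {full_cost:.1f}, cost from 200 weighted points: {approx_cost:.1f}")
```

Evaluating the cost on the 200 weighted points touches 25 times fewer points than the full pass, which illustrates in miniature how data can be traded for time: with more data available, a small, carefully weighted subset can stand in for the whole.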