Computer Science Professor Sharad Mehrotra receives $725,000 in NSF funding to head research projects in big data and disaster response cyber physical systems.
Sharad Mehrotra, computer science professor and vice chair of graduate studies, has recently received close to $725,000 from the National Science Foundation (NSF) to fund two separate projects: one on disaster response cyber physical systems and one on big data.
Big data project
In August, Mehrotra received nearly $500,000 for his project “Linking and Resolving Entities in Big Data,” which will explore the challenge of cleaning data in the context of analysis pipelines over big data. Mehrotra’s research comes amid changing back-end approaches to data cleaning, or the process of detecting and correcting or removing corrupt and inaccurate data sets. As the project abstract explains, “Data cleaning has traditionally been designed to improve data quality in ETL (Extract-Transform-Load) systems where enterprise data is collected, prepared, staged, transformed and loaded into a data warehouse to support offline data analysis. In the era of big data, such back-end processes are quickly giving way to interactive exploratory data analysis where analysts immerse themselves in data (possibly collected from heterogeneous data sources) in order to drive online (near-) real-time decision making.”
Current systems cannot scale to the volume, velocity or variability of dynamically generated data like that of social media platforms, but the challenge extends beyond scaling up. In fact, there needs to be a paradigm shift in the data cleaning approach, Mehrotra’s research posits.
The project is part of the SHERLOCK Project headed by Mehrotra. The SHERLOCK team researches entity resolution—the linking and grouping of multiple manifestations of real world entities from various sources—and data quality.
Serving as principal investigator for the project, Mehrotra will oversee research that will explore two innovations to further enable big data analysis: an approach to entity resolution that supports progressive analysis, and a second, conceptually complex innovation, “the analysis-aware data cleaning that is developed for structured queries (e.g., Hive and SQL) for both one-time and continuous query scenarios that are issued on top of static and streaming data.”
The grant comes from the NSF Division of Information and Intelligent Systems (IIS), under the Information Integration and Informatics (III) core program, which “focuses on the processes and technologies involved in creating, managing, visualizing and understanding diverse digital content in circumstances ranging from individuals through groups, organizations and societies, as well as from individual devices to globally-distributed systems,” according to its website.
According to Mehrotra, data cleaning techniques, traditionally designed to improve the quality of data in back-end data warehouses, are fast emerging as a vital component of real-time information access. As the web evolves toward supporting interactive analytics and social media begins to play a more dominant role in organizational critical decision-making, the need for “on-the-fly” cleaning to help alleviate data quality challenges is rapidly increasing. Scaling data quality techniques to big and fast data is not just a matter of throwing more hardware at it; it will require algorithmic innovations and new ways of approaching the cleaning challenge. “This project will help us explore two such innovations: progressive cleaning, wherein data is cleaned incrementally and the cleaning can stop when the required data quality is achieved, and incorporating ‘intelligence’ in cleaning, wherein only data relevant to the task at hand is cleaned,” says Mehrotra. “The project, if successful, can help launch a new paradigm for data cleaning technologies suited for interactive exploration and decision making.”
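For readers who want a concrete picture of progressive cleaning, the short Python sketch below illustrates the general idea on a toy entity resolution task. It is not the SHERLOCK team’s method. The function names, the token-overlap ordering heuristic, the similarity threshold, and the use of a fixed comparison budget as a stand-in for “the required data quality” are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def cheap_score(a: str, b: str) -> float:
    """Inexpensive prior (assumed heuristic): share of word tokens two records have in common."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def progressive_resolve(records, budget, threshold=0.8):
    """Link records that likely name the same entity, most promising pairs first.

    Stops after `budget` expensive comparisons and returns the clusters found
    so far, so a caller can halt early and still get a usable partial result.
    """
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Order candidate pairs by the cheap prior so likely matches are checked first.
    pairs = sorted(
        combinations(range(len(records)), 2),
        key=lambda p: cheap_score(records[p[0]], records[p[1]]),
        reverse=True,
    )

    # Spend the limited budget of expensive string comparisons.
    for i, j in pairs[:budget]:
        ratio = SequenceMatcher(None, records[i].lower(), records[j].lower()).ratio()
        if ratio >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(records[i])
    return sorted(clusters.values())


if __name__ == "__main__":
    names = ["Sharad Mehrotra", "sharad mehrotra", "S. Mehrotra", "UC Irvine", "UC Riverside"]
    for budget in (0, 1, 10):
        print(budget, progressive_resolve(names, budget))
```

Raising the budget can only merge more records, never split them, which is the property that lets an analyst stop as soon as the result is good enough for the question at hand.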
Disaster response cyber physical systems project
In September, NSF also awarded Mehrotra nearly $225,000 for another of his projects, “Extracting Time-Critical Situational Awareness from Resource Constrained Networks.” The project is a research collaboration between UC Irvine and UC Riverside that aims to solve an important, interdisciplinary cyber physical system problem. Its goal is “to facilitate timely retrieval of dynamic situational awareness information from field-deployed nodes by an operational center in resource-constrained uncertain environments, such as those encountered in disaster recovery or search-and-rescue missions,” according to the project abstract.
Current technology allows for the deployment of field nodes that can return content like video and images, but there are significant interdisciplinary challenges when it comes to acquisition, processing and extraction of relevant content, particularly under resource constraints. The project, for which Mehrotra will serve as principal investigator on the UCI portion, will develop a set of algorithms and protocols to accomplish three things:
- Intelligently activate field sensors and acquire and process the data to extract semantically relevant information;
- Formulate expressive and effective queries that enable the near-real-time retrieval of relevant situational awareness information while adhering to resource constraints; and
- Impose a network structure that facilitates cost-effective query propagation and response retrieval.
The research involves multiple facets of computer science, including computer vision, data mining, databases and networking, and seeks to understand the scientific principles behind information management with compromised computation and communication resources.
With more than 1 million deaths and $1.5 trillion in damage from disasters within the past decade, according to the 2015 World Disasters Report, this project has the potential to significantly curb dire consequences and improve disaster responses. The project will also emphasize broader community engagement and partner with programs that target underrepresented students.
“This project will collaborate with firefighters and other emergency response personnel in the Southern California area and is based on the preliminary research conducted by one of my female Ph.D. students who received a prestigious best paper award at the Association for Computing Machinery (ACM) International Conference on Multimedia Retrieval in 2013,” says Mehrotra.