Research highlight

Supporting Fuzzy Search

Chen Li

Users of information systems are often frustrated when they cannot find the entities they want (such as records, documents, and emails).

A common reason is a discrepancy between the user's query and the way the entities are represented in the information repository, caused by the user's limited knowledge of the entities, careless typos, or errors in the entity representations.

As an example, how many of us can correctly spell the last name of the governor of California?

What if we want to find a person or restaurant whose name we remember only roughly?

Computer Science Professor Chen Li's research team has been studying how to make information access easier by supporting fuzzy search, even interactively.

The techniques the team has developed help users find information that matches a query approximately (such as "smith" versus "smyth"), even if it does not match exactly.

For example, if a user searches for people by typing in a name "bill crop", fuzzy search can also find similar names such as "william kropp".
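The article does not say which similarity measure the team uses. As a minimal sketch, assuming plain edit distance (Levenshtein distance) as the measure, approximate matching of two keywords could look like the following. The names `edit_distance`, `fuzzy_match`, and `max_errors` are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, candidate: str, max_errors: int = 1) -> bool:
    """Assumed matching rule: case-insensitive, at most max_errors edits."""
    return edit_distance(query.lower(), candidate.lower()) <= max_errors
```

Under this assumption, "smith" and "smyth" are one substitution apart, so they match with `max_errors=1`. Edit distance alone would not relate "bill" to "william"; that part of the example relies on synonym support.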

The techniques are very important to applications such as people search, product search in e-commerce Web sites, and entity extraction.

To demonstrate the power of the research results, the team has deployed two prototypes. The first one, available at http://psearch.ics.uci.edu, supports interactive, fuzzy search on a UCI directory provided by NACS.

It has a single search box that accepts keyword queries on a person's name, UCInetID, telephone number, department, and title. It has the following features:

  • Supports interactive search: search as you type;
  • Allows minor errors in the keywords;
  • Supports synonyms, e.g., "William = Bill", and "NACS = Network and Academic Computing Services";
  • Allows multiple keywords.
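How the prototype implements these features is not described here. The sketch below is only one way the four features could fit together, under stated assumptions: each keyword is compared against prefixes of a record's words (search as you type), a small number of edits is tolerated, a hypothetical single-word `SYNONYMS` table expands fully typed keywords, and every keyword must match some word of the record. All names and the error threshold are illustrative, not the team's design.

```python
# Hypothetical synonym table; only single-word synonyms in this sketch.
SYNONYMS = {"bill": {"william"}, "william": {"bill"}}


def prefix_edit_distance(keyword: str, token: str) -> int:
    """Smallest edit distance between keyword and any prefix of token."""
    prev = list(range(len(token) + 1))
    for i, ck in enumerate(keyword, start=1):
        curr = [i]
        for j, ct in enumerate(token, start=1):
            cost = 0 if ck == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    # prev[j] is the distance to token[:j]; take the best prefix.
    return min(prev)


def keyword_matches(keyword: str, tokens: list[str], max_errors: int) -> bool:
    """True if the keyword or one of its synonyms fuzzily prefixes a token."""
    variants = {keyword} | SYNONYMS.get(keyword, set())
    for variant in variants:
        # Assumption: very short keywords get no error budget, so that a
        # single typed letter does not match every record.
        allowed = min(max_errors, len(variant) // 3)
        if any(prefix_edit_distance(variant, t) <= allowed for t in tokens):
            return True
    return False


def search(query: str, records: list[str], max_errors: int = 1) -> list[str]:
    """Return the records in which every query keyword matches some word."""
    keywords = query.lower().split()
    if not keywords:
        return []
    results = []
    for record in records:
        tokens = record.lower().split()
        if all(keyword_matches(k, tokens, max_errors) for k in keywords):
            results.append(record)
    return results
```

With these assumptions, `search("bill crop", ["william kropp", "chen li"])` returns only `"william kropp"`: "bill" matches through the synonym table, and "crop" is one edit away from the prefix "krop". A partially typed query such as `"wil kro"` finds the same record. This sketch scans every record on each keystroke; an interactive system over a large directory would need an index instead.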

The second search system has been launched on the ICS school home page; it supports interactive search on ICS people as well as important pages in the ICS domain.

Simple searches such as "course" and "venkta" demonstrate the features of the system.

Professor Chen Li's research is supported by grants from the National Science Foundation (NSF), gift funds from Google and Microsoft, and a fund from Calit2.