NSFC Cooperative Project


Data Quality Analysis and Approximate Query Processing for String Collections from Multiple Sources

Members
  • Professor Chen Li at University of California, Irvine (UCI), USA.
  • Professor Xiaochun Yang at Northeastern University, China.
Period
        2009.1 -- 2010.12
Objective
        The volume of information and knowledge is increasing at an incredible pace. Meanwhile, data from heterogeneous sources can contain inconsistencies, uncertainty, or errors. Such data can cause serious problems for many data-intensive applications. Text is one of the most important data formats, and supporting approximate queries on text data is an important problem for these applications. This project focuses on research topics related to managing inconsistencies in text data. We will study the lineage of text data from multiple sources and efficient algorithms for answering approximate queries on large text data sets. We will develop novel indexing structures and efficient algorithms for query processing and optimization. We will evaluate our proposed techniques on real data sets and contribute to both theory and real applications.
Acknowledgements
        This release is partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 60828004.