Current Projects

OYSTER – Open SYstem for Entity Resolution
Sponsor: ERIQ Research Center
Project Leader: John R. Talburt
An ongoing project, OYSTER has been the primary ER research platform for the ERIQ Center since 2009. Originally developed as a demonstration tool for teaching the principles of ER, it has grown into a useful and widely adopted open source system. It was first posted on SourceForge.net as project “OysterER”, where it has more than 7,600 downloads. OYSTER has since moved to Bitbucket.org under the same project name, “OysterER.”

PiLog Master Data Management Research and Development
Sponsor: PiLog Group
Project Leader: John Talburt; Team Members: Leon Claassens, Purvi Parmar
Research and development to adapt and migrate existing relational database MDM tools into the distributed computing environment.

OFFER – An Open Framework for Entity Resolution
Sponsor: ERIQ Research Center
Project Leader: James True
OFFER is a project to re-architect the OYSTER open source entity resolution system using the Java Spring Framework.

Positive Data Control
Sponsor: ERIQ Research Center
Project Leader: John Talburt
A project to synchronize datasets and data catalog entries in the Big Data (HDFS) distributed processing environment. Initial use cases rely on the newly released Apache Atlas open source data catalog.

Application of Machine Learning to ER
Sponsor: ERIQ Research Center
Project Leader: Xinming Li; Team Members: Talha Tayyab, Xiangwen Liu, Ting Li
A project to explore the use of machine learning and other AI techniques for ER.

OYSTER DIVER – Entity Resolution in the Distributed Computing Environment
Sponsor: ERIQ Research Center
Project Leader: Purvi Parmar; Team Members: Huzaifa Syed
A project to redesign the server version of OYSTER to take advantage of the scalability of distributed computing environments, with an initial focus on map/reduce processing over the Hadoop Distributed File System (HDFS).
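As a rough illustration of how ER can be expressed in map/reduce terms, the sketch below simulates the pattern in plain Python: the map step emits a blocking key per record, the shuffle step groups records by key, and the reduce step compares only records within the same block. It does not use any Hadoop API and is not taken from OYSTER; the function names, the `id`, `last_name`, and `zip` fields, and the blocking and matching rules are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(records, blocking_key):
    """Emit (blocking key, record) pairs, one per input record."""
    for rec in records:
        yield blocking_key(rec), rec


def shuffle(pairs):
    """Group mapped records by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, rec in pairs:
        groups[key].append(rec)
    return groups


def reduce_phase(groups, is_match):
    """Compare records only within each block and emit matching id pairs."""
    for recs in groups.values():
        for a, b in combinations(recs, 2):
            if is_match(a, b):
                yield a["id"], b["id"]


# Hypothetical blocking key: first letter of the last name plus ZIP code.
def name_zip_key(rec):
    return rec["last_name"][:1].lower() + rec["zip"]


# Hypothetical match rule: last names equal, ignoring case.
def same_last_name(a, b):
    return a["last_name"].lower() == b["last_name"].lower()


def resolve(records):
    groups = shuffle(map_phase(records, name_zip_key))
    return sorted(reduce_phase(groups, same_last_name))
```

Blocking keeps the number of comparisons manageable because each reducer only sees the records that share a key, which is what allows the work to be spread across nodes.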

Tools, Documentation, and Testing
Sponsor: ERIQ Research Center
Project Leader: Kris Anderson
Creating and updating documentation and test sets, and developing tools for OYSTER users and the OYSTER research teams.

ERIQ Website Maintenance
Sponsor: ERIQ Research Center
Project Leader: Purvi Parmar
Creating and updating content for the ERIQ website as needed.

Past Projects

VENSIM
Sponsor: ERIQ Research Center
Project Leader: Yumeng Ye; Team Members: Purviben Parmar
Investigating the use of VENSIM to generate data for our research.

Matrix Comparator for Unstructured and Semi-Structured Characteristic Data
Sponsor: ERIQ Research Center
Project Leader: Awaad Alsarkhi; Team Members: Xinming Li, Akhila Thirumalareddy
A project experimenting with new methods and techniques for comparing and linking unstructured and heterogeneously structured references. Current work is focusing on the use of the Matrix comparator.

Probabilistic Matching
Sponsor: ERIQ Research Center
Project Leader: Sapna Srimal; Team Members: Bingyi Zhong, Yumeng Ye
A project experimenting with techniques for finding the optimal parameters and logic for record linking using probabilistic matching techniques.
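For readers unfamiliar with the approach, the following is a minimal sketch of one common form of probabilistic matching, the Fellegi-Sunter style of agreement weights. It is an assumption that this model resembles what the project studied; the field names, the m- and u-probabilities, and the thresholds are illustrative values, not results from the project.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | the two records refer to the same entity)
#   u = P(field agrees | the two records refer to different entities)
FIELD_PROBS = {
    "first_name": (0.90, 0.05),
    "last_name": (0.95, 0.02),
    "birth_date": (0.98, 0.01),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the log2 agreement or disagreement weight of each field."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, upper=8.0, lower=0.0):
    """Map a total weight to a decision using two assumed thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"
```

The parameters such a project would tune are the m- and u-probabilities for each field and the two thresholds, since together they determine the balance between false positives and false negatives.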

Comparing the Effectiveness of Deterministic Matching with Probabilistic Matching for Student Enrollment ER
Sponsor: Arkansas Department of Education (ADE)
Principal Investigator:  John R. Talburt
June 2009 – March 2014

Clinical and Translational Science Initiative
Sponsor: National Institutes of Health (NIH) through the University of Arkansas for Medical Sciences (UAMS)
Principal Investigator:  John R. Talburt
June 2009 – March 2014
UAMS has undertaken an initiative to enhance the quality and integration of research data, to provide tools for collaboration, and to develop information quality training for researchers, staff, and students. As part of this, UA Little Rock and UAMS have agreed on three research and development efforts that support the overall initiative. The first is enhancing the quality of the research data warehouse. The second is providing tools for collaboration and community engagement to support translational research, which includes investigating collaboration and community communication tools such as wikis, web logs, discussion forums, groupware, screen sharing, and content management. The third is the development and implementation of Information Quality Training Classes. The training will focus on three groups: students enrolled in Bioinformatics graduate programs, students enrolled in graduate programs related to translational research, and faculty and staff involved in translational research.

Referent Tracking in Health Care
Sponsor: University of Arkansas for Medical Sciences (UAMS)
Principal Investigator:  John Talburt
July 2012 – June 2013
A collaborative project intended to explore the integration of Referent Tracking (RT) software developed by faculty and staff at UAMS and the Open sYSTem for Entity Resolution (OYSTER) software developed by faculty, staff, and students at the University of Arkansas at Little Rock. Referent Tracking seeks to assign an instance unique identifier (IUI) to each individual person, process, disease, prescription, fracture, tumor, etc. Assigning more than one IUI to the same entity is therefore highly problematic, as is assigning one IUI to two different entities. OYSTER is software that can detect such errors in IUI assignment. Furthermore, OYSTER has a mode of operation that relies on tracking unique individual persons and addresses over time. It can thus be extended through Referent Tracking to track and de-duplicate IUIs for diseases, prescriptions, tumors, fractures, and other types of entities.
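The two kinds of IUI assignment error described above can be sketched as a simple consistency check. This is only an illustration under stated assumptions: it presumes that an ER process has already resolved each reference to an entity identifier, and the function name and the `(iui, entity_id)` input format are hypothetical rather than part of the RT or OYSTER software.

```python
from collections import defaultdict


def find_iui_conflicts(assignments):
    """Report IUI assignment errors.

    assignments: iterable of (iui, entity_id) pairs, where entity_id is the
    identity that an ER process resolved the reference to (an assumption).
    Returns (shared, duplicated):
      shared     - IUIs assigned to more than one entity
      duplicated - entities that received more than one IUI
    """
    entities_by_iui = defaultdict(set)
    iuis_by_entity = defaultdict(set)
    for iui, entity_id in assignments:
        entities_by_iui[iui].add(entity_id)
        iuis_by_entity[entity_id].add(iui)

    shared = {
        iui: sorted(ents)
        for iui, ents in entities_by_iui.items()
        if len(ents) > 1
    }
    duplicated = {
        ent: sorted(iuis)
        for ent, iuis in iuis_by_entity.items()
        if len(iuis) > 1
    }
    return shared, duplicated
```

In practice the hard part is the resolution step that produces the entity identifiers; once that is available, both error types reduce to grouping and counting as shown.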

Towards High-Quality of Identity Attributes
Sponsor: Arkansas Department of Education (ADE)
Principal Investigator:  Ningning Wu, Co-Principal Investigator: John R. Talburt
July 2012 – June 2013
The primary goal of this research is to investigate how the quality of identity attributes will impact the quality of entity resolution of Arkansas K-12 student records. It will first study the data quality of identity attributes to identify the key quality problems, then evaluate how the quality of the identity attributes relates to the quality of entity resolution (ER) results from the aspect of false negatives. It will attempt to rank identity attributes according to their impact on ER results as a means to prioritize the data quality improvement efforts. The results of this study will enable ADE to communicate more efficiently with schools and raise their awareness of data quality, as well as help them improve data quality in data collection and submission processes. The results of the study will also provide guidance for refining the Cycle Validation/Data Accuracy Reports and certification process of Statewide Information Systems (SIS) so that they can be more effective in enforcing the data quality standards across all schools.

Information Quality Tools for Persistent Surveillance Data Sets
Sponsor: US Air Force Research Laboratory, Sensor Directorate, Wright-Patterson Air Force Base, Dayton, Ohio
Principal Investigator:  John Talburt; Co-Principal Investigators: Serhan Dagtas, Mariofanna Milanova, Mihail E. Tudoreanu
June 2009 – October 2011
The Air Force desires a comprehensive vehicle to identify and address requirements for information quality tools and techniques that will support defensive and offensive operations research in the layered sensing domain. As use of remote sensors in the Air and Space domains increases, the value of the sensor datasets must be maximized and assurances established that the product outcomes meet the application requirements. Combining multiple sensors into layered sensing systems increases the need to understand not only the quality and fitness for use of the individual sensor data streams, but also how to assess the quality and value of the aggregate data. The scope of this task order is to develop metrics that assess the quality and effectiveness of persistent surveillance data sets. The project also explores the use of 3-D visualization in rendering layered data sets and experiments with the integration of textual information. In addition, integrated processing of data available from multiple types of sensors (such as in a Smart Environment) has been explored, and experiments have been done to support data fusion for multiple sensors.

Proof-of-Concept for an Open-System Entity Resolution Engine to Support Longitudinal Studies in Education
Sponsor: Arkansas Department of Education
Principal Investigator: John Talburt; Co-Principal Investigator: Ningning Wu
June 2009 – May 2010
The TRUSTed Project is a research project to investigate the design of an open-system entity resolution engine for the Arkansas Department of Education that will support longitudinal (multi-year) studies of student performance in Arkansas schools. A prototype of the design will be implemented using open system tools and standards.

Delta Center for Identity Solutions
A collaboration with the Arkansas State University Center for the Study of Automatic Identification
Sponsor: Arkansas Science and Technology Authority
Investigators: John Talburt, Farhad Moeeni (Arkansas State University), Dale Thompson (University of Arkansas, Fayetteville)
Research related to both identity management and identity information management. Identity management refers to the accurate, real-time, secure and tamper-proof identification and authorization of people for physical access to facilities, logical access to computer networks and for performance of various transactions. Identity information management entails measuring and improving information quality, detecting fraudulent behavior, and understanding data privacy and protection issues.

A Semiotic Approach to Layout Inference and Data Transformation
Sponsor: Acxiom Laboratory for Applied Research
Investigators: John Talburt, Ningning Wu, Chia-Chu Chiang
An investigation into the application of the principles of semiotics (intention, syntax, and semantics) to the problem of locating and identifying information in datasets of unknown layout through syntactic and semantic profiling. It also envisions applying the profiling information to the problem of developing a declarative language to describe and implement data transformations.

Methods and Techniques for Entity Identification in Open Source Documents with Partially Redacted Attributes
Sponsor: National Science Foundation
Investigators: John Talburt, Chia-Chu Chiang, Ningning Wu, Richard Wang (MIT)
An investigation into the degree to which partial identity information (“identity fragments”) can be resolved within a large reference set of known identities based on a variety of collateral evidence including quality of match, age consistency, and family member co-location.