PiLog Master Data Management Research and Development
Sponsor: PiLog Group
Project Leader: John Talburt; Team Members: Leon Claassens, Purvi Parmar
Research and development to apply generative AI models, both text and visual, to the problems of industrial materials classification.
Positive Data Control
Sponsor: ERIQ Research Center
Project Leader: John Talburt; Team Members: Lou Foua, Muzakkiruddin Mohammed, Shames Al Mandalawi
A project to move Data Governance and Data Security to the next level by encapsulating data systems in a wrapper in which all access and operations are controlled by a policy-aware LLM. Positive Data Control (PDC) systems can securely exchange interoperable data with other PDC systems following the ISO 8000-110 Master Data Exchange and the ISO 22745 Open Technical Dictionary standards.
Past Projects
Data Washing Machine
Sponsor: National Science Foundation
A project to fully automate data curation and data governance processes.
Next Generation, High-Performance Entity Resolution System
Sponsor: U.S. Census Bureau
The application of machine learning (ML) to improve the data matching operations of the U.S. Census Bureau.
VENSIM
Sponsor: ERIQ Research Center
Investigating the use of VENSIM to generate synthetic data for our research.
Matrix Comparator for Unstructured and Semi-Structured Characteristic Data
Sponsor: ERIQ Research Center
A project experimenting with new methods and techniques for comparing and linking unstructured and heterogeneously structured references. Current work focuses on the use of the Matrix comparator.
Probabilistic Matching
Sponsor: ERIQ Research Center
A project experimenting with techniques for finding the optimal parameters and logic for record linking using probabilistic matching techniques.
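As an illustration of the kind of parameters and logic involved, the following is a minimal sketch of probabilistic match scoring in the style of the Fellegi-Sunter model. It is an assumption that this model is representative; the project description does not name a specific method. The field names, the m- and u-probabilities, and the two thresholds are hypothetical values chosen only for the example.

```python
import math

# Hypothetical per-field parameters (illustrative values, not project results).
# m = P(field agrees | records refer to the same entity)
# u = P(field agrees | records refer to different entities)
PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def match_weight(rec_a, rec_b, params=PARAMS):
    """Sum log2 agreement/disagreement weights over all configured fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Map a total weight to a decision using two hypothetical thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"


a = {"last_name": "Smith", "first_name": "Ann", "birth_year": 1990}
b = {"last_name": "Smith", "first_name": "Ann", "birth_year": 1990}
c = {"last_name": "Jones", "first_name": "Bob", "birth_year": 1975}

print(classify(match_weight(a, b)))
print(classify(match_weight(a, c)))
```

In this framing, "finding the optimal parameters" would correspond to estimating the m and u values and choosing the thresholds, which the sketch simply hard-codes.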
Comparing the Effectiveness of Deterministic Matching with Probabilistic Matching for Student Enrollment ER
Sponsor: Arkansas Department of Education (ADE)
June 2009 – March 2014
Clinical and Translational Science Initiative
Sponsor: National Institutes of Health (NIH) through the University of Arkansas for Medical Sciences (UAMS)
UAMS has undertaken an initiative to enhance the quality and integration of research data, to provide tools for collaboration, and to develop information quality training for researchers, staff, and students. As part of this, UA Little Rock and UAMS have agreed on three research and development efforts that support the overall initiative. The first is enhancing the quality of the research data warehouse. The second is providing tools for collaboration and community engagement to support translational research; this effort will investigate collaboration and community communication tools including wikis, web logs, discussion forums, groupware, screen sharing, and content management. The third is the development and implementation of Information Quality Training Classes. The training will focus on three groups: students enrolled in Bioinformatics graduate programs, students enrolled in graduate programs related to translational research, and faculty and staff involved in translational research.
Referent Tracking in Health Care
Sponsor: University of Arkansas for Medical Sciences (UAMS)
A collaborative project intended to explore the integration of Referent Tracking (RT) software developed by faculty and staff at UAMS and the Open sYSTem for Entity Resolution (OYSTER) software developed by faculty, staff, and students at the University of Arkansas at Little Rock. Referent Tracking seeks to assign an instance unique identifier (IUI) to each individual person, process, disease, prescription, fracture, tumor, etc. Assigning more than one IUI to the same entity is therefore highly problematic, as is assigning one IUI to two different entities. OYSTER is software that can detect such errors in IUI assignment. Furthermore, OYSTER has a mode of operation that relies on tracking unique individual persons and addresses over time. It can thus be extended through Referent Tracking to track and de-duplicate IUIs for diseases, prescriptions, tumors, fractures, and other types of entities.
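The two kinds of IUI assignment error described above can be illustrated with a minimal sketch. It assumes that an entity resolution step has already produced a resolved entity label for each record; the function name, the input format, and the sample identifiers are hypothetical and do not represent the OYSTER or Referent Tracking interfaces.

```python
from collections import defaultdict


def find_iui_errors(assignments):
    """Return (duplicates, conflations) from (iui, resolved_entity) pairs.

    duplicates:  resolved entities that carry more than one IUI
    conflations: IUIs that are attached to more than one resolved entity
    """
    iuis_by_entity = defaultdict(set)
    entities_by_iui = defaultdict(set)
    for iui, entity in assignments:
        iuis_by_entity[entity].add(iui)
        entities_by_iui[iui].add(entity)

    duplicates = {e: sorted(i) for e, i in iuis_by_entity.items() if len(i) > 1}
    conflations = {i: sorted(e) for i, e in entities_by_iui.items() if len(e) > 1}
    return duplicates, conflations


# Hypothetical sample: E1 has two IUIs; IUI-3 is shared by E2 and E3.
sample = [
    ("IUI-1", "E1"),
    ("IUI-2", "E1"),
    ("IUI-3", "E2"),
    ("IUI-3", "E3"),
    ("IUI-4", "E4"),
]

duplicates, conflations = find_iui_errors(sample)
print(duplicates)
print(conflations)
```

The quality of such a check depends entirely on the accuracy of the resolved entity labels supplied to it.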
Towards High-Quality of Identity Attributes
Sponsor: Arkansas Department of Education (ADE)
The primary goal of this research is to investigate how the quality of identity attributes affects the quality of entity resolution of Arkansas K-12 student records. It will first study the data quality of identity attributes to identify the key quality problems, then evaluate how the quality of the identity attributes relates to the quality of entity resolution (ER) results from the aspect of false negatives. It will attempt to rank identity attributes according to their impact on ER results as a means to prioritize data quality improvement efforts. The results of this study will enable ADE to communicate more efficiently with schools and raise their awareness of data quality, as well as help them improve data quality in the data collection and submission processes. The results of the study will also provide guidance for refining the Cycle Validation/Data Accuracy Reports and certification process of Statewide Information Systems (SIS) so that they can be more effective in enforcing the data quality standards across all schools.
Information Quality Tools for Persistent Surveillance Data Sets
Sponsor: US Air Force Research Laboratory, Sensor Directorate, Wright-Patterson Air Force Base, Dayton, Ohio
The Air Force desires a comprehensive vehicle to identify and address requirements for information quality tools and techniques that will support defensive and offensive operations research in the layered sensing domain. As use of remote sensors in the Air and Space domains increases, the value of the sensor datasets must be maximized and assurances established that the product outcomes meet the application requirements. Combining multiple sensors into layered sensing systems increases the need to understand not only the quality and fitness for use of the individual sensor data streams, but also how to assess the quality and value of the aggregate data. The scope of this task order is to develop metrics that assess the quality and effectiveness of persistent surveillance data sets. The project also explores the use of 3-D visualization in rendering layered data sets and experiments with the integration of textual information. In addition, integrated processing of data available from multiple types of sensors (such as in a Smart Environment) has been explored, and experiments have been done to support data fusion for multiple sensors.
Proof-of-Concept for an Open-System Entity Resolution Engine to Support Longitudinal Studies in Education
Sponsor: Arkansas Department of Education
The TRUSTed Project is a research project to investigate the design of an open-system entity resolution engine for the Arkansas Department of Education that will support longitudinal (multi-year) studies of student performance in Arkansas schools. A prototype of the design will be implemented using open-system tools and standards.
Delta Center for Identity Solutions
A collaboration with the Arkansas State University Center for the Study of Automatic Identification
Sponsor: Arkansas Science and Technology Authority
Research related to both identity management and identity information management. Identity management refers to the accurate, real-time, secure and tamper-proof identification and authorization of people for physical access to facilities, logical access to computer networks and for performance of various transactions. Identity information management entails measuring and improving information quality, detecting fraudulent behavior, and understanding data privacy and protection issues.
A Semiotic Approach to Layout Inference and Data Transformation
Sponsor: Acxiom Laboratory for Applied Research
An investigation into the application of the principles of semiotics (intention, syntax, and semantics) to the problem of locating and identifying information in datasets of unknown layout through syntactic and semantic profiling. It also envisions reusing the profiling information to develop a declarative language for describing and implementing data transformations.
Methods and Techniques for Entity Identification in Open Source Documents with Partially Redacted Attributes
Sponsor: National Science Foundation
An investigation into the degree to which partial identity information (“identity fragments”) can be resolved within a large reference set of known identities based on a variety of collateral evidence including quality of match, age consistency, and family member co-location.
