15th Meeting of Researchers in Biometrics / Statistics
June 23-26, 2008, Cavtat / Dubrovnik, Croatia
Co-organizers
HBMD Croatian Biometric Society
SRCE - University Computing Centre
BIOSTAT Chair
Marija Pecina
BIOSTAT Topics
Problems in bio-information analysis
- Data Visualization
- Space Reduction Methods
- Prediction and Classification
- Time and Space Modeling
- Computationally Intensive Methods
- Other
13th School of Biometrics
School of Biometrics Invited Lecturer:
prof. Lynne Billard, University of Georgia, USA
Title: Symbolic Data Analysis
Abstract:
Symbolic data appear in numerous settings, in all avenues of the sciences and social sciences, from medical, industry and government experiments, and data collection pursuits. Some data are inherently symbolic. Some, perhaps most, arise as the result of the massive datasets that emerge from contemporary computer capacity. Such datasets have to be aggregated in some meaningful way (with the actual aggregation being instructed by the scientific questions of interest). Our aim is to provide an introduction to symbolic data and how such data can be analysed. Classical data on p random variables are represented by a single point in p-dimensional space R^p. In contrast, symbolic data with measurements on p random variables are p-dimensional hypercubes (or hyperrectangles) in R^p, or a Cartesian product of p distributions, broadly defined. There are many possible formats for symbolic data.
Basic descriptions of symbolic data and how they contrast with classical data are covered first. For example, it may not be possible to give the exact cost of an apple (or shirt, or product, or ...), or the exact pulse rate measurement, but only its value in the range [66, 74], (say). We note also that an interval value of [66, 74] differs from that of [68, 72] even though these two intervals both have the same midpoint value of 70. A classical analysis using the same midpoint (70) would lose the fact that these are two differently valued realizations with different internal variations.
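The contrast between the two pulse-rate intervals can be illustrated with a minimal sketch. It assumes that values are uniformly distributed within each interval, which is a common convention in the symbolic data literature but is not stated in the abstract; the function names are hypothetical.

```python
# Minimal sketch: two intervals with the same midpoint but different
# internal variation. ASSUMPTION: values are uniform within each interval.

def midpoint(a, b):
    """Centre of the interval [a, b]."""
    return (a + b) / 2.0

def internal_variance(a, b):
    """Variance of a Uniform(a, b) variable: (b - a)^2 / 12."""
    return (b - a) ** 2 / 12.0

wide = (66.0, 74.0)    # interval from the abstract
narrow = (68.0, 72.0)  # interval from the abstract

for name, (a, b) in (("wide", wide), ("narrow", narrow)):
    print(name, "midpoint:", midpoint(a, b),
          "internal variance:", internal_variance(a, b))
```

Both intervals share the midpoint 70, so a classical analysis based on midpoints alone treats them as identical, while the internal variances differ by a factor of four.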
Methodologies for obtaining basic descriptive statistics for random variables whose values are symbolic, viz., a histogram and its empirical probability distribution relative, along with the empirical mean, variance, and covariance, will be presented. This will be followed by methodologies that deal with regression (including regression methodologies for handling taxonomy tree structures and hierarchy tree structures, if time permits), clustering, and principal components, respectively.
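As an illustration of what such descriptive statistics can look like for interval-valued observations, the sketch below computes an empirical mean and variance. It is not taken from the lecture; it assumes each observation is an interval [a_i, b_i] with values uniform inside it, under which the mean is the average of the midpoints and the second moment of each interval is (a_i^2 + a_i b_i + b_i^2) / 3. The function names and the sample data are hypothetical.

```python
# Minimal sketch of an empirical mean and variance for interval-valued data.
# ASSUMPTION: each observation is an interval (a, b), uniform within it.

def interval_mean(intervals):
    """Average of the interval midpoints: sum(a + b) / (2n)."""
    n = len(intervals)
    return sum(a + b for a, b in intervals) / (2.0 * n)

def interval_variance(intervals):
    """Second moment sum(a^2 + a*b + b^2) / (3n) minus the squared mean.

    Uses divisor n, and includes both the spread between intervals and
    the spread within each interval.
    """
    n = len(intervals)
    second_moment = sum(a * a + a * b + b * b for a, b in intervals) / (3.0 * n)
    return second_moment - interval_mean(intervals) ** 2

# Hypothetical sample built from the two pulse-rate intervals in the abstract.
pulse = [(66.0, 74.0), (68.0, 72.0)]
print("mean:", interval_mean(pulse))
print("variance:", interval_variance(pulse))
```

When every interval collapses to a single point (a_i = b_i), these expressions reduce to the classical sample mean and the divisor-n sample variance, which is one way to check that the symbolic versions extend the classical ones.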
These methods are extensions of well-known classical theory applied or extended to symbolic data. Our approach assumes knowledge of the classical results, with the focus on the adaptation to the symbolic data setting. Therefore, only minimal classical theory is provided.
How to Apply for BIOSTAT 2008
ITI Conference participants who wish to join the Meeting or School should register using the ITI Registration Form. Please indicate on the Form your intention to participate in the School of Biometrics, in order to receive the School handouts at the Conference. For ITI participants, there is no additional fee for the School. Submitted papers or poster abstracts (http://iti.srce.hr/html/submit.html) should reach the ITI Conference secretariat within the ITI deadlines and should follow the ITI Instructions to authors (http://iti.srce.hr/html/itinstr.html). Accepted papers will be presented within the ITI topic: Data Mining, Statistics and Biometrics. The detailed program will be announced later on the ITI web site at http://iti.srce.hr/.