Data Mining, Statistics and Biometrics

doi: 10.2498/iti.2013.0554

Statistical Linkage across High Dimensional Observational Domains

Leonard B. Hearne, Derek Kelly, Avimanyou Vatsa, Wade Mayham, Toni Kazic

Abstract

Many experimental sciences collect different kinds of high-dimensional data on the same experimental units. When comparing relationships among homogeneous regions in one high-dimensional domain with regions in another high-dimensional domain, the number of possible comparisons may be extremely large and their set complexity unknown.
We outline procedures for identifying possible relationships among regions in two different high-dimensional domains. If the data are dense enough, then statistical measures of association can be estimated. These procedures can identify and measure the probability of inter-domain associations of mixed complexity.

Keywords

High Dimensional Data, Dimension Reduction, Parallel Coordinate Graph, DNA sequence analysis, CART, MARS, Geometric Density Estimator, Complex Phenotypes, Maize

Full text is available at IEEE Xplore digital library.


doi: 10.2498/iti.2013.0511

Load Profile Analyses Using R Language

Ilir Keka, Mentor Hamiti

Abstract

With global energy market liberalization, the demand for electricity is essential information for electrical companies. The need for electricity is expressed as the demand for electric power as a function of time. The purpose of this paper is to find a mathematical model for the relation between electric power and time, using linear regression in the programming language R. To find this model, we used electric power data from the Load Profile of an electrical substation (110/35/10 kV). The data are aggregated in R to obtain and analyze daily, weekly and monthly loads and to calculate statistical parameters.
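The kind of analysis described above can be sketched as follows. This is a minimal illustration in Python rather than R, not the authors' model: the function names `fit_line` and `daily_mean_load`, the hour-index time axis, and the load readings are all hypothetical and made up for the example.

```python
from collections import defaultdict
from statistics import mean


def fit_line(t, p):
    """Ordinary least squares fit p ~ a + b*t; returns (a, b)."""
    t_bar, p_bar = mean(t), mean(p)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (pi - p_bar) for ti, pi in zip(t, p))
    b = sxy / sxx            # slope: change in power per unit of time
    a = p_bar - b * t_bar    # intercept: fitted power at t = 0
    return a, b


def daily_mean_load(hours, loads):
    """Aggregate hourly readings into a mean load per day (day = hour // 24)."""
    by_day = defaultdict(list)
    for h, p in zip(hours, loads):
        by_day[h // 24].append(p)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}


# Made-up readings (assumption): hour index and load in MW over two days.
hours = list(range(48))
loads = [20.0 + 0.5 * h for h in hours]

a, b = fit_line(hours, loads)
daily = daily_mean_load(hours, loads)
```

Weekly or monthly aggregation would follow the same pattern with a different grouping key. A real load profile is strongly periodic, so a straight line in time is only a first approximation.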

Keywords

Load Profile, R, Data Mining, Regression, Electric Power, Time

Full text is available at IEEE Xplore digital library.


doi: 10.2498/iti.2013.0505

Continuous User Verification Based on Behavioral Biometrics Using Mouse Dynamics

Mirko Stanić

Abstract

This paper provides an overview of the current methods of identifying users based on their interactions with a computer keyboard, mouse or touchscreen, and argues that, in their current state of development, none of them is capable of establishing the user's identity within the time it takes a user to enter a password.
The paper proposes the application of behavioral biometrics as a supplement to regular password-based user authentication, as a safeguard against unauthorized users gaining access to a computer that is already running an authenticated session, e.g., an unattended computer in an office.

Keywords

behavioral biometrics, mouse dynamics, verification

Full text is available at IEEE Xplore digital library.


doi: 10.2498/iti.2013.0576

Source Code Similarity Detection by Using Data Mining Methods

Emil Stankov, Mile Jovanov, Ana Madevska Bogdanova

Abstract

Programming courses at university and high school level, and competitions in informatics (programming), often require fast assessment of submitted solutions to programming tasks. This problem is usually solved with automated systems that check the output each solution produces for a set of test cases.
In our paper we present a novel approach to representing program code as vectors, and the use of these vectors in data mining analysis that could produce a better assessment of the solutions. We present the results of cluster analysis, which reach up to 88% of correctly clustered items on average.

Keywords

programming code, evaluation of source code, code similarity, clustering analysis

Full text is available at IEEE Xplore digital library.


doi: 10.2498/iti.2013.0577

Stock Market Analysis - Strongest Performing Stocks Influence on an Evolutionary Market

Monica Tirea, Viorel Negru

Abstract

On the stock market, it is said that the strongest stock sets the direction for the stocks in the same domain (IT, services, oil, and others). Determining the strongest stock plays an important part in finding a better moment to enter a position. This paper presents a Multi-Agent architecture that combines Technical Analysis, Neural Networks and Statistical Methods in order to find the strongest stock, to make a better forecast of the market's future trend, and to trigger an entry/exit signal based on the market basket classification. A prototype was developed and applied to the Bucharest Stock Exchange Market (BSE).

Keywords

Stock Trend Prediction, Stock price, Trading Strategies, Technical Analysis, Z-score, Strongest Stock, Entry/Exit Points

Full text is available at IEEE Xplore digital library.


doi: 10.2498/iti.2013.0512

Statistical Variability vs. Probabilistic Uncertainty

Kalman Žiha

Abstract

The concepts of variability and uncertainty came from experience and coexist with different connotations. First, the article reviews the statistical methods for variability assessments of probability distributions. Next, it sums up the entropy concept of uncertainty of systems of events in probability theory. The two concepts are brought closer together on the basis of common experience of predictability. The article also considers the concept of average number of equally probable events based on entropy. Then, it introduces the concept of equivalent number of outcomes based on variability of probability distributions. Finally, the link between variability and uncertainty is illustrated with examples.
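One common way to turn entropy into an average number of equally probable events is to exponentiate the Shannon entropy. The sketch below assumes that reading and uses natural logarithms; the function names are hypothetical, and the article's own definitions may differ in detail.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * ln p) in nats; zero-probability events contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def equivalent_number_of_events(probs):
    """exp(H): the number of equally probable events that has the same entropy."""
    return math.exp(shannon_entropy(probs))


uniform = equivalent_number_of_events([0.25, 0.25, 0.25, 0.25])
certain = equivalent_number_of_events([1.0])
skewed = equivalent_number_of_events([0.5, 0.25, 0.25])
```

For a uniform distribution over four events the value is 4, for a certain event it is 1, and for the skewed three-event distribution it falls between 1 and 3, which matches the intuition that lower uncertainty means higher predictability.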

Keywords

variability, uncertainty, entropy, predictability, equivalent numbers of outcomes

Full text is available at IEEE Xplore digital library.