Abstract (EN):
In previous work we proposed a statistical model to capture user behaviour when browsing the web. The user navigation information, obtained from web logs, is modelled as a hypertext probabilistic grammar (HPG), which falls within the class of regular probabilistic grammars. The set of highest-probability strings generated by the grammar corresponds to the user's preferred navigation trails. We have previously conducted experiments with a Breadth-First Search (BFS) algorithm that performs the exhaustive computation of all strings with probability above a specified cut-point, which we call the rules. Although the algorithm's running time varies linearly with the number of grammar states, it has the drawbacks of returning a large number of rules when the cut-point is small and only a small set of very short rules when the cut-point is high.
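
For illustration, a minimal sketch of such a breadth-first enumeration follows, assuming the HPG is given as a map of start-state probabilities and per-state transition probabilities. The names bfs_trails, start_probs, transitions and cut_point, and the choice to keep only trails that can no longer be extended above the cut-point, are assumptions of this sketch rather than the paper's exact formulation.

from collections import deque

def bfs_trails(start_probs, transitions, cut_point):
    """Enumerate the trails of an HPG whose probability stays above a cut-point.

    start_probs -- dict: state -> probability of a trail starting at that state
    transitions -- dict: state -> list of (next_state, transition_probability)
    cut_point   -- trails whose cumulative probability falls below this
                   threshold are pruned and never expanded further
    """
    rules = []
    # Each queue entry is (trail_so_far, cumulative_probability).
    queue = deque(([s], p) for s, p in start_probs.items() if p >= cut_point)
    while queue:
        trail, prob = queue.popleft()
        extended = False
        for nxt, p in transitions.get(trail[-1], []):
            new_prob = prob * p
            # Probabilities only shrink along a trail, so any extension
            # falling below the cut-point can be discarded immediately.
            if new_prob >= cut_point:
                queue.append((trail + [nxt], new_prob))
                extended = True
        if not extended:
            # A trail that cannot be extended above the cut-point is
            # kept as a rule in this sketch.
            rules.append((trail, prob))
    return rules

For example, bfs_trails({"A": 0.6, "B": 0.4}, {"A": [("B", 0.7)], "B": [("C", 1.0)], "C": []}, 0.25) returns the maximal trails B-C (probability 0.4) and A-B-C (probability 0.42).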
In this work we present a new heuristic that implements an iterative deepening search, wherein the set of rules is incrementally augmented by first exploring the trails with the highest probability. A stopping parameter is provided which measures the distance between the current rule-set and the corresponding maximal set obtained by the BFS algorithm. When the stopping parameter takes the value zero the heuristic corresponds to the BFS algorithm, and as the parameter takes values closer to one the number of rules obtained decreases accordingly.
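
The abstract does not spell out the heuristic's internals, so the sketch below is only one illustrative reading: the search is deepened one level per iteration (a rough stand-in for the paper's high-probability-first exploration) and halted once a simple growth-based proxy for the distance to the maximal rule-set drops to the stopping parameter. The function name deepening_rules, the parameter stop_param and this particular distance proxy are assumptions made for illustration, not the paper's definitions.

def deepening_rules(start_probs, transitions, cut_point, stop_param):
    """Illustrative iterative-deepening rule induction for an HPG.

    start_probs -- dict: state -> probability of a trail starting at that state
    transitions -- dict: state -> list of (next_state, transition_probability)
    cut_point   -- minimum trail probability, as in the BFS algorithm
    stop_param  -- in [0, 1]; 0 reproduces the exhaustive BFS result and
                   values closer to 1 stop the search earlier
    """
    # Frontier trails are candidate rules that may still be extended.
    frontier = [([s], p) for s, p in start_probs.items() if p >= cut_point]
    rules = []
    while frontier:
        next_frontier, extended = [], 0
        for trail, prob in frontier:
            steps = [(nxt, prob * p)
                     for nxt, p in transitions.get(trail[-1], [])
                     if prob * p >= cut_point]
            if steps:
                extended += 1
                next_frontier.extend((trail + [nxt], q) for nxt, q in steps)
            else:
                rules.append((trail, prob))   # maximal trail: a rule
        # Illustrative distance proxy: the fraction of candidate rules
        # that were still extended in this pass.
        if extended / len(frontier) <= stop_param:
            rules.extend(next_frontier)       # keep the truncated trails
            break
        frontier = next_frontier
    return rules

With stop_param equal to zero the loop only stops once no trail can be extended any further, matching the exhaustive BFS result; values closer to one truncate the search earlier and therefore return fewer, shorter rules, which mirrors the behaviour described above.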
Experiments were conducted with both real and synthetic data, and the results show that, for a given cut-point, the number of rules induced increases smoothly as the stopping parameter decreases. Therefore, by setting the value of the stopping parameter the analyst can determine the number and quality of the rules to be induced, where the quality of a rule is measured by both its length and its probability.
Language: English
Type (Professor's evaluation): Scientific
No. of pages: 11