ECML/PKDD 2004, Pisa, Italy, September 20-24, 2004
Workshop on "Statistical Approaches to Web Mining" SAWM'04
20 September 2004, Pisa, Italy

in conjunction with ECML/PKDD 2004: The 15th European Conference on Machine Learning (ECML) and The 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), 20-24 September, 2004, Pisa, Italy.


  • Marco Gori Dipartimento di Ingegneria dell'Informazione, Siena, Italy
  • Michelangelo Ceci (co-chair) Department of Informatics, University of Bari, Bari, Italy
  • Mirco Nanni (co-chair) KDDLab, ISTI-CNR, Pisa, Italy


The explosive growth and popularity of the World-Wide Web have resulted in a huge number of information sources on the Internet and the promise of unprecedented information-gathering capabilities, which can be directed toward:

  • Extraction of knowledge from the Web: the Web is a huge collection of documents and sophisticated knowledge extraction methods are required to effectively access the information they contain. Such methods include machine learning and data mining techniques for information categorization, extraction, and search, as well as for adapting to the interests of the users.
  • Extraction of knowledge from the user's behaviour: the Web is a venue for doing business electronically, as well as for the interaction, information acquisition and service exploitation used by public authorities, non-governmental organizations, communities of interest and private persons. When observed as a venue for the achievement of business goals, the Web presence should be aligned with the objectives of its owner and the requirements of its users. This raises the demand for understanding Web usage, combining it with other sources of knowledge inside an organization, and deriving lines of action.

Unfortunately, the morass of sources presents a formidable hurdle to extracting information from them effectively. In recent years a growing number of machine learning and data mining methods have been applied to this problem. In many cases, the theoretical glue binding them together is manifestly that of statistics. Some examples of well-known statistical learning methods applied to web mining problems are Support Vector Machines (SVM), Bayesian classifiers, neural networks, as well as unsupervised learning methods (clustering, principal component analysis, and so on). Statistical data mining methods are also used for data pre-processing, transformation and result visualization.

The purpose of this workshop is to bring together researchers with backgrounds in machine learning, data mining, statistics and pattern recognition who are interested in tackling different problems of information gathering on the Web. The workshop is the third follow-up event within the Web Mining Forum supported by the Network of Excellence on Knowledge Discovery (KDNet, IST project No. 2001-33086).

General information

The workshop will maintain a balance between theoretical issues and descriptions of case studies to promote synergy between theory and practice. It aims to be a highly communicative meeting place for researchers working on similar topics, but coming from different communities. In order to achieve these goals, the workshop will consist of one or two invited talks, followed by short presentations and longer discussions.

Each author will be encouraged to read another accepted paper and to comment on it after the original talk has been given. Authors should make certain that the techniques they describe deal with the issues that are associated with the workshop.

All ECML/PKDD'04 SAWM workshop participants must also register for the main ECML/PKDD conference. Workshop attendance will be limited to registered participants.


The SAWM'04 workshop will be held in conjunction with the following tutorial:

Invited Talk

The following invited talk will be presented within the workshop:
  • "Computational and Statistical methods for web usage mining: a critical comparison", by Paolo Giudici.

Important Dates

    Submission deadline: June 21, 2004
    Notification of acceptance: July 12, 2004
    Workshop paper camera-ready deadline: July 19, 2004
    Workshop proceedings (camera- and web-ready): July 26, 2004
    Workshop: September 20, 2004