Data mining techniques from statistics, machine learning, and
visualization as applied to scientific knowledge discovery. Students will be
given a set of case studies and projects to test their understanding
of this field and to provide a foundation for future applications
in their careers.
This course provides a broad overview of the data mining component
of the knowledge discovery process, as applied to scientific research.
Scientific databases are growing at near-exponential rates.
As the amount of data has grown, so has the difficulty of analyzing
these large databases. Data mining is the search for hidden,
meaningful patterns in such databases. Identifying these patterns
and rules can provide a significant competitive advantage in scientific
research projects and in other career settings. The course motivates
and analyzes data mining as the "killer app" for large
scientific databases. Data mining techniques, algorithms, and applications
are covered, as well as the key concepts of machine learning,
data types, data preparation, previewing, noise handling,
feature selection, normalization, data transformation,
similarity measures, and distance metrics. Algorithms and techniques
will be analyzed specifically in terms of their application to
solving particular problems. Several case studies drawn from the
scientific research literature will be presented.
The techniques that are presented will be drawn from well-known
statistical, machine learning, visualization, and database algorithms,
including clustering, decision trees, regression, Bayes' theorem,
nearest neighbor, neural networks, and genetic algorithms. Topics will
include informatics, semantic knowledge mining, and the integration
of data mining with large (and often distributed) scientific databases.
Grading:
30% = Homework and Lab Exercises
10% = Class Participation
20% = Midterm Exam
40% = Final Exam
The objectives of this course are:
to develop an understanding of data mining and its scientific applications;
to become familiar with a variety of data mining concepts,
techniques, and algorithms;
to become capable in applying these techniques and algorithms
to solve scientific problems; and
to provide a foundation and develop the skills for future
data-intensive applications in the student's career.