Machine learning methods have blossomed over the past decade as a means of analyzing patterns in data: a model is fit to a well-understood training set, and the knowledge gained is then extended to a broader context. A key concept is that the method automatically adjusts the choice of model, based on how well the model meets its goals, as the model is exposed to additional data. This requires agreement on appropriate performance metrics, and it requires that the adjustment of the model, sometimes through the choice of parameters and sometimes through the choice of the underlying model, proceeds through an automatic algorithm. The entire process is therefore inherently tied to model evaluation, as encoded in the performance metrics. A host of mathematical concepts underlies the development of machine learning algorithms and the assessment of their performance, drawing on virtually every area of classical applied mathematics (e.g., calculus, optimization, linear algebra, probability, numerical analysis).
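As a minimal sketch of the idea of automatic adjustment guided by a performance metric, the following Python fragment fits a straight line by gradient descent on the mean squared error. The data values, the step size, and the function names mse and fit are assumptions made up for illustration; they do not come from the seminar materials.

```python
# Hypothetical toy data, invented for illustration: y is roughly 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]


def mse(a, b):
    """Performance metric: mean squared error of the line y = a*x + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(steps=2000, rate=0.02):
    """Adjust the parameters a and b automatically by gradient descent on the MSE."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to a and b.
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step in the direction that lowers the error.
        a -= rate * grad_a
        b -= rate * grad_b
    return a, b


a, b = fit()
print(f"slope={a:.3f} intercept={b:.3f} mse={mse(a, b):.4f}")
```

Here the model family (a straight line) is fixed and only the parameters are adjusted. Changing the underlying model would mean repeating the same loop for a different family and comparing the resulting values of the metric.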
The objective of this one-credit-hour seminar is to provide an overview of the variety of applications of machine learning to ecology, broadly construed. The focus is on the underlying mathematical ideas, not on the coding or detailed implementation of the algorithms. We will start with an overview of key ideas (e.g., the differences among supervised, unsupervised, reinforcement, and deep learning), discuss some of the main problems to which machine learning methods have been applied (e.g., regression, prediction, dimensionality reduction, regularization, probability distribution estimation, clustering, classification), and use examples of applications in ecology throughout (ecological forecasting, neural network methods to estimate parameters in process models, image analysis for species classification, prediction of invasive species outbreaks).
Participants are assumed to have some undergraduate-level background in the underlying mathematics. Each participant will be expected to choose a particular application of machine learning in an area relevant to their research, become knowledgeable about the associated articles or books, inform the instructor regularly about what they are reading, and be prepared as the semester progresses to comment in class on what they have learned about their chosen topic. The instructor will provide an extensive list of papers and other references, give a conceptual overview of each of the approaches with a bit of mathematical detail, and guide discussions in collaboration with course participants.
This course is offered online synchronously and the course meeting time is 1:10-2:00PM.
We will use Zoom for class meetings and will share documents using the Basecamp site for the seminar. In addition to attending class, registered participants are expected to share their understanding of the topic they have chosen. At the end of the semester, each participant is expected to produce a short report on an application of machine learning to a problem of interest to them. This could involve using one of the many available machine learning tools (in Matlab, R, Python, TensorFlow, etc.), or it could be a discussion of an application and its mathematics.
Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. 2020. Mathematics for Machine Learning. Cambridge University Press. (pdf of the book is available through this site)
Wale Akinfaderin. 2017. The Mathematics of Machine Learning (blog link)
Coursera course on Mathematics for Machine Learning Specialization (link)
edX course on Essential Mathematics for Machine Learning in Python (link)
MIT OpenCourseWare course on Mathematics of Machine Learning (link)
Vincent Chen. Learning Math for Machine Learning (link)
Sharoon Saxena. 2019. Mathematics behind Machine Learning – The Core Concepts you Need to Know (link)
Jon Lefcheck. 2015. A practical guide to machine learning in ecology (blog link)
Benjamin D. Dalziel, Juan M. Morales, and John M. Fryxell. 2008. Fitting Probability Distributions to Animal Movement Trajectories: Using Artificial Neural Networks to Link Distance, Resources, and Memory. American Naturalist 172(2):248-258 (link)
M. Joseph Hughes, S. Douglas Kaylor and Daniel J. Hayes. 2017. Patch-Based Forest Change Detection from Landsat Time Series. Forests, 8(5), 166. (link)
Michael C. Dietze, Andrew Fox, Lindsay M. Beck-Johnson, Julio L. Betancourt, Mevin B. Hooten, Catherine S. Jarnevich, Timothy H. Keitt, Melissa A. Kenney, Christine M. Laney, Laurel G. Larsen, Henry W. Loescher, Claire K. Lunch, Bryan C. Pijanowski, James T. Randerson, Emily K. Read, Andrew T. Tredennick, Rodrigo Vargas, Kathleen C. Weathers, and Ethan P. White. 2018. Iterative near-term ecological forecasting: Needs, opportunities, and challenges. PNAS 115(7):1424-1432 (pdf)
Slides from presentations:
Slides of basic definitions in machine learning from Presentation on August 27, 2020 (.pptx file)
Slides of main problems in machine learning and short annotations of some articles from Presentation on Sept. 3, 2020 (.pptx file)
Slides of introduction to the classification problem from Presentation on Sept. 17, 2020 (.pptx file)
Slides of overview of classification including example of statistical decision applied to Iris data from Presentation on Sept. 24, 2020 (.pptx file)
Matlab .m file IrisBayesRisk.m for making calculations and graphs on Bayes risk described in the example in classification overview slides using a uniform distribution of sepal length (Matlab .m file)
Matlab .m file IrisBayesRiskTriangle.m for making calculations and graphs on Bayes risk described in the example in classification overview slides using a triangular distribution of sepal length (Matlab .m file)
Data file RawIrisDatamatrix.txt that contains Iris Data from Fisher 1936 paper (.txt file of the matrix of data)
Slides of classification approaches as general statistical decision problem, including example of triangle distribution application to Iris data from Presentation on Oct. 1, 2020 (.pptx file)
Matlab .m file IrisBayesRiskUniformErrorRate.m to calculate the empirical Bayes risk error described in the example in classification overview slides using a uniform distribution of sepal length (Matlab .m file)
Matlab .m file IrisBayesRiskTriangleErrorRate.m to calculate the empirical Bayes risk error described in the example in classification overview slides using a triangular distribution of sepal length (Matlab .m file)
Slides of machine learning approaches to classification, including example of discriminant analysis applied to Iris data from Oct 8 presentation (.pptx file)
Matlab .m file DiscriminantAnalysisIris.m to calculate the linear and quadratic discriminants for the Iris Sepal Length and Sepal Width data (Matlab .m file)
Slides of Dimension Reduction methods including PCA from Oct 15 presentation (.pptx file)
Slides of Multidimensional Scaling Example and Cluster Analysis Overview from Oct 22 presentation (.pptx file)
Slides of Artificial Neural Nets Overview from Oct 29 presentation (.pptx file)
Slides of Introduction to Ecological Forecasting from Nov. 4 presentation (.pptx file)
Slides of Continuing Discussion on Ecological Forecasting from Nov. 11 presentation (.pptx file)
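The classification slides listed above treat classification as a statistical decision problem with an associated Bayes risk. The Python sketch below shows one way that idea could be set up for a single measurement with uniform class-conditional densities and 0-1 loss. It is not a translation of IrisBayesRisk.m: the class labels, ranges, priors, and function names are hypothetical values chosen for illustration and are not taken from the Iris data.

```python
def uniform_pdf(x, lo, hi):
    """Density of a uniform distribution on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


# Hypothetical classes: the measurement ranges and priors are invented.
CLASSES = {"A": (4.0, 6.0), "B": (5.0, 7.0)}
PRIORS = {"A": 0.5, "B": 0.5}


def classify(x):
    """Bayes rule under 0-1 loss: choose the class maximizing prior * density.

    Ties (and points outside every range) go to the first class listed.
    """
    scores = {c: PRIORS[c] * uniform_pdf(x, *CLASSES[c]) for c in CLASSES}
    return max(scores, key=scores.get)


def bayes_risk(n=100000, lo=3.0, hi=8.0):
    """Midpoint-rule estimate of the error probability of the Bayes rule.

    At each x the rule errs with weight (sum of scores - largest score),
    so the risk is the integral of that quantity over x.
    """
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        scores = [PRIORS[c] * uniform_pdf(x, *CLASSES[c]) for c in CLASSES]
        total += (sum(scores) - max(scores)) * dx
    return total


print(classify(4.5), classify(6.5), round(bayes_risk(), 3))
```

With these made-up ranges the two classes overlap on an interval of length 1, where each has prior times density equal to 0.25, so the numerical estimate should be close to 0.25. Replacing uniform_pdf with a triangular density would give the analogous calculation for the triangular case.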
Return to L. Gross Home Page at NIMBioS