Animal Spatial Ecology

Clément Calenge



Research profile

My position as a biometrician lies at the intersection of three scientific domains: biology, statistics and computer science. Biology provides the data structures and the aims. My role is to search the statistical literature for adequate mathematical models, which can be used to develop statistical methods to analyse these data structures and so fulfil the biological aims. Finally, I program the developed methods so that they can be used to achieve the stated objectives. During my Ph.D., I mainly studied ways to identify differences in structure between two clouds of points in a multidimensional space, in the framework of ecological niche analysis. I pursued some of this research during my post-doc. At present, I am also interested in the statistical methods that can be used to analyse GPS data.

(detailed curriculum-vitæ)


Publication list

Journals referenced in the current contents (Impact Factor 2006: IF)

Other publications

Papers submitted or in revision


Detailed description of my present research

I describe my research on my two main subjects of interest below.

Niche analysis

One of my main research interests is the development of factorial methods to analyse one or several niches in the ecological space. The basic data structure is best illustrated by an example: a study area is defined, and several raster maps of environmental variables (elevation, food resource, etc.) are available for this study area (same size, same resolution). Each environmental variable defines a dimension in a multidimensional space, the ecological space, and each pixel defines a point in this space. The whole set of pixels of the map defines a cloud of "available points" in the ecological space. The occurrences of a given species are then sampled on this area. The pixels containing at least one occurrence define a subset of the available points, viz. the "used points". Roughly speaking, these used points define a kind of "niche" for the species. The aim of the analysis is to determine the structure of the niche, i.e. to identify the directions in the ecological space along which the distribution of used points differs as much as possible from the distribution of available points. Although many methods exist to model the niche (GAM, GLM, etc.), most of them suppose that the modeller knows the variables that affect the probability of selection. Few methods exist to explore the niche in the ecological space. Points may also belong to several categories (e.g. relocations of several animals monitored using radio-tracking). In this case, several "niches" can be defined (a design called design II) and the aims vary: the aim can be to discriminate the niches (to determine the directions along which the niches are the most separated), to build a typology of the niches (to identify the directions along which the niches are, on average, as different as possible from the cloud of available points), etc.
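The data structure described above can be sketched in a few lines. This is only an illustration with made-up toy rasters; the function name `ecological_space` and all values are hypothetical and do not come from adehabitat.

```python
# Hypothetical toy data: two 2x3 raster maps covering the same study area
# (same size, same resolution). All values are invented for illustration.
elevation = [[100.0, 200.0, 300.0],
             [150.0, 250.0, 350.0]]
food = [[1.0, 2.0, 3.0],
        [2.0, 3.0, 4.0]]
# Number of species occurrences recorded in each pixel (invented values).
occurrences = [[0, 1, 0],
               [0, 2, 1]]


def ecological_space(rasters, counts):
    """Return (available, used) point clouds in the ecological space.

    Each pixel becomes one point whose coordinates are the values of the
    environmental variables; pixels with at least one occurrence are "used".
    """
    available, used = [], []
    for i in range(len(counts)):
        for j in range(len(counts[0])):
            point = tuple(r[i][j] for r in rasters)
            available.append(point)
            if counts[i][j] > 0:
                used.append(point)
    return available, used


available, used = ecological_space([elevation, food], occurrences)
print(len(available), "available points;", len(used), "used points")
```

Note that a pixel with two occurrences still contributes a single used point here, matching the "at least one occurrence" definition above.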

I chose to focus on factorial analyses (PCA-like analyses), as they have numerous qualities for data exploration and do not require strong hypotheses on the data. I recently developed a framework for this kind of analysis, the General Niche-Environment System Factor Analysis (GNESFA). This algorithm generalises several existing factorial methods. Depending on the centering chosen for the table giving the values of the environmental variables for each available point, the GNESFA returns different but complementary results. Thus, I demonstrated that the widely used Ecological Niche Factor Analysis (Hirzel et al. 2002) is a special case of the GNESFA (Calenge & Basille, in prep.), and that this method returns the best possible undistorted image of the niche in the ecological space (Basille et al. submitted). Another special case of this algorithm is the Mahalanobis Distances Factorial Analysis (MADIFA, Calenge et al. in press). Computing the Mahalanobis distances between the available points and the centroid of the niche in the ecological space is a common way to build potential habitat maps (Clark et al. 1993), which are important tools in wildlife management. These distances measure how "typical" a given combination of environmental variables is with regard to existing data, but they do not indicate which environmental variables contribute the most to the "atypicality" of the environment. The MADIFA partitions the Mahalanobis distance into several additive components, which correspond to directions of the ecological space, so that the average Mahalanobis distance is maximised on the first axes. The MADIFA is thus a way to find the contribution of the environmental variables to the atypicality of the environment.
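For reference, the quantity being partitioned can be written out. The notation below is my own and is an assumption: $x$ is an available point, $\bar{u}$ the centroid of the niche, and $S$ a covariance matrix, taken here to be that of the used points. The second line states a generic algebraic fact, not the specific axes returned by the MADIFA.

```latex
% Squared Mahalanobis distance between an available point x and the
% niche centroid \bar{u} (S: covariance matrix, assumed here to be
% computed from the used points).
D^2(x) = (x - \bar{u})^{\top} S^{-1} (x - \bar{u})

% For any set of vectors a_1, ..., a_p such that
% \sum_k a_k a_k^{\top} = S^{-1}, the distance splits into additive
% components, one per direction a_k:
D^2(x) = \sum_{k=1}^{p} \left( a_k^{\top} (x - \bar{u}) \right)^2
```

Among all such decompositions, the text above describes the MADIFA as the one for which the average distance over the available points is concentrated on the first axes.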

The GNESFA has also proved to generalize some existing methods for the analysis of design II data. If each niche is replaced by its centroid in the ecological space, the GNESFA of the centroids - still with respect to the existing cloud of available points - generalizes the Outlying Mean Index analysis (OMI analysis, Doledec et al. 2000). I proved in my Ph.D. thesis that when the environment consists of several categories (e.g. vegetation types), i.e. when the environmental variables are the dummy variables coding for a qualitative variable, the OMI analysis is simply the Eigenanalysis of Selection Ratios (ESR, Calenge & Dufour, 2006). This method generalizes, within the framework of factorial analysis, the selection ratios (Manly et al. 1972) and the test of Neu et al. (1974), two widely used statistics in habitat selection studies.
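As a reminder of what a selection ratio is, here is a minimal sketch for one animal and three hypothetical habitat types. The ratio for a habitat type is the proportion of use divided by the proportion available. The habitat names, counts and proportions are invented, and this sketch does not implement the ESR itself.

```python
def selection_ratios(used_counts, available_props):
    """Selection ratio per habitat type: used proportion / available proportion.

    used_counts: dict habitat -> number of relocations in that habitat.
    available_props: dict habitat -> proportion of the area it covers.
    """
    total = sum(used_counts.values())
    return {h: (used_counts[h] / total) / available_props[h]
            for h in used_counts}


# Invented example: 50 relocations spread over three habitat types.
used_counts = {"forest": 30, "meadow": 15, "crops": 5}
available_props = {"forest": 0.4, "meadow": 0.3, "crops": 0.3}

ratios = selection_ratios(used_counts, available_props)
for habitat, w in ratios.items():
    print(habitat, round(w, 2))
```

A ratio above 1 indicates a habitat type used more than expected from its availability, and a ratio below 1 a habitat type used less than expected.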

Note that I also developed a method named K-select analysis (Calenge et al. 2005), which is related to the OMI analysis but is not a special case of the GNESFA. This method makes it possible to explore the structures of K niches in K different clouds of available points (e.g. to study habitat selection by animals inside their home range, using radio-tracking data). This analysis has, for example, been used to highlight a functional response in habitat selection by the roe deer (Pellerin et al. in prep.).

I programmed all these methods in a package for the R software (R Development Core Team, 2006) named adehabitat (Calenge 2006). This package also contains other functions for the analysis of habitat selection (resource selection functions, etc.), functions for interacting with geographic information systems, functions for home-range estimation, etc. This package has a growing number of users, and another package relying on adehabitat, "lsmetrics" (Ribeiro et al. submitted), has even been developed in landscape ecology to study landscape connectivity.

Present leads for future research:

I have studied in depth the properties of the GNESFA in the case where the environment is described by continuous variables. Its properties in the case of qualitative variables are still unknown, however, and need further study. Only in the special case of the OMI analysis have I shown that the analysis is equivalent to the ESR. As noted above, the ESR is closely connected with the concept of selection ratios, which are themselves closely related to the concept of Resource Selection Functions (RSF), a widely used methodology relying on the generalized linear model (Manly et al. 2002, see below). I wonder whether, in other special cases, the GNESFA is connected to other existing methods used to study habitat selection.

More generally, my main interest at present is to draw a map of the existing methods to explore, analyse and model the niche, in order to identify the cases where different methods are mathematically similar, or to identify the methods which are connected to the same mathematical model and can be combined into a single approach. For example, given the sample of available points and the sample of used points, it is possible to model the RSF, which gives the probability that an available point is used. What parametric models have to be assumed for the RSF to result in a niche generated by a multivariate normal probability density function, given that the cloud of available points is itself drawn from a multivariate normal probability density function? Conversely, given that the RSF is symmetric, and that the cloud of available points is drawn from a multivariate normal distribution, what shape is expected for the niche (an important ecological question, as the shape of the niche is the subject of a large debate in ecology, see Austin 1999)? How can consistency be achieved between these different concepts (available distribution, use distribution, niche)?
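The link between the three concepts can be made explicit in one simple special case. The notation below is my own and is an assumption: $f_a$ is the density of the available points, $w$ the RSF (up to a constant), and $f_u$ the density of the used points. The exponential form chosen for $w$ is only one illustrative choice, not a general answer to the questions above.

```latex
% Use density as the availability density weighted by the RSF:
f_u(x) = \frac{w(x)\, f_a(x)}{\int w(y)\, f_a(y)\, dy}

% Special case: availability f_a = N(\mu, \Sigma) and a log-linear RSF
% w(x) = \exp(\beta^{\top} x). Completing the square in the exponent:
w(x)\, f_a(x) \propto
  \exp\!\left( \beta^{\top} x
    - \tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
  \propto
  \exp\!\left( - \tfrac{1}{2}
    (x - \mu - \Sigma\beta)^{\top} \Sigma^{-1} (x - \mu - \Sigma\beta) \right)

% Hence the use distribution is again multivariate normal:
f_u = N(\mu + \Sigma\beta,\; \Sigma)
```

In this special case the niche keeps the covariance of the available cloud and its centroid is shifted by $\Sigma\beta$, so the niche differs from availability only through this shift of the centroid.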

Another important question is raised by the relationship between the GNESFA and the OMI analysis. The OMI analysis makes it possible to analyse several niches simultaneously. It focuses only on the marginalities of the different niches, i.e. on the vectors connecting the centroid of the cloud of available points to the centroids of the niches. However, summarizing K niches by their centroids alone is somewhat simplistic. This raises the question of how to explore the similarity of structure of different niches, based not only on their marginality, but also on their "tolerance" (inertia of the niche, i.e. the sum of the variances over the directions of the ecological space). The analysis of several niches in the ecological space involves several more complex questions, including: what are the directions in the ecological space along which the different niches are as similar as possible? What are the directions along which the different niches are as different as possible? How can interactions between different species be taken into account in RSF models, and what are the corresponding relationships with the distributions of available and used points for the different species? Such questions are essential for prediction, for example prediction of the spatial distribution of a species under climate change.
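The two summaries mentioned above, marginality and tolerance, can be computed directly from their definitions. This is a minimal sketch on invented points; the function names are hypothetical, the variance divides by n rather than n - 1, and the sketch skips the standardisation of variables that a real analysis would need.

```python
def centroid(points):
    """Mean of a list of points (tuples of equal length)."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def marginality(available, used):
    """Vector from the centroid of available points to the niche centroid."""
    a, u = centroid(available), centroid(used)
    return tuple(uk - ak for ak, uk in zip(a, u))


def tolerance(used):
    """Inertia of the niche: sum over dimensions of the variance (divisor n)."""
    c = centroid(used)
    n = len(used)
    return sum(sum((p[k] - c[k]) ** 2 for p in used) / n
               for k in range(len(c)))


# Invented example in a two-dimensional ecological space.
available = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
used = [(2.0, 2.0), (2.0, 0.0)]

print("marginality:", marginality(available, used))
print("tolerance:", tolerance(used))
```

Two niches can share the same marginality vector while having very different tolerances, which is why the centroid alone is an incomplete summary.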

GPS data analysis

Since 1 January 2006, I have worked as a post-doc in the Population Ecology team of the Biometry laboratory in Lyon. The release, in September 2004, of adehabitat, which contains many functions to analyse radio-tracking data, encouraged biologists from several countries to form an international group for the development of methods to analyse animal movements. Paolo Cavallini (Faunalia, Italy) thus founded the international group Animove (http://www.faunalia.it/animov/index.php), in which I actively participate. In France, many wildlife management organizations (Office national de la chasse et de la faune sauvage [ONCFS], Institut national de la recherche agronomique [INRA]) have become increasingly concerned with the analysis of GPS data. GPS is a new technology allowing the automatic collection of relocations of animals. Because this collection is automatic, the time lag between two relocations is often very short, which results in a precise sample of the movements of the animal. For the first time in ecology, the movements of some species (e.g. roe deer, wild boar, etc.) can be precisely sampled, which was not possible before. This raises several questions concerning the analysis of such trajectories. How can structures in a trajectory be identified? Which components of the environment explain these structures?

As a result, I joined a research team composed of several biologists belonging to several organizations (CNRS, ONCFS, INRA, etc.) and working mainly on terrestrial ungulates (red deer, mouflon, chamois, roe deer) but also on the brown bear. The group also includes two biometricians (S. Dray and myself) and one mathematician (M. Royer), and together we are developing an approach to the analysis of this kind of data. This group was formed at the beginning of my post-doc, and we have already developed a framework for this analysis.

We first asked "what is a trajectory?". The first definition we used for this class was "a set of successive relocations for which the time of collection is known", which encompasses several data types, including Argos data, classical radio-tracking, GPS, etc. However, as our aim was to develop a new approach for the analysis of GPS data, we then asked "how can a trajectory be described?". A trajectory is a set of successive "steps", and describing a trajectory implies describing these steps and how they are combined within the trajectory. After a review of the literature, we chose a set of parameters which, in our opinion, together provide a complete description of the steps: time lag between two successive relocations, distance between them in the X-Y plane, distance in the X direction, distance in the Y direction, absolute angle measured from the east direction, relative angle measured from the direction of the previous step, and mean squared displacement since the first relocation of the trajectory (Calenge et al. in prep.).
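The step parameters listed above are straightforward to compute from a sequence of timed relocations. The sketch below is an illustration only: the function name, the dictionary keys and the toy relocations are hypothetical, angles are in radians, and the squared displacement is reported per step (measured at the end point of the step), so that averaging it gives a mean squared displacement.

```python
import math


def describe_steps(relocs):
    """Describe each step of a trajectory.

    relocs: list of (x, y, t) tuples sorted by time t.
    Returns one dict per step (pair of successive relocations).
    """
    x0, y0, _ = relocs[0]
    steps = []
    prev_angle = None
    for (x1, y1, t1), (x2, y2, t2) in zip(relocs, relocs[1:]):
        dx, dy = x2 - x1, y2 - y1
        # Absolute angle: direction of the step, measured from the east (X axis).
        abs_angle = math.atan2(dy, dx)
        if prev_angle is None:
            rel_angle = None  # undefined for the first step
        else:
            # Turning angle from the previous step, wrapped to [-pi, pi].
            diff = abs_angle - prev_angle
            rel_angle = math.atan2(math.sin(diff), math.cos(diff))
        steps.append({
            "dt": t2 - t1,
            "dx": dx,
            "dy": dy,
            "dist": math.hypot(dx, dy),
            "abs_angle": abs_angle,
            "rel_angle": rel_angle,
            # Squared displacement of the step's end point from the first relocation.
            "sq_disp": (x2 - x0) ** 2 + (y2 - y0) ** 2,
        })
        prev_angle = abs_angle
    return steps


# Invented relocations: one step east, then one step north, 10 time units apart.
relocs = [(0.0, 0.0, 0), (1.0, 0.0, 10), (1.0, 1.0, 20)]
steps = describe_steps(relocs)
for s in steps:
    print(s)
```

A step of zero length has no meaningful direction; `math.atan2(0, 0)` returns 0 in that case, so real data would need such steps handled explicitly.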

We then worked on developing a new methodology to analyse this kind of data. M. Royer pointed out the interest of several mathematical models for trajectory analysis (biased and arithmetic Brownian motion, Ornstein-Uhlenbeck process, and correlated random walk). These models are far too simplistic to account for the complex structures of real trajectories, but they can be useful as null models (Royer et al. in prep.). They are well known to mathematicians, and the properties of the parameters described above are well known under these models, so that tests can be developed to compare the data to them.
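To make the idea of a null model concrete, here is a minimal simulation of a correlated random walk. The distributions are my own assumptions chosen for simplicity (independent normal turning angles, independent exponential step lengths); the function name and parameters are hypothetical and this is not the formulation of Royer et al.

```python
import math
import random


def simulate_crw(n_steps, mean_step=1.0, turn_sd=0.3, seed=0):
    """Simulate a correlated random walk starting at the origin.

    Turning angles are independent draws from N(0, turn_sd) and step lengths
    are independent exponential draws with mean mean_step (both assumptions).
    Returns a list of n_steps + 1 (x, y) positions.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    heading = rng.uniform(-math.pi, math.pi)
    path = [(x, y)]
    for _ in range(n_steps):
        # The new heading is the previous one plus an independent turning angle,
        # so absolute angles are autocorrelated while relative angles are not.
        heading += rng.gauss(0.0, turn_sd)
        length = rng.expovariate(1.0 / mean_step)
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        path.append((x, y))
    return path


path = simulate_crw(100)
print(path[:3])
```

Simulated trajectories of this kind give a reference against which the descriptive parameters of an observed trajectory can be compared.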

We thus developed several tests based on these models. The first tests we developed were tests of independence between the parameters characterizing the successive steps (Dray et al. in prep.). The rationale behind these tests is the following: the models described above generate trajectories with known properties concerning the descriptive parameters (e.g. the correlated random walk returns trajectories with independent distances between successive relocations, independent relative angles, but autocorrelated absolute angles, etc.). A profile can be defined for each model, so that a battery of tests of independence can be applied to a trajectory, the resulting profile can be compared to the "model profiles", and biological information can be drawn from these comparisons. For example, the trajectory may differ from a Brownian motion because the absolute angles are autocorrelated, which may indicate that the animal tends to keep the same direction from one step to the next. The same trajectory may differ from the correlated random walk because the absolute angles are also positively autocorrelated, which may indicate that some parts of the trajectory correspond to one type of behaviour (e.g. feeding) and other parts to another type (e.g. resting). We are currently developing other tests (e.g. based on the distribution of distances).
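One generic way to test the independence of successive values of a step parameter is a permutation test. The sketch below is not the test of Dray et al.; it is a stand-in chosen for illustration, using the lag-1 autocorrelation as the statistic, and it suits linear parameters such as step length rather than angles, which are circular. The function names and data are hypothetical.

```python
import random


def lag1_autocorr(values):
    """Lag-1 autocorrelation coefficient of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    den = sum((v - mean) ** 2 for v in values)
    if den == 0:
        raise ValueError("constant sequence: autocorrelation is undefined")
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return num / den


def permutation_test(values, n_perm=999, seed=0):
    """One-sided permutation test for positive lag-1 autocorrelation.

    Shuffling the sequence destroys any serial dependence, which gives the
    distribution of the statistic under independence.
    Returns (observed statistic, p-value).
    """
    rng = random.Random(seed)
    observed = lag1_autocorr(values)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lag1_autocorr(shuffled) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Invented step lengths: ten short steps followed by ten long steps,
# mimicking two behavioural phases.
step_lengths = [1.0] * 10 + [10.0] * 10
observed, p_value = permutation_test(step_lengths)
print(observed, p_value)
```

Applying one such test per parameter yields the kind of profile that can then be compared with the profile expected under each null model.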

Leads for future research:

Our research raises several questions:

How can a heterogeneous trajectory (from the point of view of the properties described above) be "split" into several homogeneous sub-trajectories? Several approaches are possible, such as clustering methods or algorithms developed in molecular biology.

On many occasions, the GPS fails to locate the animal, which results in a missing value in the trajectory. Trajectories may sometimes contain a large proportion of missing relocations (30% of missing values have been observed in some trajectories). We took the presence of missing values into account in the tests that we developed, but this is often a difficult problem. This raises a question: is it possible to estimate the likely position of the animal when a relocation is missing, using information from the trajectory and possibly from the environment?
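The simplest baseline for this question is linear interpolation in time between the nearest known relocations. The sketch below is only that naive baseline, with a hypothetical function name and invented data; it uses no information about the behaviour of the animal or about the environment, which is precisely what a better estimator would add.

```python
def interpolate_missing(relocs):
    """Fill missing relocations by linear interpolation in time.

    relocs: list of (t, x, y) sorted by t, with x and y set to None when the
    GPS failed. Missing values before the first or after the last known
    relocation are left as None.
    """
    filled = list(relocs)
    known = [i for i, (_, x, y) in enumerate(relocs)
             if x is not None and y is not None]
    for a, b in zip(known, known[1:]):
        ta, xa, ya = relocs[a]
        tb, xb, yb = relocs[b]
        for i in range(a + 1, b):
            t = relocs[i][0]
            w = (t - ta) / (tb - ta)  # fraction of the time gap elapsed
            filled[i] = (t, xa + w * (xb - xa), ya + w * (yb - ya))
    return filled


# Invented trajectory with two failed fixes.
relocs = [(0, 0.0, 0.0), (10, None, None), (20, None, None), (40, 4.0, 8.0)]
filled = interpolate_missing(relocs)
print(filled)
```

Straight-line interpolation implies constant speed and direction across the gap, so it distorts step lengths and angles and should not be fed back into the tests without care.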

Until now, we have developed tests to analyse only one trajectory. Many of the data sets provided by the biologists relate to several animals. How can the tests be generalised to several animals? And above all, how can the interactions between them be identified? This question is often of the utmost interest for ecologists, and is often at the origin of the study. The difficulty is that a given structure of the trajectory may be caused by the environment, by interactions with other animals, or by the animal itself (its internal rhythms: feeding/resting). Answering it may require a step of mathematical modelling.

Literature Cited