Tag Archives: GBIF

Creating figures like the paper ‘Completeness of Digital Accessible Knowledge of Plants of Ghana’ Part 4

13 Dec

This is the fourth part of the series, where we are going to create Figure 4, the plot of inventory completeness against sample size for grid cells. In Part 3 of this series we created a chronohorogram to understand the seasonality of the data records by year.

If you have not already done so, please follow steps in Part 1 of the post to set up the data. Since this functionality was recently added to package bdvis, make sure you have v 0.2.9 or higher installed on your system.

The first step to generate this plot will be to compute completeness of the data. Package bdvis provides us with a handy function for that, as long as we want to compute the completeness for a degree grid. This was partially covered in an earlier blog post.

comp = bdcomplete(occ)

This command returns a completeness data matrix called comp and generates a plot of inventory completeness values (c) versus the number of species observed (Sobs) in the data set, as follows.

[Figure comp1: inventory completeness (c) versus number of species observed (Sobs)]

head(comp)
  Cell_id  nrec Sobs     Sest         c
1   35536  3436  243 276.3514 0.8793151
2   35537  4315  299 318.7432 0.9380592
3   35538   518  152 187.4118 0.8110483
4   35896 17148  320 343.9483 0.9303724
5   35897  7684  300 338.8402 0.8853732
6   35898   865  169 216.7325 0.7797632

 

The data returned has cell identification numbers, number of records per cell, number of observed and estimated species and the completeness coefficient (c).
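Judging from the rows printed above, c looks like the ratio of observed to estimated species (Sobs / Sest). That reading is an inference from the output, not something taken from the bdvis documentation. A small check, with the six rows typed in by hand:

```r
# Rows copied by hand from the head(comp) output above
comp_head <- data.frame(
  Cell_id = c(35536, 35537, 35538, 35896, 35897, 35898),
  nrec    = c(3436, 4315, 518, 17148, 7684, 865),
  Sobs    = c(243, 299, 152, 320, 300, 169),
  Sest    = c(276.3514, 318.7432, 187.4118, 343.9483, 338.8402, 216.7325),
  c       = c(0.8793151, 0.9380592, 0.8110483, 0.9303724, 0.8853732, 0.7797632)
)

# Assumption: completeness is observed richness divided by estimated richness
ratio <- comp_head$Sobs / comp_head$Sest

# Compare with the c column, allowing for rounding in the printed values
all(abs(ratio - comp_head$c) < 1e-6)
```

If the comparison holds for your own comp data frame as well, a low c simply means that many more species are expected in the cell than have been recorded so far.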

The default cut off number of records per grid cell is 50, but let us set that to 100 so we can filter out some grid cells which are data deficient.

comp = bdcomplete(occ, recs=100)

The graph we want to plot is Inventory Completeness (c) against sample size for grid cells (nrec) and not the one provided by default.

plot(comp$nrec, comp$c, main="Completeness vs number of records",
     xlab="Number of records", ylab="Completeness")

This will produce a graph like this:

[Figure comp2: completeness against number of records, linear x axis]

 

The problem with this graph is that the number of records per grid cell varies so much that the majority of points, which have fewer than 5000 records, are crowded together. So let us use a log scale for the number of records.

plot(log10(comp$nrec), comp$c, main="Completeness vs number of records",
     xlab="log10(Number of records)", ylab="Completeness")

[Figure comp3: completeness against number of records, log10 x axis]

Now this looks better. Let us change the x axis labels to more sensible values, to make the graph easier to read. For that we suppress the current x axis by using the xaxt parameter, and then construct and add the tick marks and their labels.

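# xaxt="n" suppresses the default x axis so we can draw our own
plot(log10(comp$nrec),comp$c,main="Completeness vs number of records",
     xlab="Number of records",ylab="Completeness",xaxt="n")
# Tick positions that R would have used on the log10 scale
atx <- axTicks(1)
# Label each tick i as 10^i
labels <- sapply(atx,function(i) as.expression(bquote(10^ .(i))))
axis(1,at=atx,labels=labels)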

[Figure comp4: completeness against number of records, x axis labelled in powers of 10]
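To see what the log10 transform and the new labels are doing, here is a small sketch using the six nrec values printed earlier. The tick positions are made up for illustration and are not the actual output of axTicks.

```r
# nrec values from the head(comp) output shown earlier
nrec <- c(3436, 4315, 518, 17148, 7684, 865)

# On the raw scale the smallest and largest cells are far apart
range(nrec)

# log10 brings them into a narrow interval where points no longer pile up
log_nrec <- log10(nrec)
round(log_nrec, 2)

# A tick at position i on the log10 axis stands for 10^i records
# (illustrative positions, not the result of axTicks)
tick_pos <- c(2, 3, 4, 5)
data.frame(position = tick_pos, records = 10^tick_pos)
```

So a point plotted at 3 on the x axis is a cell with 1000 records, which is why labels of the form 10^i are easier to read than the bare positions.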

Now let us add lines to mark the cut-off values we want to consider: inventory completeness higher than 0.5, for cells with more than 1000 records. Since the x axis is on a log10 scale, 1000 records corresponds to v = 3.

abline(h = 0.5, v = 3, col = "red", lwd = 2)

[Figure comp5: the same plot with red cut-off lines at completeness 0.5 and 1000 records]
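The cells that pass both cut-offs can also be listed directly instead of being read off the plot. A minimal sketch, using the six rows printed earlier in place of the full comp data frame:

```r
# Six rows copied by hand from the head(comp) output
comp_head <- data.frame(
  Cell_id = c(35536, 35537, 35538, 35896, 35897, 35898),
  nrec    = c(3436, 4315, 518, 17148, 7684, 865),
  c       = c(0.8793151, 0.9380592, 0.8110483, 0.9303724, 0.8853732, 0.7797632)
)

# Keep cells with completeness above 0.5 and more than 1000 records,
# i.e. the upper right quadrant formed by the two red lines
well_sampled <- comp_head[comp_head$c > 0.5 & comp_head$nrec > 1000, ]
well_sampled$Cell_id
```

With the real data you would run the same filter on comp itself. The name well_sampled is only a label chosen for this example.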

Now we can set the point size and shape to match the figure in the paper by using the pch and cex parameters. The final plot code is as follows:

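# Grey filled squares (pch=22 with bg), slightly enlarged
plot(log10(comp$nrec),comp$c,main="Completeness vs number of records",
     xlab="Number of records",ylab="Completeness",xaxt="n",
     pch=22, bg="grey", cex=1.5)
# Custom x axis labelled in powers of 10
atx <- axTicks(1)
labels <- sapply(atx,function(i) as.expression(bquote(10^ .(i))))
axis(1,at=atx,labels=labels)
# Cut-off lines: completeness 0.5 and 1000 records (log10 = 3)
abline(h = 0.5, v = 3, col = "red", lwd = 3)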

[Figure comp6: final plot of inventory completeness against number of records]

If you have suggestions on improving the features of package bdvis, please post them as issues in the GitHub repository. If you have any questions or comments about this post, please post them here.


Creating figures like the paper ‘Completeness of Digital Accessible Knowledge of Plants of Ghana’ Part 3

22 Nov

This is the third part of the series in which we replicate the figures from a paper. In this part we are going to create Figure 2, the chronohorogram. In Part 2 of this series we created a temporal plot to understand the seasonality of the data records (Figure 1b).

If you have not already done so, please follow steps in Part 1 of the post to download and set up the data. Make sure you have v 0.2.9 or higher installed on your system.

Creating a chronohorogram is very simple with our package bdvis.

chronohorogram(occ)

[Figure chrono1: chronohorogram with default start and end years]

Though the command has created the diagram, it does not look right: it does not cover the full range of years represented in the data. Since we called the command without any other parameters, it used the default start and end years. Let us check the range of years we have in the data. For that we can simply use the command bdsummary.

bdsummary(occ)
Total no of records = 1071315

Temporal coverage...
 Date range of the records from 1700-01-01 to 2015-06-07

Taxonomic coverage...
 No of Families : 0
 No of Genus : 0
 No of Species : 1565

Spatial coverage ...
 Bounding box of records 6.94423 , -83.65 - 89 , 99.2
 Degree celles covered : 352
 % degree cells covered : 2.34572837531654

 

This tells us that we have data available from 1700 to 2015 in this data set. Let us try specifying the starting year and letting the package decide the end year.

chronohorogram(occ, startyear = 1700)

[Figure chrono2: chronohorogram starting in 1700]
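The spread of records over the years can also be checked numerically, which complements the diagram. A minimal sketch in base R, using a handful of made-up dates in place of occ$Date_collected (the column set up in Part 1):

```r
# Hypothetical dates standing in for occ$Date_collected
dates <- as.Date(c("1700-01-01", "1849-06-30", "1902-03-15",
                   "1987-11-02", "2014-08-21", "2015-06-07", NA))

# Extract the year as an integer; missing dates stay NA
years <- as.integer(format(dates, "%Y"))

# Earliest and latest year present
range(years, na.rm = TRUE)

# Share of dated records collected before 1850
mean(years < 1850, na.rm = TRUE)
```

On the real data a very small share before a given year is a good reason to move the start year of the diagram forward.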

Looking at the diagram, it is clear that we hardly have any data for the first 150 years, i.e. before 1850, so let us generate the diagram with 1850 as the starting year.

chronohorogram(occ, startyear = 1850)

[Figure chrono3: chronohorogram starting in 1850]

The diagram looks good except the points look smudged into each other, so let us reduce the point size to get the final figure.

chronohorogram(occ, startyear = 1850, ptsize = .1)

[Figure chrono4: chronohorogram starting in 1850 with reduced point size]

If you have suggestions on improving the features of package bdvis, please post them as issues in the GitHub repository. If you have any questions or comments about this post, please post them here.


Creating figures like the paper ‘Completeness of Digital Accessible Knowledge of Plants of Ghana’ Part 2

8 Nov

This post continues from Part 1. If you have not done so already, please set up the data as described there before making this temporal polar plot.

To create Figure 1b, the graph showing accumulation of records through time (during the year), we need to use the function tempolar. The name ‘tempolar’ is simply short for ‘temporal polar’. For this plot, we just count records for each Julian day, without considering the year. This tells us about the seasonality of the data records.
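The idea of counting per Julian day can be tried out in base R. This is only a sketch of the counting step as I describe it here, on made-up dates, and not the internal code of tempolar:

```r
# Hypothetical collection dates from different years
dates <- as.Date(c("1950-01-01", "2001-01-01", "1999-02-10",
                   "2010-02-10", "2012-12-31"))

# Day of the year (1 to 366), ignoring the year itself
day_of_year <- as.integer(format(dates, "%j"))

# Number of records for each day of the year
table(day_of_year)
```

Records from 1950 and 2001 end up in the same bin because only the position within the year matters.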

Let us continue with the code from the previous part too. If you do not have the data set up, please visit Part 1 and run the code.

First create just a very basic tempolar plot.

tempolar(occ)

Now this created the following graph:

Basic tempolar plot

This graph looks very different from what we want to create. It plots the data for each day, but the plot we want uses monthly data. Let us use timescale = 'm' to specify monthly aggregation.

tempolar(occ,timescale = 'm')

Now this created the following graph:

monthly temporal plot

So now this is what we expected to have as a figure. One final thing is to add a better title.

tempolar(occ, timescale = 'm', color = "blue",
         title = 'Pattern of accumulation of records of Indian Birds by month')

final tempolar

 

Currently tempolar does not have the ability to display values for each month. Is that important enough to be added? We would like to hear from users.
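In the meantime, the monthly counts can be tabulated directly in base R. A minimal sketch on made-up dates; with the real data you would use occ$Date_collected instead:

```r
# Hypothetical collection dates
dates <- as.Date(c("1950-01-14", "2001-01-03", "1999-02-10",
                   "2010-07-22", "2012-12-31"))

# Month number as a factor with all 12 levels,
# so that months without records show up as 0
month <- factor(as.integer(format(dates, "%m")), levels = 1:12)

# Number of records per calendar month, across all years
monthly_counts <- table(month)
monthly_counts
```

Using month numbers rather than month names keeps the result independent of the locale of your R session.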

If you have suggestions on improving the features of package bdvis, please leave comments in the GitHub repository.


Creating figures like the paper ‘Completeness of Digital Accessible Knowledge of Plants of Ghana’ Part 1

27 Oct

Recently I read the paper on Completeness of Digital Accessible Knowledge (DAK) by Alex Asase and A. Townsend Peterson. I really enjoyed it and liked the way the figures are presented. There is a lot of overlap between this and my work on package bdvis (of course under the guidance of Town Peterson). So I thought I would share some code snippets to recreate figures similar to the ones in the paper using package bdvis.

Since I do not have a copy of the data used in the paper, I am using data downloaded from the GBIF website. I decided to use bird data for India.

To create Figure 1a, the graph showing accumulation of records through time (years), we need to set the data in bdvis format and then use the function distrigraph.

library(bdvis)

# Download GBIF data from the data.gbif.org portal and
# extract verbatim.txt into the working folder
occ <- read.delim('verbatim.txt',
                  quote='', stringsAsFactors=FALSE)
# Construct Date field from day, month, year
occ$Date_collected <- as.Date(paste(occ$year,
                                    occ$month,
                                    occ$day, sep = "."),
                              format = "%Y.%m.%d")
# Set configuration variables to format data
conf <- list(Latitude='decimalLatitude',
             Longitude='decimalLongitude',
             Date_collected='Date_collected',
             Scientific_name='specificEpithet')
occ <- format_bdvis(occ, config=conf)
# Keep only records with a valid date between 1500 and 2016
occ_date <- occ[occ$Date_collected > as.Date("1500-01-01") &
                occ$Date_collected < as.Date("2017-01-01") &
                !is.na(occ$Date_collected), ]
distrigraph(occ_date, ptype="efforts", type="h")

Now this created the following graph:

[Figure BirdDistriPlot1: accumulation of records by year, 1500 to 2016]

The graph shows what we wanted to show, but we would like to modify it a bit to look more like the figure in the paper. So let us exclude some more data and change the color and width of the lines in the graph.

occ_date1 <- occ[occ$Date_collected > as.Date("1900-01-01") &
               occ$Date_collected < as.Date("2015-01-01") &
               !is.na(occ$Date_collected) ,]
distrigraph(occ_date1, ptype="efforts", col="red",
            type="h", lwd=3)

Now this created the following graph:

[Figure BirdDistriPlot2: accumulation of records by year, 1900 to 2014, drawn with thicker red lines]
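A note on the date handling in the code above: records with a missing year, month or day get an NA date, which is why the filters include !is.na(). A small sketch with a made-up three-row table (the column names follow the GBIF fields used above):

```r
# Hypothetical year/month/day columns, as in a GBIF download
rec <- data.frame(year  = c(2010, 2001, NA),
                  month = c(3, NA, 7),
                  day   = c(15, 5, 1))

# Same construction as in the post: incomplete dates become NA
d <- as.Date(paste(rec$year, rec$month, rec$day, sep = "."),
             format = "%Y.%m.%d")
d

# Comparisons with NA give NA, and an NA in a logical row index
# yields an NA-filled row, so !is.na() keeps undated records out
keep <- !is.na(d) & d > as.Date("1900-01-01") & d < as.Date("2015-01-01")
rec[keep, ]
```

Only the first row survives here, since the other two have no usable date.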


Visualize completeness of biodiversity data

10 Jun

Package bdvis: Biodiversity data visualizations using R helps in understanding the completeness of a biodiversity inventory, the extent of geographical, taxonomic and temporal coverage, and gaps and biases in the data. Package bdvis version 0.2.6 is on CRAN now. This version has several features added since version 0.1.0. I plan to post a set of blog entries here to describe some of the key features of the package with some code snippets.

The function bdcomplete computes completeness values for each cell. After dividing the extent of the dataset into cells (via the getcellid function), this function calculates the Chao2 estimator of species richness. In simple terms, the function looks at the data records in each cell and how many species are represented, and estimates how complete the data for that cell are.
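For readers unfamiliar with Chao2, the classic form of the estimator adds to the observed richness a term based on the species found in exactly one sampling unit (Q1) and in exactly two (Q2): Sest = Sobs + Q1^2 / (2 * Q2). The sketch below illustrates that textbook formula on made-up incidence counts. How bdcomplete defines sampling units within a cell, and whether it uses a bias-corrected variant, is not covered here, so please treat this as an illustration and not as the package's implementation.

```r
# Hypothetical data: for each species in one cell, the number of
# sampling units in which it was recorded
incidence <- c(1, 1, 1, 2, 2, 5, 4, 3)

Sobs <- length(incidence)      # observed species
Q1   <- sum(incidence == 1)    # species seen in exactly one unit
Q2   <- sum(incidence == 2)    # species seen in exactly two units

# Classic Chao2 estimate (this simple form requires Q2 > 0)
Sest <- Sobs + Q1^2 / (2 * Q2)

# Assumption: completeness is observed richness over estimated richness
completeness <- Sobs / Sest
c(Sobs = Sobs, Sest = Sest, c = round(completeness, 3))
```

Many species seen only once push the estimate up and the completeness down, which matches the intuition that such a cell is still under-sampled.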

The following code snippet shows how to process data downloaded from the Global Biodiversity Information Facility (GBIF) Data Portal. The .zip file downloaded from the portal contains a file occurrence.txt, which holds the data records. Copy that file into the working folder and try the following script.

library(bdvis)

# Download GBIF data from the data.gbif.org portal and
# extract the occurrence.txt file into the working folder
occurrence <- read.delim('occurrence.txt',
                         quote='', stringsAsFactors=FALSE)
# Set configuration variables to format data
conf <- list(Latitude='decimalLatitude',
             Longitude='decimalLongitude',
             Date_collected='eventDate',
             Scientific_name='specificEpithet')
occurrence <- format_bdvis(occurrence, config=conf)
# Compute completeness and visualize using mapgrid
comp <- bdcomplete(occurrence)
mapgrid(comp, ptype='complete')

The completeness function produces a graph showing completeness vs number of species. More points in the higher range of completeness values indicate better data.

Completeness vs Species

Now, to visualize the data spatially and see whether any particular region needs better sampling, the function mapgrid can be used with the ptype = 'complete' parameter. This plots all the grid cells that have more data records than the recs parameter (default = 50), using a color range from light purple to dark blue. The darker the color, the better the data in that cell.

Completeness Visualization
