Tag Archives: rgbif

GSoC Proposal 2014: package bdvis: Biodiversity Data Visualizations

17 Mar

Update: The proposal has been approved for participation in Google Summer of Code 2014. I will post updates on the progress on the blog once the coding phase starts.

I am applying for Google Summer of Code 2014 again, this time with the “Biodiversity Data Visualizations using R” proposal. We propose to take the bdvis package to the next level by adding more functions and making it available through CRAN. I am posting this idea to get feedback and suggestions from the Biodiversity Informatics community.

[During the next few days I will keep updating this post to accommodate suggestions. The visualizations shown here are crude examples of the ideas and need a lot of work to convert them into reusable functions.]

Background

The bdvis package is already under development and was a successful project in GSoC 2013. The package currently has basic functionality for biodiversity data visualization, but as its user base grows, requests for additional features are coming in. We propose to add the user-requested functionality and implement some new functions to take bdvis to the next level. The major tasks of the proposed project are as follows.

  1. Fix currently reported bugs and complete the documentation so that the package can be submitted to CRAN.
  2. Implement additional features requested by users.
  3. Develop seamless data support.
  4. Add functions for further visualizations.
  5. Prepare a detailed vignette.

User requested features

The features and functionality requested by users so far are the following:

  • A versatile function to subset the data by taxonomy (a species, genus, family, etc.) or by date (a particular year, a range of years, and so on).
  • tempolar: ability to show average records per day/week/month rather than only the raw counts it shows currently.
  • taxotree: additional parameters to control the diagram, such as title, legend and colors. Also the ability to choose a summary based on number of records, number of species or higher taxonomy.
  • bdsummary: report the number of grid cells covered by data records and the percentage of the bounding box covered.
  • Ability to visualize the output of the completeness analysis function bdcomplete.
  • Improve gettaxo efficiency by adding the ability to search by genus rather than by scientific name as it does currently. Searching by full scientific name could remain as an option in case the user needs it for some reason.
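The first request, subsetting by taxonomy or date, could look roughly like the sketch below. This is only an illustration in base R and not the bdvis implementation: the function name subset_records, its arguments and the column names (genus, family, eventDate) are all assumptions made for the example.

```r
# Hypothetical helper (not part of bdvis): subset occurrence records by a
# taxonomic rank and/or a range of years. Column names are assumptions.
subset_records <- function(records, rank = NULL, taxon = NULL,
                           years = NULL, date_col = "eventDate") {
  keep <- rep(TRUE, nrow(records))
  if (!is.null(rank) && !is.null(taxon)) {
    # keep rows whose value in the chosen rank column matches the taxon
    keep <- keep & !is.na(records[[rank]]) & records[[rank]] %in% taxon
  }
  if (!is.null(years)) {
    # extract the year from the date column and keep rows inside the range
    rec_year <- as.integer(format(as.Date(records[[date_col]]), "%Y"))
    keep <- keep & !is.na(rec_year) &
      rec_year >= min(years) & rec_year <= max(years)
  }
  records[keep, , drop = FALSE]
}

# Toy data with assumed column names
records <- data.frame(
  family    = c("Nymphalidae", "Nymphalidae", "Pieridae", NA),
  genus     = c("Danaus", "Danaus", "Pieris", "Danaus"),
  eventDate = c("2010-05-01", "2012-07-15", "2012-08-20", "2013-01-10"),
  stringsAsFactors = FALSE
)

# All Danaus records from 2012 to 2013
danaus_recent <- subset_records(records, rank = "genus", taxon = "Danaus",
                                years = c(2012, 2013))
print(danaus_recent)
```

Passing the rank as a column name keeps one function usable for species, genus or family, which is the kind of versatility the request asks for.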

Data formats support

Develop functions that let the major biodiversity occurrence data formats available in the R environment work seamlessly with the bdvis package. A preliminary list of packages that make such data available: rgbif, rvertnet, rinat and spocc. We will ask the user community about additional data sources they use and add them to the work list.
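One possible shape for such support is a small function that renames source-specific columns to a common set of field names. The sketch below is an assumption-laden illustration: the function standardize_fields, the mapping and the source column names (name, lat, lon, date) are invented for the example and do not describe the actual output of any of the packages listed above.

```r
# Hypothetical helper: rename source-specific columns to common field names.
# 'mapping' is a named character vector: names are the common field names,
# values are the column names used by the data source.
standardize_fields <- function(records, mapping) {
  source_cols <- unname(mapping)
  missing_cols <- setdiff(source_cols, names(records))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  out <- records[, source_cols, drop = FALSE]  # keep only the mapped columns
  names(out) <- names(mapping)                 # apply the common names
  out
}

# Illustrative mapping for an imaginary data source
source_a_map <- c(scientificName   = "name",
                  decimalLatitude  = "lat",
                  decimalLongitude = "lon",
                  eventDate        = "date")

raw <- data.frame(name  = c("Danaus chrysippus", "Danaus plexippus"),
                  lat   = c(18.5, 40.1),
                  lon   = c(73.8, -75.2),
                  date  = c("2012-07-01", "2012-07-09"),
                  extra = "x",
                  stringsAsFactors = FALSE)

clean <- standardize_fields(raw, source_a_map)
print(names(clean))
```

With one mapping per data source, the visualization functions would only ever need to know the common field names.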

Additional visualizations

  • Distribution of collection efforts over time (line graph) [Fig 1 Soberon et al 2000]

Soberon_Fig_1

  • Distribution of number of records among taxon, cells (histogram) [Fig 3,4 Soberon et al 2000]

Soberon_Fig_3

  • Distribution of number of species among cells (histogram) [Fig 5 Soberon et al 2000]
  • Completeness vs number of species (scatterplot) [Fig 6 Soberon et al 2000]
  • Record densities for day of year and week of year [Otegui 2012]

RecordsPerDayofYear

  • Records per year dot plots [Otegui 2012]

RecPerYear

  • Calendar heat maps of the number of records or species recorded

IndianMoths_calenderheat
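Most of the temporal plots listed above start from the same step: counting records per year or per day of year. A minimal sketch of that counting step in base R follows; the sample dates are made up, and the variable names are only for illustration.

```r
# Made-up collection dates standing in for a real occurrence data set
dates <- as.Date(c("2010-01-15", "2010-01-20", "2010-03-02",
                   "2011-01-05", "2011-12-31"))

rec_year <- as.integer(format(dates, "%Y"))  # calendar year of each record
rec_doy  <- as.integer(format(dates, "%j"))  # day of year (1 to 366)

# Records per year: the input for a "records per year" dot plot
per_year <- table(rec_year)

# Records per day of year, with empty days kept as zero counts
per_doy <- table(factor(rec_doy, levels = 1:366))

print(per_year)

# A quick line graph of records over time
plot(as.integer(names(per_year)), as.integer(per_year), type = "b",
     xlab = "Year", ylab = "Number of records")
```

Using a factor with fixed levels keeps days without records in the table, so gaps in collecting effort remain visible in the plot.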

Interactive Map of records

A function to plot records on an interactive map. The plan is to develop a function that generates a GeoJSON-based map using an HTML/JavaScript file, which the user can open in a web browser to explore the records. For performance reasons we might have to restrict the number of records this function handles.

geoJSON example screenshot
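The data side of such a function could be sketched as below: turning a data frame of records into a GeoJSON FeatureCollection of points, with a cap on the number of records. This is a rough base R illustration rather than the planned implementation. The function name records_to_geojson and its arguments are hypothetical, and the sketch assumes labels contain no double quotes or backslashes, since it does no JSON escaping.

```r
# Hypothetical helper: build a GeoJSON FeatureCollection of point features.
# Assumes the label column has no double quotes or backslashes (no escaping).
records_to_geojson <- function(records, lon = "decimalLongitude",
                               lat = "decimalLatitude", label = "name",
                               max_records = 1000) {
  records <- head(records, max_records)  # cap the size for browser performance
  features <- sprintf(
    paste0('{"type":"Feature",',
           '"geometry":{"type":"Point","coordinates":[%s,%s]},',
           '"properties":{"name":"%s"}}'),
    records[[lon]], records[[lat]], records[[label]]
  )
  paste0('{"type":"FeatureCollection","features":[',
         paste(features, collapse = ","), ']}')
}

records <- data.frame(name = c("Danaus chrysippus", "Danaus plexippus"),
                      decimalLongitude = c(77.2, -75.2),
                      decimalLatitude  = c(28.6, 40.1),
                      stringsAsFactors = FALSE)

geojson <- records_to_geojson(records)

# Write to a temporary file that a JavaScript map page could load
out_file <- tempfile(fileext = ".geojson")
writeLines(geojson, out_file)
```

GeoJSON stores coordinates as longitude first and latitude second, which is why the columns are passed in that order.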

Vignette preparation

Prepare test data sets for the vignette. We plan three data sets to demonstrate the functionality: one with global geographical coverage and wide species coverage, a second with country-level geographical coverage and Class- or Order-level species coverage, and a third with a narrow species selection, perhaps at genus level. Write up code and an explanation for each function in the package, and add result tables, graphs and maps to complete the vignette.

References

  • Otegui, J., & Ariño, A. H. (2012). BIDDSAT: visualizing the content of biodiversity data publishers in the Global Biodiversity Information Facility network. Bioinformatics (Oxford, England), 28(16), 2207–8. doi:10.1093/bioinformatics/bts359
  • Soberón, J., Llorente, J., & Oñate, L. (2000). The use of specimen-label databases for conservation purposes: an example using Mexican Papilionid and Pierid butterflies. Biodiversity and Conservation, 9, 1441–1466. Retrieved from http://www.springerlink.com/index/H58022627013233W.pdf

GSoC Proposal 2013: Biodiversity Visualizations using R

29 Apr

I am applying for Google Summer of Code 2013 with this “Biodiversity Visualizations using R” proposal. I am posting this idea to get feedback and suggestions from the Biodiversity Informatics community.

[During the next few days I will keep updating this post to accommodate suggestions. The visualizations shown here are crude examples of the ideas and need a lot of work to convert them into reusable functions.]

Background

R is increasingly being used in biodiversity information analysis. Several R packages, such as rgbif and rvertnet in the rOpenSci suite, let us query, download and to some extent analyse the data within an R workflow. We also have packages like dismo and SDMTools for modelling the data. It would be useful to have a package to quickly visualize biodiversity data. Such visualizations would help in understanding the extent of geographical, taxonomic and temporal coverage, as well as gaps and biases in the data.

The proposal is to work on an R package that quickly generates visualizations of a data set the user has gathered or generated.

The functions provided would cover the following tasks:

  • Data preparation – The data needs to be converted into a format suitable for visualization and analysis, i.e. dates, taxonomic classification and geographical coordinates should be in uniform and usable formats.
  • Data summary – Function(s) to quickly summarize the data set, telling the user the number of records, the number of records with latitude and longitude values, the bounding box of those values, the date range and so on.
  • Geographic coverage – Functions to visualize the data points on maps and as density maps at different scales, such as country level, degree grid and so on.
Density of the records worldwide

Density of the records worldwide. Darker color indicates higher density of records.

Temporal coverage of the records

Temporal coverage of the records. Each line represents number of records on that particular day.

  • Taxonomic coverage – Functions to visualize the taxonomic coverage of the data as tree maps, by number of records per species and by number of species covered.
Familywise records

Family-wise records present in the data set. (The white block indicates records with no family assigned.)

  • Completeness analysis – Functions to assess and visualize the completeness of the biodiversity inventory of a region, in other words a measure of how exhaustive the sampling in the study area is [Ref: http://dx.doi.org/10.1111/j.0906-7590.2007.04627.x]
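To make the data summary task in the list above more concrete, here is a minimal base R sketch of what such a function might report. The function name summarize_records and the column names (decimalLongitude, decimalLatitude, eventDate) are assumptions for the example, not a committed design.

```r
# Hypothetical summary function; column names are assumptions
summarize_records <- function(records, lon = "decimalLongitude",
                              lat = "decimalLatitude",
                              date_col = "eventDate") {
  # a record counts as geo-referenced only if both coordinates are present
  has_coords <- !is.na(records[[lon]]) & !is.na(records[[lat]])
  x <- records[[lon]][has_coords]
  y <- records[[lat]][has_coords]
  dates <- as.Date(records[[date_col]])
  list(
    n_records     = nrow(records),
    n_with_coords = sum(has_coords),
    bounding_box  = c(lon_min = min(x), lon_max = max(x),
                      lat_min = min(y), lat_max = max(y)),
    date_range    = range(dates, na.rm = TRUE)
  )
}

# Toy data: one record lacks a longitude, one lacks a date
records <- data.frame(decimalLongitude = c(73.8, NA, -75.2),
                      decimalLatitude  = c(18.5, 20.0, 40.1),
                      eventDate = c("2012-07-01", "2011-03-15", NA),
                      stringsAsFactors = FALSE)

s <- summarize_records(records)
str(s)
```

The bounding box is computed only from records that have both coordinates, so a stray latitude without a longitude does not stretch it.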

Mentor(s): Javier Otegui

Data set: The data set used for the sample visualizations here consists of records published by iNaturalist.org on the GBIF data portal. It contains Research Grade records (~46K) for all the organisms posted. The details of the data set are available here. The description on the GBIF data portal says “iNaturalist.org is a website where anyone can record their observations from nature. Members record observations for numerous reasons, including participation in citizen science projects, class projects, and personal fulfillment.”

References:

  • Chamberlain, S., & Barve, V. (2012). rvertnet: Search VertNet database from R. Retrieved from http://cran.r-project.org/package=rvertnet
  • Chamberlain, S., Boettiger, C., Ram, K., & Barve, V. (2013). rgbif: Interface to the Global Biodiversity Information Facility API methods. Retrieved from http://cran.r-project.org/package=rgbif
  • Hijmans, R. J., Phillips, S., Leathwick, J., & Elith, J. (2012). dismo: Species distribution modeling. Retrieved from http://cran.r-project.org/package=dismo
  • Otegui, J., & Ariño, A. H. (2012). BIDDSAT: visualizing the content of biodiversity data publishers in the Global Biodiversity Information Facility network. Bioinformatics (Oxford, England), 28(16), 2207–8. doi:10.1093/bioinformatics/bts359
  • Soberón, J., Jiménez, R., Golubov, J., & Koleff, P. (2007). Assessing completeness of biodiversity databases at different spatial scales. Ecography, 30(1), 152–160. doi:10.1111/j.2006.0906-7590.04627.x
  • VanDerWal, J., Falconi, L., Januchowski, S., Shoo, L., & Storlie, C. (2012). SDMTools: Species Distribution Modelling Tools: Tools for processing data associated with species distribution modelling exercises. Retrieved from http://cran.r-project.org/package=SDMTools

Map biodiversity records with rgbif and ggmap packages in R

23 Jul

When I attended useR! 2012 last month, there was an interesting presentation by Dr. David Kahle about the package ggmap. It is a package built on top of ggplot2 that helps us map spatial data over online maps like Google Maps or OpenStreetMap. I decided to give the ggmap package a try with biodiversity data.

So first let us create a map for the Plain Tiger or African Monarch butterfly (Danaus chrysippus). As in the previous post, we use occurrencelist from the rgbif package.

We use the qmap function from the ggmap package to quickly pull up the base map from Google Maps. In essence, qmap collapses the two-step process of getting map data with the map_data function and then setting up the map display with the ggplot function into a single step. We use the geom_jitter function to plot the occurrence points in the specified size (size = 4) and color (color = "red").

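library(rgbif)
library(ggmap)
library(ggplot2)

# Download up to 1000 geo-referenced records for the species.
# occurrencelist() is the rgbif function available when this post was
# written; later rgbif releases replaced it with occ_search().
Dan_chr <- occurrencelist(sciname = 'Danaus chrysippus',
                          coordinatestatus = TRUE,
                          maxresults = 1000,
                          latlongdf = TRUE, removeZeros = TRUE)

# Base map from Google Maps
wmap1 <- qmap('India', zoom = 2)

# Overlay the occurrence points; ggtitle() replaces the older opts(title = ...)
wmap1 +
  geom_jitter(data = Dan_chr,
              aes(decimalLongitude, decimalLatitude),
              alpha = 0.6, size = 4, color = "red") +
  ggtitle("Danaus chrysippus")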

Here is the output map of the code snippet:

Although the code above uses geom_jitter, the high density of points in some regions is still not clearly visible. To get a better idea of the number of points, we can try two-dimensional density maps using the stat_density2d function. It adds density lines to the map, showing higher density with darker circles.

library(rgbif)
library(ggmap)
library(ggplot2)

# Same download as before (occurrencelist() is the rgbif function of the time)
Dan_chr <- occurrencelist(sciname = 'Danaus chrysippus',
                          coordinatestatus = TRUE,
                          maxresults = 1000,
                          latlongdf = TRUE, removeZeros = TRUE)

# Base map from Google Maps
wmap1 <- qmap('India', zoom = 2)

# Density layer first, then the jittered points on top;
# ggtitle() replaces the older opts(title = ...)
wmap1 +
  stat_density2d(aes(x = decimalLongitude, y = decimalLatitude,
                     fill = ..level.., alpha = ..level..),
                 size = 4, bins = 6,
                 data = Dan_chr, geom = 'line') +
  geom_jitter(data = Dan_chr,
              aes(decimalLongitude, decimalLatitude),
              alpha = 0.6, size = 4, color = "red") +
  ggtitle("Danaus chrysippus :: Density Plot")
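When overlapping points hide the real density, a numeric check can complement the density plot: count the records falling in each degree grid cell. The sketch below does this in base R. The function cell_counts and the sample coordinates are made up for illustration and are not part of rgbif or ggmap.

```r
# Hypothetical helper: count records per grid cell of a given size (degrees)
cell_counts <- function(lon, lat, cell_size = 1) {
  # snap each coordinate to the lower-left corner of its grid cell
  cells <- data.frame(cell_lon = floor(lon / cell_size) * cell_size,
                      cell_lat = floor(lat / cell_size) * cell_size,
                      n = 1L)
  counts <- aggregate(n ~ cell_lon + cell_lat, data = cells, FUN = sum)
  counts[order(-counts$n), ]  # densest cells first
}

# Made-up coordinates; with real data these would be
# Dan_chr$decimalLongitude and Dan_chr$decimalLatitude
lon <- c(77.1, 77.9, 78.2)
lat <- c(28.3, 28.7, 28.5)

counts <- cell_counts(lon, lat)
print(counts)
```

The number of rows in the result is also the number of grid cells covered by the data.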

Map biodiversity records with rgbif and dismo packages in R

16 Jul

In the earlier post we generated maps from GBIF biodiversity records using the maps and ggplot2 packages, with a world map showing country borders as the base. Now we will generate maps with Google Maps as the base layer using the dismo package.

As before, we download data for Danaus chrysippus from GBIF using the occurrencelist function into a data frame Dan_chr.

Then we use the dismo package, whose gmap function quickly downloads base layer maps from Google, and display the result using the plot function. We specify the extent of the map to download using the extent function with a latitude and longitude range. We then convert the points into the Mercator system with Mercator and plot them using points.

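library(rgbif)
library(dismo)

# Download up to 1000 geo-referenced records
# (occurrencelist() is the rgbif function of the time)
Dan_chr <- occurrencelist(sciname = 'Danaus chrysippus',
                          coordinatestatus = TRUE,
                          maxresults = 1000,
                          latlongdf = TRUE, removeZeros = TRUE)

# Download a Google base map for (almost) the whole world and plot it
e <- extent(-179, 179, -80, 80)
r <- gmap(e)
plot(r, interpolate = TRUE, main = "Map")

# Longitude/latitude matrix, converted to Mercator before plotting
xy1 <- cbind(Dan_chr$decimalLongitude, Dan_chr$decimalLatitude)
points(Mercator(xy1), col = 'blue', pch = 20)
text(160, 0, "\n\n\nDanaus \nchrysippus", adj = c(0, 1),
     col = "red")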

The output of the code snippet is as follows:

Map of Danaus chrysippus
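For readers curious about the conversion step, the usual spherical Mercator formulas can be written in a few lines of base R. This is a sketch of the standard formulas, assuming a sphere of radius 6378137 m as commonly used for web maps; it is not a copy of the dismo source, and to_mercator is a name made up for this example.

```r
# Standard spherical Mercator formulas (assumed radius: 6378137 m).
# to_mercator is an illustrative name, not a dismo function.
to_mercator <- function(lon, lat, radius = 6378137) {
  x <- radius * lon * pi / 180                     # longitude scales linearly
  y <- radius * log(tan(pi / 4 + lat * pi / 360))  # latitude is stretched
  cbind(x = x, y = y)
}

# Longitude/latitude pairs: a point on the equator and one at 45 degrees north
xy <- to_mercator(lon = c(180, 77.2), lat = c(0, 45))
print(xy)
```

The result is in metres rather than degrees, which is why points must be converted before they line up with the downloaded base map.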

Map biodiversity records with rgbif, maps and ggplot2 packages in R

9 Jul

The Global Biodiversity Information Facility, or GBIF, is an international consortium working towards making biodiversity information available to everyone through a single portal. GBIF and its partners are mobilizing data, developing data and metadata standards, building a distributed database system and making the data accessible through APIs. At this point it is the largest single-window data source covering a wide spectrum of taxa and geographic range.

rgbif is an R package for downloading data from the GBIF data portal using its API. Once the data is available as a data frame in R, we can use several functions and packages to analyse and visualize it. [ Cran, rOpenSci, my work ]

First we use the occurrencelist function to download 1000 records (maxresults = 1000) for “Danaus plexippus” (sciname = 'Danaus plexippus'), the Monarch butterfly. We specify that we want records that have been geo-coded (coordinatestatus = TRUE), that we want them stored in a data frame (latlongdf = TRUE) and that we want to remove any records that have zeros in the latitude and longitude values (removeZeros = TRUE). This command results in a data frame dan_ple with Monarch occurrence records.

Next we use the map_data function, which is provided by ggplot2 and draws on the maps package, to get the world map. We use the ggplot function from the ggplot2 package to plot the world map as a polygon layer. On top of that we plot the Monarch butterfly occurrence points in red using the decimalLatitude and decimalLongitude columns, and specify the title of the map.

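library(rgbif)
library(maps)
library(ggplot2)

# Download up to 1000 geo-referenced records
# (occurrencelist() is the rgbif function of the time)
dan_ple <- occurrencelist(sciname = 'Danaus plexippus',
                          coordinatestatus = TRUE, maxresults = 1000,
                          latlongdf = TRUE, removeZeros = TRUE)

# World map polygons
world <- map_data("world")

# World map as a polygon layer with the occurrence points on top;
# ggtitle() replaces the older opts(title = ...)
ggplot(world, aes(long, lat)) +
  geom_polygon(aes(group = group), fill = "white",
               color = "gray40", size = .2) +
  geom_jitter(data = dan_ple,
              aes(decimalLongitude, decimalLatitude), alpha = 0.6,
              size = 4, color = "red") +
  ggtitle("Danaus plexippus")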

The final output of the code snippet is as follows.

Map of Danaus plexippus
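If a data frame comes from a source that does not offer such filtering options, a similar clean-up can be done by hand before plotting. The base R sketch below keeps only records with both coordinates present and drops those sitting at exactly (0, 0). Whether this matches precisely what removeZeros does inside rgbif is an assumption, and clean_coords is a name made up for the example.

```r
# Hypothetical helper: keep geo-referenced records and drop (0, 0) points.
# The exact rule rgbif applies for removeZeros is assumed, not verified.
clean_coords <- function(records, lon = "decimalLongitude",
                         lat = "decimalLatitude") {
  x <- records[[lon]]
  y <- records[[lat]]
  has_coords <- !is.na(x) & !is.na(y)       # both coordinates present
  keep <- has_coords & !(x == 0 & y == 0)   # and not at the origin
  records[keep, , drop = FALSE]
}

# Toy data: one missing latitude, one record at (0, 0)
records <- data.frame(decimalLongitude = c(-99.1, 0, -75.2, 10.0),
                      decimalLatitude  = c(19.4, 0, NA, 0),
                      stringsAsFactors = FALSE)

cleaned <- clean_coords(records)
print(cleaned)
```

A record on the equator or on the prime meridian is kept; only points where both values are zero are treated as suspect.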

More maps using other packages to come soon.