As in most programming languages, functions in R let you group related lines of code into a reusable block. Values are passed to a function as parameters and received back from it as a return value.
In R, a function has the following syntax:
    function.name <- function(param1, param2, param3 = default.value) {
      # Statements involving params and required logic,
      # followed by the return statement
      return(result)
    }
Where
- Parameters are optional
- Parameters can have default values
- The return statement is optional; without it, the function returns the value of its last evaluated expression
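The three points above can be illustrated with a small sketch. The functions greet and power are hypothetical names chosen only for this illustration; they are not part of the example that follows.

```r
# No parameters and no explicit return(): the value of the last
# evaluated expression becomes the return value.
greet <- function() {
  "Hello from R"
}

# One required parameter and one parameter with a default value.
power <- function(base, exponent = 2) {
  base ^ exponent
}

greet()       # "Hello from R"
power(3)      # 9  (exponent falls back to its default value, 2)
power(3, 3)   # 27
```

Leaving out return() is a matter of style; an explicit return() can still make longer functions easier to read.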
Let’s Look at an Example
Assuming that you have some familiarity with functions and have gone through my previous articles on Vectors and Lists, the following example prints one skill set from a given list of skill sets:
    fullstack.webdev <- c('React.js', 'Node.js', 'PostgreSQL', 'Redux')
    erp.modules <- c('Manufacturing', 'Accounts', 'HR', 'Finance', 'Payroll', 'Procurements', 'Sales')
    qa.tools <- c('Selenium', 'Karma', 'Protractor', 'Jasmine', 'jMeter')
    skills.list <- list(webdev = fullstack.webdev, erp = erp.modules, QA = qa.tools)
    getSkills <- function(random.index = 1) {
      skills <- skills.list[[random.index]]
      return(skills)
    }
    known.skills <- getSkills(sample(1:3, 1))
    cat('Known skills are ', known.skills, '.\nShe also knows ', getSkills())
When you run the above code, you get the following output (which, of course, depends on the random.index value):
    > source('~/fun/functionBasics.R')
    Known skills are Manufacturing Accounts HR Finance Payroll Procurements Sales .
    She also knows React.js Node.js PostgreSQL Redux
What did we do?
- Created a list consisting of vectors (mostly data preparation to make the example meaningful)
- Defined the function with one parameter, which has a default value of 1
  - If a function has more than one parameter, put the parameters with default values after the parameters without defaults
  - Unnamed arguments are matched to parameters from left to right, so if you skip an argument, a value may be assigned to the wrong parameter; passing arguments by name avoids this
- Accessed the list element using the parameter value and the double square bracket notation
- Returned the list element to the caller
  - The return value is wrapped in parentheses, because return is called like a function
- Called the function and stored the result in a variable
- Used the cat function to combine the fixed strings with the variable and print the result on the console
- Also called the function inside cat without any argument, in which case it used the default value of the parameter
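The point about argument order is easier to see with more than one parameter. The sketch below uses a hypothetical function, describe, with made-up values; it contrasts matching by position with matching by name.

```r
# describe() is a hypothetical helper used only for illustration.
describe <- function(name, role = 'developer', years = 1) {
  paste(name, 'is a', role, 'with', years, 'year(s) of experience')
}

# Positional matching: 5 is matched to role, the second parameter,
# which is probably not what was intended.
describe('Asha', 5)
# "Asha is a 5 with 1 year(s) of experience"

# Named matching: role is skipped and keeps its default value.
describe('Asha', years = 5)
# "Asha is a developer with 5 year(s) of experience"
```

Naming the arguments you pass is usually the safer choice once a function has several optional parameters.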
Invoking a function
You call a function by writing its name followed by parentheses, passing arguments inside them as required. The example code in the previous section showed that.
However, if you leave out the parentheses after the function name, R just prints the definition of that function.
The following example shows when R prints the function definition and when it executes the function:
    > getSkills
    function(random.index = 1) {
      skills <- skills.list[[random.index]]
      return(skills)
    }
    > getSkills(3)
    [1] "Selenium" "Karma" "Protractor" "Jasmine" "jMeter"
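R prints the definition because a function is an ordinary object bound to a name. A minimal sketch, using a hypothetical function square, shows a few ways to inspect such an object without calling it:

```r
# square is a hypothetical function used only for illustration.
square <- function(x = 2) {
  x * x
}

is.function(square)      # TRUE
names(formals(square))   # "x"  (the parameter names)
square()                 # 4    (the parentheses trigger the call)

# Because a function is an object, it can be bound to another name.
also.square <- square
also.square(5)           # 25
```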
Understanding Scope in R
In the example function in the previous section, we accessed a list, skills.list. This list is defined in the global scope and hence is available to all functions.
However, a local variable (such as the skills variable) is accessible only inside the function that defines it. If you try to access it outside the function, R gives an error, as shown below:
    > skills
    Error: object 'skills' not found
Global variable inside a function
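A function can assign to a name that also exists as a global variable. However, unlike in many other languages (where such an update affects the global variable from that point onwards), an assignment with <- inside a function works on a local copy: the change is visible only inside the function, and the global variable stays as it was.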
Let’s take a look at the following code and its outcome:
    fullstack.webdev <- c('React.js', 'Node.js', 'PostgreSQL', 'Redux')
    erp.modules <- c('Manufacturing', 'Accounts', 'HR', 'Finance', 'Payroll', 'Procurements', 'Sales')
    qa.tools <- c('Selenium', 'Karma', 'Protractor', 'Jasmine', 'jMeter')
    skills.list <- list(webdev = fullstack.webdev, erp = erp.modules, QA = qa.tools)
    getSkills <- function(random.index = 1) {
      skills <- skills.list[[random.index]]
      new.skillset <- c('R', 'Python', 'AI', 'ML', 'Data Science')
      skills.list[[length(skills.list) + 1]] <- new.skillset
      cat('\n #of element in the list inside function definition', length(skills.list))
      return(skills)
    }
    cat('#of element in the list before function call', length(skills.list))
    known.skills <- getSkills(sample(1:3, 1))
    cat('\n #of element in the list after function call', length(skills.list))
It produces the following output, which shows that the number of elements in the list is the same before and after the function call, while inside the function call it is one higher:
    > source('~/fun/functionBasics.R')
    #of element in the list before function call 3
     #of element in the list inside function definition 4
     #of element in the list after function call 3
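If the caller needs the modified list, one common alternative is to return it from the function and assign the result at the call site. The following is a minimal sketch with a shortened list and a hypothetical helper, addSkillset, which is not part of the original example:

```r
skills.list <- list(webdev = c('React.js', 'Node.js'), QA = c('Selenium', 'Karma'))

# addSkillset is a hypothetical helper: it modifies its own local copy
# of the list and returns that copy to the caller.
addSkillset <- function(current.list, new.skillset) {
  current.list[[length(current.list) + 1]] <- new.skillset
  current.list
}

length(skills.list)                                    # 2
updated.list <- addSkillset(skills.list, c('R', 'Python'))
length(skills.list)                                    # still 2
length(updated.list)                                   # 3
```

This keeps the data flow explicit: the function receives everything it needs as arguments, and nothing outside it changes unless the caller assigns the result.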
Scope Resolution inside a function
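When R encounters a variable name inside a function, it looks for it in the following order:
- Is it defined locally, inside the function?
  - If you define a variable locally or change its value locally, the changed value is available from that point onwards.
- Is it one of the parameters? Parameters live in the same local environment, so assigning to a parameter simply replaces its value.
- Is it defined in the environment where the function was created? For a function defined at the top level, this is the global environment.
- If it is not found in any of these places (or in the attached packages), it is undefined and R reports an "object not found" error.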
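The lookup order above can be sketched in a few lines. All names here (tool, showLookup, useMissing) are hypothetical and chosen only for illustration; the sketch assumes the code is run at the top level of a script or session.

```r
tool <- 'global tool'   # defined outside any function

showLookup <- function(language = 'R') {
  editor <- 'local editor'   # local variable
  # editor and language are found in the function's own environment;
  # tool is not, so R continues to the environment where the function
  # was defined.
  c(editor, language, tool)
}

showLookup()
# "local editor" "R" "global tool"

# A name that is not defined anywhere produces an error, which is
# caught here so the script can continue.
useMissing <- function() {
  not.defined.anywhere
}
result <- tryCatch(useMissing(), error = function(e) 'lookup failed')
result
# "lookup failed"
```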
The following example code and output show how changing a parameter value or a global variable behaves inside a function:
    fullstack.webdev <- c('React.js', 'Node.js', 'PostgreSQL', 'Redux')
    erp.modules <- c('Manufacturing', 'Accounts', 'HR', 'Finance', 'Payroll', 'Procurements', 'Sales')
    qa.tools <- c('Selenium', 'Karma', 'Protractor', 'Jasmine', 'jMeter')
    skills.list <- list(webdev = fullstack.webdev, erp = erp.modules, QA = qa.tools)
    getSkills <- function(random.index = 1) {
      new.skillset.ds <- c('R', 'Python', 'AI', 'ML', 'Data Science')
      new.skillset.bd <- c('Hadoop', 'Spark', 'Zookeeper', 'Cloudera')
      skills.list <- list(new.skillset.ds, new.skillset.bd)
      print(skills.list)
      random.index <- 2
      skills <- skills.list[[random.index]]
      return(skills)
    }
    known.skills <- getSkills(1)
    cat('\nKnown skills are ', known.skills)
The output looks like this:
    > source('~/fun/functionBasics.R')
    [[1]]
    [1] "R" "Python" "AI" "ML" "Data Science"
    [[2]]
    [1] "Hadoop" "Spark" "Zookeeper" "Cloudera"
    Known skills are Hadoop Spark Zookeeper Cloudera
Note
- Even though we asked for the item at index 1, the result shows the item at index 2, because the parameter was assigned a different value inside the function
- Assigning to skills.list inside the function created a local variable that masks the global one, hence the list seen inside the function has only two elements, both of them new; the global skills.list is unchanged
Updating a global variable inside a function
Previously we saw that a change made to a global variable through the <- assignment operator was effective only inside that function. Hence, the number of elements in the list was the same before and after the function call. However, using the <<- operator, you can change the global variable itself.
The following example shows the change in the global variable:
    fullstack.webdev <- c('React.js', 'Node.js', 'PostgreSQL', 'Redux')
    erp.modules <- c('Manufacturing', 'Accounts', 'HR', 'Finance', 'Payroll', 'Procurements', 'Sales')
    qa.tools <- c('Selenium', 'Karma', 'Protractor', 'Jasmine', 'jMeter')
    skills.list <- list(webdev = fullstack.webdev, erp = erp.modules, QA = qa.tools)
    getSkills <- function(random.index = 1) {
      new.skillset.ds <- c('R', 'Python', 'AI', 'ML', 'Data Science')
      new.skillset.bd <- c('Hadoop', 'Spark', 'Zookeeper', 'Cloudera')
      skills.list[[length(skills.list) + 1]] <<- list(new.skillset.ds, new.skillset.bd)
      cat('\n#of element in the list inside function call', length(skills.list))
      skills <- skills.list[[random.index]]
      return(skills)
    }
    cat('\n #of element in the list before function call', length(skills.list))
    known.skills <- getSkills(sample(1:3, 1))
    cat('\n #of element in the list after function call', length(skills.list))
The output is as follows:
    > source('~/fun/functionBasics.R')
     #of element in the list before function call 3
    #of element in the list inside function call 4
     #of element in the list after function call 4
As you can see, after the function execution, the item added inside the function is accessible in the global scope as well.
Note
- <- always creates a binding in the current environment; <<- rebinds an existing name in a parent of the current environment
- You must be very careful while defining variables in global scope
- You must pay very close attention to any update in the global variable
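The first note also means that <<- does not always target the global environment: it rebinds the name in the nearest enclosing environment that already has it. The sketch below, built around a hypothetical makeCounter function, uses this to keep state inside a function instead of in a global variable.

```r
# makeCounter is a hypothetical example, not part of the code above.
makeCounter <- function() {
  count <- 0                # lives in the environment of this call
  function() {
    count <<- count + 1     # rebinds count in the enclosing environment
    count
  }
}

counter <- makeCounter()
counter()   # 1
counter()   # 2

# A second counter gets its own, independent count.
other <- makeCounter()
other()     # 1
```

Keeping the state inside the enclosing function like this is one way to limit the risks of global updates mentioned in the notes.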
Anonymous Functions
As the name suggests, anonymous functions are functions without a name. Often you need a function in only one place. In such cases, it can be convenient to define it inline instead of defining it separately.
For example, built-in functions like sapply, lapply, etc. allow you to apply a function to each element of a data set. Say you want to calculate the cube of each element of a vector; then you have two options:
- Define a function called cube and pass that function as parameter to sapply or
- Define that function as anonymous function in the parameter itself
Version 1
    vec <- 1:10
    cube <- function(num) {
      return(num * num * num)
    }
    print(sapply(vec, cube))
Version 2
    vec <- 1:10
    print(sapply(vec, function(num) {num * num * num}))
Note that you don't need to give the function a name, and you don't need an explicit return statement either.
In both cases, the output will be the following:
    > source('~/fun/anonymousFun.R')
    [1] 1 8 27 64 125 216 343 512 729 1000
When the function is very simple, it makes sense to use an anonymous function. Otherwise, defining it explicitly and then using it is more manageable.
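As a variation on the cube example, the sketch below passes the same kind of anonymous function to vapply, a stricter relative of sapply in base R whose third argument declares the expected type and length of each result. The variable name cubes is chosen only for this illustration.

```r
vec <- 1:10

# The anonymous function is the second argument; numeric(1) says that
# each call must return exactly one numeric value.
cubes <- vapply(vec, function(num) num^3, numeric(1))
print(cubes)

# For simple arithmetic like this, R's vectorized operators give the
# same values without any function at all.
print(vec^3)
```

vapply stops with an error if a result does not match the declared type, which can catch mistakes that sapply would silently pass along.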
Additional Resources