Working with Functions in R

As in any programming language, functions let you enclose related lines of code in a reusable block. You pass values into a function as parameters and get results back as return values.

In R, a function has the following syntax:

function.name <- function(param1, param2, param3 = default.value) {
  # Statements involving the parameters and the required logic, followed by the return statement
  return(result)
}

Where

  • Parameters are optional
  • Parameters can have default values
  • Return statement is optional
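
To make the three points above concrete, here is a minimal sketch. The function names say.hello and power.of are made up for this illustration and are not part of the example used in the rest of this article.

```r
# No parameters and no explicit return(): the value of the last
# evaluated expression becomes the return value.
say.hello <- function() {
  "Hello from R"
}

# One required parameter and one parameter with a default value.
power.of <- function(base, exponent = 2) {
  base ^ exponent
}

print(say.hello())     # called without any argument
print(power.of(3))     # exponent falls back to its default, 2
print(power.of(3, 3))  # the default is overridden by the caller
```

The second and third calls differ only in whether the default value is used, giving 9 and 27 respectively.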

Let’s Look at an Example

Assuming you have some familiarity with functions and have gone through my previous articles on Vectors and Lists, the following example prints a skill set picked from a list of skill sets:

fullstack.webdev <- c('React.js', 'Node.js', 'PostgreSQL', 'Redux')
erp.modules <- c('Manufacturing', 'Accounts', 'HR', 'Finance', 'Payroll', 'Procurements', 'Sales')
qa.tools <- c('Selenium', 'Karma', 'Protractor', 'Jasmine', 'jMeter')
skills.list <- list(webdev = fullstack.webdev, erp = erp.modules, QA = qa.tools)

getSkills <- function(random.index = 1) {
  skills <- skills.list[[random.index]]
  return(skills)
}
known.skills <- getSkills( sample(1:3, 1) )
cat('Known skills are ',  known.skills, '.\nShe also knows ', getSkills())

When you run the above code, you get output like the following (which of course depends on the random.index value):

> source('~/fun/functionBasics.R')
Known skills are  Manufacturing Accounts HR Finance Payroll Procurements Sales .
She also knows  React.js Node.js PostgreSQL Redux

 

What did we do?

  • Created a list consisting of vectors (mostly data preparation to make the example meaningful)
  • Decided to keep one parameter, with a default value of 1
    • If a function has more than one parameter, put the parameters with default values after the ones without defaults
    • Unnamed arguments are matched to parameters by position, from left to right, so if you skip an argument, a value may get assigned to the wrong parameter; passing arguments by name (for example, random.index = 2) avoids this
  • Using the parameter value and the double square bracket notation, accessed the list element
  • Returned the list element to the caller
    • return is itself a function in R, so the return value must be wrapped in parentheses
  • Called the function and stored the result in a variable
  • Using the cat function, combined the fixed strings with the variable and printed the result on the console
    • Also called the function inside the cat call without any argument, in which case it used the default value of the parameter
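
The point about left-to-right matching is easier to see with more than one parameter. The following is a small sketch; describe.skill and its parameters are hypothetical names chosen only for this illustration.

```r
# One required parameter followed by two parameters with defaults.
describe.skill <- function(name, years = 1, level = 'beginner') {
  paste(name, years, level)
}

# Both defaults are used.
all.defaults <- describe.skill('R')

# Positional matching: 'expert' is the second argument, so it is
# assigned to years, which is probably not what was intended.
wrong.slot <- describe.skill('R', 'expert')

# Naming the argument sends the value to the intended parameter.
by.name <- describe.skill('R', level = 'expert')

# Named arguments can be given in any order.
any.order <- describe.skill(level = 'expert', name = 'R')

print(c(all.defaults, wrong.slot, by.name, any.order))
```

Here wrong.slot comes out as "R expert beginner", while by.name and any.order both give "R 1 expert".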

 

Invoking a function

You call a function by writing its name followed by a pair of parentheses, passing arguments inside them as required. The example code in the previous section showed that.

However, if you leave out the parentheses after the function name, R does not run the function; it just prints the function definition.

The following examples show when R prints the function definition and when it goes ahead and executes the function:

> getSkills
function(random.index = 1) {
  skills <- skills.list[[random.index]]
  return(skills)
}

> getSkills(3)
[1] "Selenium"   "Karma"      "Protractor" "Jasmine"    "jMeter" 
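
The reason the bare name prints the definition is that a function in R is an ordinary object bound to a name. The sketch below illustrates this with a made-up function called square; is.function, formals and do.call are standard base R functions.

```r
square <- function(x = 2) {
  x * x
}

# The bare name refers to the function object itself.
is.fun <- is.function(square)

# The object can be inspected, for example to list its parameter names.
param.names <- names(formals(square))

# It can also be bound to a second name and called through that name.
alias <- square
via.alias <- alias(4)

# do.call invokes a function object with a list of arguments.
via.do.call <- do.call(square, list(5))

print(is.fun)
print(param.names)
print(c(via.alias, via.do.call))
```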

 

Understanding Scope in R

In the example function in the previous section, we accessed a list, skills.list. This list is defined in the global scope and hence is available to all functions.

However, the local variable (the skills variable) inside the function is accessible only inside that function. If you try to access this variable outside the function, R gives an error, as shown below:

> skills
Error: object 'skills' not found
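
The same behaviour can be checked in a script without triggering the error. This is a minimal sketch with made-up names (team.size, extra.members, count.members); exists is a base R function that reports whether a name is defined.

```r
team.size <- 4  # defined outside the function

count.members <- function() {
  extra.members <- 2         # local: created when the function runs
  team.size + extra.members  # team.size is found outside the function
}

total <- count.members()
print(total)

# The local variable is gone once the function has returned.
local.visible <- exists('extra.members')
print(local.visible)
```

The function can read team.size, but extra.members is not visible after the call, so the last line prints FALSE.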

 

Global variable inside a function

A function can read a global variable and can also assign to its name. However, unlike in many other languages (where the update affects the global variable from that point onwards), an assignment with <- inside a function creates a local copy, so the effect of the update is visible only inside the function.

Let’s take a look at the following code and its outcome:

fullstack.webdev <- c('React.js', 'Node.js', 'PostgreSQL', 'Redux')
erp.modules <- c('Manufacturing', 'Accounts', 'HR', 'Finance', 'Payroll', 'Procurements', 'Sales')
qa.tools <- c('Selenium', 'Karma', 'Protractor', 'Jasmine', 'jMeter')
skills.list <- list(webdev = fullstack.webdev, erp = erp.modules, QA = qa.tools)
getSkills <- function(random.index = 1) {
  skills <- skills.list[[random.index]]
  new.skillset <- c('R', 'Python', 'AI', 'ML', 'Data Science')
  skills.list[[length(skills.list)+1]] <- new.skillset
  cat('\n #of element in the list inside function definition' , length(skills.list))
  return(skills)
}
cat('#of element in the list before function call' , length(skills.list))
known.skills <- getSkills( sample(1:3, 1) )
cat('\n #of element in the list after function call' , length(skills.list))

It produces the following output, which shows that the number of elements in the list is the same before and after the function call, while inside the function it is one more:

> source('~/fun/functionBasics.R')
 #of element in the list before function call 3
 #of element in the list inside function definition 4
 #of element in the list after function call 3

Scope Resolution inside a function

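When R encounters a variable name inside a function, it checks in the following order:

  1. Is it defined locally, either as a parameter or as a variable assigned inside the function?
    1. If you assign a new value to a parameter or a local variable, the changed value is used from that point onwards.
  2. Is it defined in the environment where the function was created (for a function defined at the top level, this is the global environment)?
  3. Is it defined in an attached package or in base R?
  4. If it is found in none of these, R reports an "object not found" error.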
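
Before moving to the skills example, the lookup order can be sketched in a few lines. All names here (scope.label, show.lookup, label.param, missing.name, undefined.skill.name) are made up for this illustration.

```r
scope.label <- 'global'

show.lookup <- function(label.param = 'parameter') {
  first <- label.param          # the parameter is found inside the function
  label.param <- 'reassigned'   # the new value applies from here onwards
  second <- label.param
  third <- scope.label          # not local yet, so the outer value is used
  scope.label <- 'local'        # creates a local variable with the same name
  fourth <- scope.label         # the local variable now wins
  c(first, second, third, fourth)
}

lookup.result <- show.lookup()
print(lookup.result)
print(scope.label)  # the outer variable is unchanged

# A name that is defined nowhere raises an error; tryCatch captures
# the error message instead of stopping the script.
missing.name <- function() {
  tryCatch(undefined.skill.name, error = function(e) conditionMessage(e))
}
print(missing.name())
```

The four values come out as "parameter", "reassigned", "global" and "local", and scope.label outside the function still holds "global".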

The following example code and output show how changing a parameter value or a global variable behaves inside a function:

fullstack.webdev <- c('React.js', 'Node.js', 'PostgreSQL', 'Redux')
erp.modules <- c('Manufacturing', 'Accounts', 'HR', 'Finance', 'Payroll', 'Procurements', 'Sales')
qa.tools <- c('Selenium', 'Karma', 'Protractor', 'Jasmine', 'jMeter')
skills.list <- list(webdev = fullstack.webdev, erp = erp.modules, QA = qa.tools)
getSkills <- function(random.index = 1) {
  new.skillset.ds <- c('R', 'Python', 'AI', 'ML', 'Data Science')
  new.skillset.bd <- c('Hadoop', 'Spark', 'Zookeeper', 'Cloudera')
  skills.list <- list(new.skillset.ds, new.skillset.bd)
  print(skills.list) 
  random.index <- 2
  skills <- skills.list[[random.index]]
  return(skills)
}

known.skills <- getSkills(1)
cat('\nKnown skills are ',  known.skills)

 

The output looks like this:

> source('~/fun/functionBasics.R')
[[1]]
[1] "R"            "Python"       "AI"           "ML"           "Data Science"
[[2]]
[1] "Hadoop"    "Spark"     "Zookeeper" "Cloudera" 
Known skills are  Hadoop Spark Zookeeper Cloudera

Note

  • Even though we asked for the item at index 1, the result shows the item at index 2, because the parameter was assigned a different value inside the function
  • Assigning to skills.list inside the function created a local variable that hides the global one, so inside the function the list has only two elements, both of them new; the global skills.list is unchanged

 

Updating a global variable inside a function

Previously we saw that a change made to a global variable through the <- assignment operator was effective only inside that function. Hence, the number of elements in the list was the same before and after the function call. However, using the <<- operator, you can change the global variable itself.

The following example shows the change in the global variable:

fullstack.webdev <- c('React.js', 'Node.js', 'PostgreSQL', 'Redux')
erp.modules <- c('Manufacturing', 'Accounts', 'HR', 'Finance', 'Payroll', 'Procurements', 'Sales')
qa.tools <- c('Selenium', 'Karma', 'Protractor', 'Jasmine', 'jMeter')
skills.list <- list(webdev = fullstack.webdev, erp = erp.modules, QA = qa.tools)
getSkills <- function(random.index = 1) {
  new.skillset.ds <- c('R', 'Python', 'AI', 'ML', 'Data Science')
  new.skillset.bd <- c('Hadoop', 'Spark', 'Zookeeper', 'Cloudera')
  skills.list[[length(skills.list) + 1]] <<- list(new.skillset.ds, new.skillset.bd)
  cat('\n#of element in the list inside function call' , length(skills.list))
  skills <- skills.list[[random.index]]
  return(skills)
}

cat('\n #of element in the list before function call' , length(skills.list))
known.skills <- getSkills(sample(1:3, 1))
cat('\n #of element in the list after function call' , length(skills.list))

The output is as follows:

> source('~/fun/functionBasics.R')
 #of element in the list before function call 3
 #of element in the list inside function call 4
 #of element in the list after function call 4

As you can see, after the function execution, the item added inside the function is accessible in the global scope as well.

Note

  • <- always creates a binding in the current environment; <<- rebinds an existing name in a parent of the current environment
  • You must be very careful while defining variables in global scope
  • You must pay very close attention to any update in the global variable
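
One way to get the benefit of <<- without touching the global scope is to keep the variable in an enclosing function. The sketch below is only an illustration of that idea; make.counter, counter.value and next.id are made-up names.

```r
make.counter <- function() {
  counter.value <- 0  # lives in make.counter's environment, not globally
  function() {
    # <<- searches the parent environments and rebinds counter.value
    # where it finds it, here in make.counter's environment.
    counter.value <<- counter.value + 1
    counter.value
  }
}

next.id <- make.counter()
first.id <- next.id()
second.id <- next.id()
print(c(first.id, second.id))

# The counter state never appears as a global variable.
leaked <- exists('counter.value')
print(leaked)
```

Each call to next.id returns the next number (1, then 2), and the final line prints FALSE because counter.value exists only inside the environment created by make.counter.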

Anonymous Functions

As the name suggests, anonymous functions are functions without a name. Often you need a function in only a single place. In such cases, it can be more convenient to define it inline rather than separately.

For example, built-in functions like sapply and lapply allow you to apply a function to each element of a data set. Say you want to calculate the cube of each element of a vector; then you have two options:

  1. Define a function called cube and pass it as a parameter to sapply, or
  2. Define the function as an anonymous function in the parameter itself

 

Version 1

vec <- 1:10
cube <- function(num) {
  return(num * num * num)
}
print(sapply(vec, cube))

Version 2

vec <- 1:10
print(sapply(vec, function(num) {num * num * num}))

Note that you don’t need to give the function a name, and you don’t need a return statement either, because the value of the last expression is returned.

In both cases, the outcome is the following:

> source('~/fun/anonymousFun.R')
 [1]    1    8   27   64  125  216  343  512  729 1000

When the function is very simple, it makes sense to use an anonymous function. Otherwise, defining it explicitly and then using it is more manageable.
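
Anonymous functions are not limited to sapply. The following sketch shows a few other base R functions that take a function as a parameter; the variable names are made up for this illustration.

```r
vec <- 1:10

# vapply works like sapply but also checks the type of each result.
cubes <- vapply(vec, function(num) num^3, numeric(1))

# Filter keeps the elements for which the anonymous function returns TRUE.
evens <- Filter(function(num) num %% 2 == 0, vec)

# Map applies a two-parameter anonymous function pairwise over two vectors.
pair.sums <- Map(function(a, b) a + b, 1:3, 4:6)

# An anonymous function can also be called immediately where it is defined.
four.cubed <- (function(num) num^3)(4)

print(cubes)
print(evens)
print(unlist(pair.sums))
print(four.cubed)
```

For arithmetic as simple as a cube, the vectorised expression vec^3 gives the same result without any function at all; the anonymous function form pays off when the per-element logic is more involved.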

