Working with Vectors in R

A vector is a sequence of data elements of the same basic data type. You cannot mix data types within a single vector: if you try, R coerces every element to the most general type present.

Note

In R, you will find the following basic data types:

  • Numeric (e.g. 2, 2.5, -45.4, etc.)
  • Integer (e.g. 1L, 5L, -10L, etc.)
  • Character (e.g. 'R', 'Professionals', etc.)
  • Logical (TRUE and FALSE, which can be abbreviated to T and F)
  • Complex (e.g. 5+3i, where i is the imaginary unit)

Creating a vector

The c function combines elements, lists or vectors into a vector. Using the assignment operator, <-, you can create an initial vector as shown below:

student.grades <- c(9, 9.1, 8.1, 9.5, 9.45)
print(student.grades)

The output looks like this:

[1] 9.00 9.10 8.10 9.50 9.45

Creating a vector with potentially mixed data types

Suppose you create a vector with mixed data types, as shown below:

mixed.datatypes <- c(TRUE, F, 10, 'Skill', 9L, 8+5i)
print(mixed.datatypes)

It gives the following output:

[1] "TRUE"  "FALSE" "10"    "Skill" "9"     "8+5i" 

You can check the data type of the resulting vector with the class function:

class(mixed.datatypes)

The result shows that R has converted all the elements to the character data type:

[1] "character"
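The conversion follows a fixed order, from least to most general: logical, integer, numeric, complex, character. The short sketch below illustrates this with a few made-up values (they are not part of the example above):

```r
# c() coerces its arguments to the most general type among them.
class(c(TRUE, 2L))        # "integer"
class(c(TRUE, 2L, 2.5))   # "numeric"
class(c(2.5, 8+5i))       # "complex"
class(c(2.5, "Skill"))    # "character"

# A logical value becomes 1 or 0 when it is combined with numbers.
c(TRUE, FALSE, 10)        # 1 0 10
```

A single character element is therefore enough to turn the whole vector into a character vector.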

Naming a vector

While the c function lets us combine elements, the data becomes more meaningful when each element has a name. The names function lets us assign these names.

When you execute the following statement, the names get assigned to the vector elements:

names(student.grades) <- c('Aayush', 'Pratyush', 'John', 'Alisha', 'Peter')

When you print student.grades, the output looks as shown below:

  Aayush Pratyush     John   Alisha    Peter 
    9.00     9.10     8.10     9.50     9.45 

Note

The names attribute must have the same length as the vector.

  • If you give more names than there are elements, R raises an error like the one below:
    • Error in names(student.grades) <- c("Aayush", "Pratyush", "John", "Alisha",  :   'names' attribute [6] must be the same length as the vector [5]
  • If you give fewer names than there are elements, R uses <NA> as the name of each remaining element

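Both cases can be tried on a small vector. This is only a sketch: the three-element grades vector and the too.many variable are made up for illustration.

```r
grades <- c(9, 9.1, 8.1)   # illustrative vector with three elements

# Fewer names than elements: the missing name becomes NA.
names(grades) <- c("Aayush", "Pratyush")
print(names(grades))       # "Aayush" "Pratyush" NA

# More names than elements: R signals an error, captured here as text.
too.many <- tryCatch({
  names(grades) <- c("Aayush", "Pratyush", "John", "Alisha")
  "no error"
}, error = function(e) conditionMessage(e))
print(too.many)
```

Since the second assignment fails, grades keeps the names it had before.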

Alternatively, we can first define the names (in our example, the student names) and then assign them to the vector. The following example demonstrates this:

> student.names <- c('Aayush', 'Pratyush', 'John', 'Alisha', 'Peter')
> names(student.grades) <- student.names
> print(student.grades)
  Aayush Pratyush     John   Alisha    Peter 
    9.00     9.10     8.10     9.50     9.45 

Accessing vectors

Vector elements can be accessed by index as well as by name.

Accessing using Names and Indexes

The following example shows the element with the name “John” as well as the first element:

> student.grades["John"]
John 
 8.1 
> student.grades[1]
Aayush 
     9 

Further, you can apply functions such as sort to list the names in sorted order along with their values. The following example shows one such use:

> student.names <- c('Aayush', 'Pratyush', 'John', 'Alisha', 'Peter')
> student.grades <- c(9, 9.1, 8.1, 9.5, 9.45)
> names(student.grades) <- student.names
> print(student.grades[sort(student.names)])
  Aayush   Alisha     John    Peter Pratyush 
    9.00     9.50     8.10     9.45     9.10 

Note

  • Unlike arrays in some languages, which are 0-based, vector indexes in R are 1-based. This means that the first element is at index 1.

Accessing using negative index

While this sounds unusual, you can indeed access a vector using a negative index. In that case, R leaves out the element at the corresponding absolute position and returns the rest of the elements. The original vector itself is not modified.

In the example below, accessing the vector using “-4” leaves out the fourth element and returns the rest of the vector:

> student.names <- c('Aayush', 'Pratyush', 'John', 'Alisha', 'Peter')
> student.grades <- c(9, 9.1, 8.1, 9.5, 9.45)
> names(student.grades) <- student.names

> student.grades[-4]
  Aayush Pratyush     John    Peter 
    9.00     9.10     8.10     9.45 
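A negative index can also be a vector of positions. The sketch below rebuilds the same student.grades vector and drops two positions at once; the without.first.two variable is made up for illustration.

```r
student.names <- c('Aayush', 'Pratyush', 'John', 'Alisha', 'Peter')
student.grades <- c(9, 9.1, 8.1, 9.5, 9.45)
names(student.grades) <- student.names

# Leave out positions 1 and 2 in one step.
without.first.two <- student.grades[-c(1, 2)]
print(without.first.two)

# The original vector still has all five elements.
length(student.grades)   # 5
```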

Accessing out-of-range index

An out-of-range index does not raise an error. When you try to access an element beyond the end of the vector, R returns NA, as the following output shows:

> print(student.grades[6])
<NA> 
  NA 
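Because no error is raised, it can help to test for NA explicitly. A minimal sketch, assuming the same named vector as above (the name "Nobody" is made up and deliberately absent):

```r
student.grades <- c(Aayush = 9, Pratyush = 9.1, John = 8.1,
                    Alisha = 9.5, Peter = 9.45)

beyond.end <- student.grades[6]            # position 6 does not exist
unknown.name <- student.grades["Nobody"]   # neither does this name

is.na(beyond.end)      # TRUE
is.na(unknown.name)    # TRUE

# Comparing the index with length() avoids the NA in the first place.
6 <= length(student.grades)   # FALSE
```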

Accessing more than one element in the order of your choice

There may be situations where you need to access the same element more than once, or a few elements in a given order. You can pass the positions to the combine (c) function inside the square brackets to achieve this. For example, student.grades[c(5, 1, 1)] returns the fifth element followed by the first element twice.
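The same idea works with names as well as positions. The following sketch uses the student grades from earlier; the picked and by.name variables are made up for illustration.

```r
student.grades <- c(Aayush = 9, Pratyush = 9.1, John = 8.1,
                    Alisha = 9.5, Peter = 9.45)

# Fifth element first, then the first element twice.
picked <- student.grades[c(5, 1, 1)]
print(picked)

# Names can be listed in any order too.
by.name <- student.grades[c("Peter", "John")]
print(by.name)
```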

Vector Operations

Adding two vectors

When you add two vectors of the same data type, one of two situations applies:

  • The vectors are of equal length, or
  • They are of different lengths

When they are of equal length, the elements at the same index are added. If the two vectors are of different lengths, R recycles the elements of the shorter vector until it matches the length of the longer one. If the longer length is not a multiple of the shorter length, R still returns a result but issues a warning.

Let’s look at these two situations through an example.

Same length vectors

Let’s look at the value of student.netscore below, which is the sum of two vectors of equal length:

student.math.grade <- c(9.5, 9.4, 9.1, 9.8, 9.7)
student.science.grade <- c(9.1, 9.6, 8.5, 8.8, 8.7)
student.netscore <- student.math.grade + student.science.grade
names(student.netscore) <- student.names

As expected, each element of student.netscore is the sum of the maths and science grades at that position: 18.6, 19.0, 17.6, 18.6 and 18.4.

Different Length Vectors

Let’s add two vectors of different lengths and observe the outcome. In the example below, student.netscore has 5 elements and student.wholeclass.maths has 10:

student.wholeclass.maths <- c(9.5, 9.4, 9.1, 9.8, 7.7, 6.5, 6.4, 6.1, 7.8, 6.7) 
student.netscore <- student.netscore + student.wholeclass.maths
names(student.netscore) <- student.names

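After executing the above statements, student.netscore has 10 elements. The five original elements of student.netscore are reused for positions 6 to 10 of student.wholeclass.maths (i.e. recycling happens). Because only five names are assigned, the remaining elements are labelled <NA>.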
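Recycling is easier to follow on small numbers. The vectors in this sketch (short, long) are made up for illustration:

```r
short <- c(1, 2)
long <- c(10, 20, 30, 40)

# short is reused as 1, 2, 1, 2 to match the length of long.
recycled <- short + long
print(recycled)   # 11 22 31 42

# Length 3 is not a multiple of length 2: R still recycles but warns.
# tryCatch returns the warning text instead of the sum here.
warning.text <- tryCatch(c(1, 2) + c(10, 20, 30),
                         warning = function(w) conditionMessage(w))
print(warning.text)
```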

Note

  • Subtraction works in a similar way: the elements at a given index are subtracted, and the result is a vector of the element-level differences
    • In the case of vectors of different lengths, the same recycling happens. You may like to experiment with this to become more comfortable.
  • Similarly, when you divide one vector by another, the element at each index in the dividend gets divided by the element at the same index in the divisor.
  • Multiplication of two vectors works in exactly the same way. In a nutshell, arithmetic operations between two vectors are performed member-by-member.
  • Note that assigning a new value to a variable replaces the whole vector, including its names attribute. If the new value has no names, you need to assign them again.

When you multiply (or divide) a vector by a single number, every element of the vector gets multiplied (or divided) by that number.
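The following sketch puts these operations side by side. The vectors a and b are made-up values chosen so that the results are easy to check by hand:

```r
a <- c(8, 6, 9)
b <- c(2, 3, 3)

a - b   # 6 3 6
a * b   # 16 18 27
a / b   # 4 2 3

# A single number is applied to every element.
a * 2   # 16 12 18
a / 2   # 4.0 3.0 4.5
```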

Slicing a Vector

You will often need a slice of the vector data for a calculation.

Accessing a range of elements

Let’s consider the following vector:

student.wholeclass.maths <- c(9.5, 9.4, 9.1, 9.8, 7.7, 6.5, 6.4, 6.1, 7.8, 6.7) 
names(student.wholeclass.maths) <- student.names
student.wholeclass.maths

To take a slice of a vector, use the colon (:) operator on the indexes. For example, if you need the grades of the students from index 3 to index 7, the expression student.wholeclass.maths[3:7] returns that slice.

Accessing multiple ranges

You can specify multiple ranges using the combine (c) function and the comma separator. For example, student.wholeclass.maths[c(2:4, 6:8)] selects the elements in the ranges 2-4 and 6-8.
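Both kinds of slice can be tried on the whole-class vector. In this sketch the names are left out to keep the output short, and the variables single.range and two.ranges are made up for illustration:

```r
student.wholeclass.maths <- c(9.5, 9.4, 9.1, 9.8, 7.7, 6.5, 6.4, 6.1, 7.8, 6.7)

# One range: elements 3 to 7.
single.range <- student.wholeclass.maths[3:7]
print(single.range)   # 9.1 9.8 7.7 6.5 6.4

# Two ranges joined with c(): elements 2 to 4 and 6 to 8.
two.ranges <- student.wholeclass.maths[c(2:4, 6:8)]
print(two.ranges)     # 9.4 9.1 9.8 6.5 6.4 6.1
```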

Applying Filters on a vector

By placing a conditional statement inside the square brackets ([]), you can apply a logical filter on a vector. For example, student.wholeclass.maths[student.wholeclass.maths > 9] returns the grades of all the students whose maths grade is more than 9.

Further, you can combine conditions using the & (and) and | (or) operators. For example, student.wholeclass.maths[student.wholeclass.maths > 9 | student.wholeclass.maths < 7] returns the grades that are either more than 9 or less than 7.
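The filters described above can be sketched as follows. The names are again left out, and the variables above.nine, extremes and middle are made up for illustration:

```r
student.wholeclass.maths <- c(9.5, 9.4, 9.1, 9.8, 7.7, 6.5, 6.4, 6.1, 7.8, 6.7)

# Single condition: grades above 9.
above.nine <- student.wholeclass.maths[student.wholeclass.maths > 9]
print(above.nine)   # 9.5 9.4 9.1 9.8

# Either condition (|): above 9 or below 7.
extremes <- student.wholeclass.maths[student.wholeclass.maths > 9 |
                                     student.wholeclass.maths < 7]
print(extremes)     # 9.5 9.4 9.1 9.8 6.5 6.4 6.1 6.7

# Both conditions (&): between 7 and 9.
middle <- student.wholeclass.maths[student.wholeclass.maths > 7 &
                                   student.wholeclass.maths < 9]
print(middle)       # 7.7 7.8
```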

Logical Vector Index

When you applied filters on the vector, you passed a conditional statement inside the square brackets. Evaluated on their own, conditions such as student.wholeclass.maths>9, student.wholeclass.maths<7 and (student.wholeclass.maths>9 | student.wholeclass.maths<7) are logical vectors, with one TRUE or FALSE value per element. grade.filter is a vector created by assigning one of these logical operations to a variable.

So, essentially, your filter is a logical vector, and the elements at the positions where its value is TRUE are returned as the output of the filter.

Run the following command to see the data type of the grade.filter vector:

> class(grade.filter)
[1] "logical"
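The sketch below shows what such a logical vector looks like. It assumes that grade.filter holds the "more than 9" condition; this is only one possible choice. The variable filtered is made up for illustration.

```r
student.wholeclass.maths <- c(9.5, 9.4, 9.1, 9.8, 7.7, 6.5, 6.4, 6.1, 7.8, 6.7)

# Assumption: grade.filter is built from the "more than 9" condition.
grade.filter <- student.wholeclass.maths > 9
print(grade.filter)   # TRUE for the first four elements, FALSE for the rest

which(grade.filter)   # positions that are TRUE: 1 2 3 4
sum(grade.filter)     # number of TRUE values: 4

# Using the logical vector as an index gives the same result as the inline filter.
filtered <- student.wholeclass.maths[grade.filter]
print(filtered)       # 9.5 9.4 9.1 9.8
```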

Using Aggregate Functions

You can use the following aggregate functions on a vector:

  • mean: the arithmetic average of all the elements of the vector
  • max: the maximum value of all the elements
  • min: the minimum value of all the elements
  • sd: the standard deviation, which is the square root of the variance
  • var: the variance, a numerical measure of how the data values are dispersed around the mean
  • median: the value in the middle when the data is sorted in ascending order
  • range: the smallest and largest data values, returned as a vector of two elements (diff(range(x)) gives the difference between them)

Each of these functions takes the vector as its argument, for example mean(student.wholeclass.maths).
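As a rough sketch, the functions can be applied to the whole-class maths grades like this. Results are given in the comments only where they are easy to verify by hand; the grade.range variable is made up for illustration.

```r
student.wholeclass.maths <- c(9.5, 9.4, 9.1, 9.8, 7.7, 6.5, 6.4, 6.1, 7.8, 6.7)

mean(student.wholeclass.maths)     # 7.9
max(student.wholeclass.maths)      # 9.8
min(student.wholeclass.maths)      # 6.1
median(student.wholeclass.maths)   # 7.75, the average of the two middle values

# range() returns the minimum and maximum, not their difference.
grade.range <- range(student.wholeclass.maths)
print(grade.range)                 # 6.1 9.8
diff(grade.range)                  # 3.7

# sd() is the square root of var().
var(student.wholeclass.maths)
sd(student.wholeclass.maths)
```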

I will cover the complete list of functions when I delve into statistics using R. For now, in the context of vectors, you should be able to use the aggregate and statistical functions above.

