Like a vector, a matrix is a data structure. It stores data in a two-dimensional table of rows and columns. Also like a vector, a matrix in R holds a single type of data. Because matrices are mostly used for numerical calculations, their elements are usually numeric.
R provides a function called matrix, which creates a matrix.
The syntax of matrix is as follows:
matrix( data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
- data is the data from which you intend to create the matrix. Often it is a one-dimensional vector.
- nrow is the number of rows desired in the matrix.
- ncol is the number of columns desired in the matrix.
- byrow indicates whether the data fills the matrix row by row or column by column. The default value of FALSE fills the matrix column by column.
- dimnames sets the names of the rows and columns. It accepts the following values:
- An empty list is treated as NULL.
- NULL or a list of length 2 gives the row and column names respectively.
- A list of length 1 is treated as row names.
Matrix with default options
Consider the following vector:
> student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5, 5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5 )
When you create a matrix using the default syntax, as shown below:
grades.mat <- matrix(student.grades)
This produces a 20×1 matrix: by default a single column is used, and the number of rows is determined by the length of the data vector:
> grades.mat
      [,1]
 [1,]  7.0
 [2,]  8.0
 [3,]  9.0
 [4,]  8.0
 [5,]  6.0
 [6,]  7.0
 [7,]  5.8
 [8,]  7.6
 [9,]  6.9
[10,]  4.5
[11,]  5.4
[12,]  8.5
[13,]  9.7
[14,]  9.9
[15,]  8.4
[16,]  7.5
[17,]  8.5
[18,]  9.5
[19,]  6.5
[20,]  7.5
Matrix of the desired dimension
Let’s assume that we have five students and four subjects. By specifying the nrow and ncol values, you can create a 4 x 5 matrix, with one row per subject and one column per student, that presents this data in tabular form.
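A minimal sketch of such a call, assuming the student.grades vector defined earlier and the default column-by-column fill, so that each column holds one student's grades:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)

# 4 rows (subjects) x 5 columns (students); filled column by column by default
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)
print(grades.mat)
print(dim(grades.mat))
```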
In the above matrix, you didn’t provide the dimnames parameter. R therefore used the default NULL value, and the printed matrix labels the rows as [m,] and the columns as [,n], where m is the row number and n is the column number.
Dimension names for the matrix
You can use dimnames to assign row and column names to the matrix.
Make sure that the row names and the column names each contain the correct number of elements.
Note also that the first element of dimnames holds the row names and the second element holds the column names. If you provide only one element, it is taken as the row names. If the number of names does not match the corresponding dimension, R raises an error.
The following example shows a few variants: just row names, just column names, and the wrong number of row names:
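These variants can be sketched as follows. Apart from English (row 3) and Pratyush (column 4), which this article mentions, the subject and student names are made up for illustration. tryCatch is used only so that the failing call does not stop the script:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
# Hypothetical names, except "English" and "Pratyush"
subjects <- c("Maths", "Science", "English", "History")
students <- c("Asha", "Ravi", "Meera", "Pratyush", "Kiran")

# Row and column names together: a list of length 2
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5,
                     dimnames = list(subjects, students))

# Row names only: a list of length 1
rows.only <- matrix(student.grades, nrow = 4, ncol = 5,
                    dimnames = list(subjects))

# Column names only: NULL in the row position
cols.only <- matrix(student.grades, nrow = 4, ncol = 5,
                    dimnames = list(NULL, students))

# Wrong number of row names (3 names for 4 rows) raises an error
bad <- tryCatch(
  matrix(student.grades, nrow = 4, ncol = 5,
         dimnames = list(subjects[1:3])),
  error = function(e) conditionMessage(e)
)
print(grades.mat)
print(bad)
```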
Accessing Matrix Elements
You can access a specific element using the numeric indexes. For example grades.mat[3,4] gives the element at the 3rd row and the 4th column:
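A small sketch of numeric indexing, assuming grades.mat is the 4 x 5 matrix built from student.grades:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# Row index first, column index second
element <- grades.mat[3, 4]
print(element)
```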
Alternatively, you can access a specific element using the dimension names, as shown in the example below:
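A sketch of name-based access. It assumes the matrix was created with dimnames; all names other than English and Pratyush are hypothetical:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
# Hypothetical names, except "English" and "Pratyush"
subjects <- c("Maths", "Science", "English", "History")
students <- c("Asha", "Ravi", "Meera", "Pratyush", "Kiran")
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5,
                     dimnames = list(subjects, students))

# Same element as grades.mat[3, 4], addressed by row and column name
by.name <- grades.mat["English", "Pratyush"]
print(by.name)
```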
Accessing a row
By leaving the column index empty, you can select a whole row. For example, grades.mat[3,] selects the English grades of all the students:
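A minimal sketch, assuming row 3 of the 4 x 5 grades.mat holds the English grades:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# Empty column index: the whole third row comes back as a vector
english <- grades.mat[3, ]
print(english)
```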
Accessing a Column
By leaving the row index empty, you can select a whole column of a matrix. For example, grades.mat[,4] returns all the elements of the 4th column (in this case, all of Pratyush's grades):
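A minimal sketch, assuming column 4 of the 4 x 5 grades.mat holds Pratyush's grades:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# Empty row index: the whole fourth column comes back as a vector
fourth.column <- grades.mat[, 4]
print(fourth.column)
```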
Accessing a submatrix
Using the index ranges, you can select part of the matrix. For example, using grades.mat[2:4,1:3], you can select a submatrix, which will show a subset of grades of a subset of students.
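A short sketch of range-based selection on the 4 x 5 grades.mat:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# Rows 2 to 4 and columns 1 to 3 give a 3 x 3 submatrix
sub.mat <- grades.mat[2:4, 1:3]
print(sub.mat)
```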
Accessing Any Rows or Columns
The : (colon) operator gives you a contiguous submatrix. With the c (combine) function, you can instead pick specific rows and/or columns.
The following example shows how to select specific rows and columns together, specific columns only, and specific rows only:
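A sketch of the three selections; the particular row and column numbers are arbitrary choices for illustration:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# Specific rows and columns: rows 1 and 3, columns 2 and 5
rows.and.cols <- grades.mat[c(1, 3), c(2, 5)]
# Specific columns only: columns 1 and 4, all rows
cols.only <- grades.mat[, c(1, 4)]
# Specific rows only: rows 2 and 4, all columns
rows.only <- grades.mat[c(2, 4), ]

print(rows.and.cols)
print(cols.only)
print(rows.only)
```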
Transpose of a matrix
The transpose of a matrix is the matrix obtained by turning the rows of the original matrix into columns and its columns into rows. You can transpose a matrix with the t (transpose) function.
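A minimal sketch of t applied to the 4 x 5 grades.mat:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# Rows become columns: the 4 x 5 matrix turns into a 5 x 4 matrix
transposed <- t(grades.mat)
print(transposed)
```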
Multiplying by a scalar
You can multiply every element of a matrix by the same number by multiplying the matrix by a scalar. The following example shows how to multiply the grades by 0.9:
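A minimal sketch of scalar multiplication on the 4 x 5 grades.mat:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# The scalar is applied to every element; the shape is unchanged
scaled <- grades.mat * 0.9
print(scaled)
```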
Similarly, you can divide by a scalar, raise to a power, or take the reciprocal to apply that operation to each element of the matrix.
Adding a matrix to itself
As you would expect, adding a matrix to itself doubles the value of each element.
Similarly, subtracting the matrix from itself gives a matrix of all zeroes, and dividing the matrix by itself gives a matrix of all ones. Multiplying the matrix by itself with * gives a matrix in which each element is the square of the original element.
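These element-wise operations can be sketched as follows, again assuming the 4 x 5 grades.mat (which contains no zeroes, so the division is safe):

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

doubled <- grades.mat + grades.mat  # each element doubled
zeroes  <- grades.mat - grades.mat  # all zeroes
ones    <- grades.mat / grades.mat  # all ones
squares <- grades.mat * grades.mat  # element-wise squares, not a matrix product

print(doubled)
print(squares)
```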
True Matrix Multiplication
In the previous section, we saw that multiplying a matrix by itself with * squares each element. In linear algebra, matrix multiplication does not work this way.
Hence, instead of a syntax like grades.mat * grades.mat, you need to use grades.mat %*% grades.mat.
With the 4 x 5 grades.mat, the output looks like this:
> grades.mat %*% grades.mat
Error in grades.mat %*% grades.mat : non-conformable arguments
True matrix operations expect the operands to be conformable. For example, when you multiply two matrices, the number of columns in the first matrix must equal the number of rows in the second matrix. Consequently, if you multiply a matrix by itself, the matrix must be square.
In the following example, a 4×4 submatrix is multiplied by itself:
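A minimal sketch, assuming the first four columns of grades.mat are taken as the 4×4 submatrix (the choice of columns is an assumption made for illustration):

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# A square 4 x 4 submatrix is conformable with itself
square.mat <- grades.mat[1:4, 1:4]
product <- square.mat %*% square.mat
print(product)

# Element [1, 1] is the dot product of row 1 and column 1
first <- sum(square.mat[1, ] * square.mat[, 1])
print(first)
```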
Note that the elements are the results of true matrix multiplication. They are not simply the squares of the original element values.
Combining Two Matrices
There are situations where you need to add more columns and/or rows to an existing matrix. Often you do this by combining the existing matrix with another matrix that holds the additional rows or columns.
Matrices can be combined by rows or by columns. The examples below show the two scenarios.
Combining matrices to add more rows
Using the rbind function, you can combine two matrices with the same number of columns to add more rows to the original matrix.
In the example below, a new matrix is created and appended to an existing matrix with rbind, adding more subjects for the same set of students:
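A sketch of rbind under stated assumptions: the additional subjects (Geography, Art), their grades, and all names other than English and Pratyush are made up for illustration:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
# Hypothetical names, except "English" and "Pratyush"
subjects <- c("Maths", "Science", "English", "History")
students <- c("Asha", "Ravi", "Meera", "Pratyush", "Kiran")
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5,
                     dimnames = list(subjects, students))

# Hypothetical grades for two more subjects, same five students
extra.subjects <- matrix(c(7.5, 8.0, 6.5, 9.0, 8.5, 7.0, 9.5, 6.0, 8.0, 7.5),
                         nrow = 2, ncol = 5,
                         dimnames = list(c("Geography", "Art"), students))

# Both matrices have 5 columns, so their rows can be stacked
all.subjects <- rbind(grades.mat, extra.subjects)
print(all.subjects)
```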
Combining matrices to add more columns
Using cbind, you can combine two matrices with the same number of rows to add more columns.
Continuing the example from the section above, suppose you need to add a few more students. You can create a matrix with all the subjects as rows and the additional students as columns. The example below demonstrates this scenario:
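A sketch of cbind, assuming the new students are added to the original four-subject matrix. The new students (Neha, Arjun), their grades, and all names other than English and Pratyush are hypothetical:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
# Hypothetical names, except "English" and "Pratyush"
subjects <- c("Maths", "Science", "English", "History")
students <- c("Asha", "Ravi", "Meera", "Pratyush", "Kiran")
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5,
                     dimnames = list(subjects, students))

# Hypothetical grades for two more students, same four subjects
new.students <- matrix(c(6.5, 7.0, 8.0, 9.0, 7.5, 8.5, 6.0, 7.0),
                       nrow = 4, ncol = 2,
                       dimnames = list(subjects, c("Neha", "Arjun")))

# Both matrices have 4 rows, so their columns can be placed side by side
all.students <- cbind(grades.mat, new.students)
print(all.students)
```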
Applying Aggregate Functions
Often you need to calculate averages or other summaries of a matrix to make sense of the data.
Using colSums, you can calculate the total of each column. When you apply it to a matrix, it returns a vector of totals, on which you can do further arithmetic to get the desired result.
The following example uses colSums to calculate the average grade of each student by dividing the column totals by the number of rows in the matrix.
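A minimal sketch, assuming each column of the 4 x 5 grades.mat is one student and the row count (4) is typed in by hand:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# One total per column (student), divided by the 4 subjects
student.totals <- colSums(grades.mat)
student.averages <- student.totals / 4
print(student.averages)
```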
Similar to colSums, you can use rowSums to calculate the total of each row. In the example below, rowSums is used to calculate the average grade in every subject by dividing the row totals by the number of columns.
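A minimal sketch, assuming each row of the 4 x 5 grades.mat is one subject and the column count (5) is typed in by hand:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

# One total per row (subject), divided by the 5 students
subject.totals <- rowSums(grades.mat)
subject.averages <- subject.totals / 5
print(subject.averages)
```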
Using ncol and nrow
In the previous section, when we talked about colSums and rowSums, we counted the rows and columns manually for the sake of ease and used those counts to calculate the averages.
However, you can use the ncol and nrow functions on the matrix to get the number of columns and the number of rows respectively.
The following example shows a sample usage of the ncol function:
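A sketch that replaces the hand-typed counts with ncol and nrow, assuming the 4 x 5 grades.mat:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

print(ncol(grades.mat))  # number of columns (students)
print(nrow(grades.mat))  # number of rows (subjects)

# Averages without hard-coded counts
subject.averages <- rowSums(grades.mat) / ncol(grades.mat)
student.averages <- colSums(grades.mat) / nrow(grades.mat)
print(subject.averages)
print(student.averages)
```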
Using rowMeans and colMeans
In the earlier examples, we used colSums and rowSums and then divided the totals by the number of rows or columns to calculate the averages. R provides the built-in functions rowMeans and colMeans to achieve exactly the same result.
The example below shows how to calculate means at the row level and at the column level:
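A minimal sketch of both functions on the 4 x 5 grades.mat:

```r
student.grades <- c(7, 8, 9, 8, 6, 7, 5.8, 7.6, 6.9, 4.5,
                    5.4, 8.5, 9.7, 9.9, 8.4, 7.5, 8.5, 9.5, 6.5, 7.5)
grades.mat <- matrix(student.grades, nrow = 4, ncol = 5)

subject.means <- rowMeans(grades.mat)  # one mean per row (subject)
student.means <- colMeans(grades.mat)  # one mean per column (student)
print(subject.means)
print(student.means)
```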
Of course, you can use cbind and rbind to append these means to the original matrix as an extra column or an extra row.
Matrix Creation – Revisited
While learning the concepts or generating sample data, you may want to create a matrix quickly. In this section, I will explain a few simple ways to create matrices without typing the data manually.
Using range (colon) operator
The following matrix call generates a 5×6 matrix with the numbers from 1 to 30:
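A minimal sketch of such a call; the variable name num.mat is chosen here for illustration:

```r
# 1:30 expands to the integers 1, 2, ..., 30
num.mat <- matrix(1:30, nrow = 5, ncol = 6)
print(num.mat)
```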
Of course, with the byrow parameter you can decide whether the values fill the rows first or the columns first. The default is column by column.
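The difference can be sketched by building the same 5×6 matrix both ways; the variable names are chosen here for illustration:

```r
by.col <- matrix(1:30, nrow = 5, ncol = 6)                # default: byrow = FALSE
by.row <- matrix(1:30, nrow = 5, ncol = 6, byrow = TRUE)  # fill row by row

print(by.col[, 1])  # first column holds 1 to 5
print(by.row[1, ])  # first row holds 1 to 6
```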
The following example shows that you can multiply two such matrices with *. The elements at the same position are multiplied, which gives the squares of the numbers from 1 to 30:
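A sketch under the assumption that both operands are the same 5×6 matrix of 1 to 30; the names mat.a and mat.b are hypothetical:

```r
mat.a <- matrix(1:30, nrow = 5, ncol = 6)
mat.b <- matrix(1:30, nrow = 5, ncol = 6)

# Element-wise product: position [i, j] of mat.a times position [i, j] of mat.b
squares <- mat.a * mat.b
print(squares)
```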
Giving dimension names
Earlier in this article, you gave names to the dimensions using dimnames. However, you can also use the colnames and rownames functions to make things more explicit. The following example shows the usage of rownames and colnames:
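A minimal sketch; the matrix and the generated names (row1, col1, and so on) are made up for illustration:

```r
num.mat <- matrix(1:30, nrow = 5, ncol = 6)

# Assign names after the matrix has been created
rownames(num.mat) <- paste0("row", 1:5)
colnames(num.mat) <- paste0("col", 1:6)

print(num.mat)
print(num.mat["row2", "col3"])
```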
Matrix Sample Data Creation
The data in the above section was predictable. In some cases, you may want to generate random data that resembles more realistic situations. Even for learning purposes, it is good to know various ways to create data for a matrix.
Using Sample Function
Using the sample function, you can generate a vector of sample data, which you can in turn use to create a matrix.
Following is the syntax of the sample function:
sample( x, size, replace = FALSE, prob = NULL)
- x can be a single number or a vector of values such as a range. When x is a single number, the sample is drawn from the numbers 1 to x.
- size is the number of elements that you want to generate.
- replace indicates whether an element can be drawn more than once. If it is FALSE, each element of x is used at most once, so x must contain at least as many elements as the desired size of the sample.
- prob lets you define the probability (weight) of each element of x in the sample.
- If you have no specific need for weights, leave prob as NULL so that every element is equally likely.
- The prob values do NOT need to add up to 1. Each value just indicates the relative weight of the corresponding element.
The example below creates a 6×10 matrix of numbers in the range 3:10, where numbers can repeat (replace = TRUE) and the numbers 6 to 9 have a higher probability:
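A sketch of such a call. The specific weights and the set.seed call (added only to make the run repeatable) are assumptions, not values taken from this article:

```r
set.seed(42)  # assumption: fixed seed for a repeatable run

values <- 3:10
# Hypothetical weights, one per value: 6, 7, 8 and 9 are three times as likely
weights <- c(1, 1, 1, 3, 3, 3, 3, 1)

# 60 draws with replacement fill a 6 x 10 matrix
sample.data <- sample(values, size = 60, replace = TRUE, prob = weights)
sample.mat <- matrix(sample.data, nrow = 6, ncol = 10)
print(sample.mat)
```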
To see how the values in the above sample are distributed, you can combine the factor function with the summary function. For example, the code below shows how many times each number occurred in the sample:
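A sketch of the count, regenerating a sample with hypothetical weights and a fixed seed so that the block runs on its own:

```r
set.seed(42)  # assumption: fixed seed for a repeatable run
# Hypothetical weights: 6, 7, 8 and 9 are three times as likely
sample.data <- sample(3:10, size = 60, replace = TRUE,
                      prob = c(1, 1, 1, 3, 3, 3, 3, 1))
sample.mat <- matrix(sample.data, nrow = 6, ncol = 10)

# summary on a factor returns the number of occurrences of each level
counts <- summary(factor(sample.mat))
print(counts)
```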
Sometimes it makes sense to use letters or month names as row or column names. You can use built-in constants such as letters or month.name for this.
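A minimal sketch using both constants; the 12×3 matrix of 1 to 36 is made up for illustration:

```r
# month.name holds the 12 English month names; letters holds "a" to "z"
monthly.mat <- matrix(1:36, nrow = 12, ncol = 3,
                      dimnames = list(month.name, letters[1:3]))
print(monthly.mat)
print(monthly.mat["March", "b"])
```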
While I have tried to cover as many examples as feasible, the matrix is an important topic, and it helps to know where to find further help when you need it.
The following resources will help you understand more about matrices and their usage:
- Manuals at https://cran.r-project.org/