R_note/Rnote -- An In-Depth Exploration of the Statistical Software R

Section: Fast Loop

This section shows how to write a fast loop in R and how much faster it can be. Five examples demonstrate different ways to write a loop for the same task, "sum a large matrix several times":
  1. "Sum by for 1" -- use nested "for()" loops that visit the matrix row by row (the inner loop runs over the columns).
  2. "Sum by for 2" -- use nested "for()" loops that visit the matrix column by column (the inner loop runs over the rows).
  3. "Sum by apply" -- use the "apply()" function to sum each row of the matrix.
  4. "Sum by rowSums" -- use the internal "rowSums()" function to sum each row of the matrix.
  5. "Sum by dyn" -- use dynamic loading to call an external compiled Fortran subroutine that sums the matrix.
Finally, the computing times of these methods are listed. Parallel versions of these methods are demonstrated and compared in the section "LAM/MPI/Rmpi".


  • Sum by for 1
    First, create an R script file "loop_for_1.r" containing the following:

        
    # File name: loop_for_1.r
    
    my.loop <- 20
    m.dim <- list(nrow = 200000, ncol = 10)
    m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
    ret <- 0
    
    start <- Sys.time()
    for(k in 1 : my.loop){
      for (i in 1 : m.dim$nrow){
        for (j in 1 : m.dim$ncol){
          ret <- ret + m[i, j]
        }
      }
    }
    Sys.time() - start
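
The difference of two "Sys.time()" calls is a "difftime" object whose unit (secs, mins, ...) is chosen automatically. As a hedged alternative, the same row-by-row loop can be wrapped in "system.time()", whose "elapsed" entry is always in seconds. The sketch below uses a much smaller matrix, an assumption made only so that it finishes quickly; the names "m.small" and "timing" are illustrative and not part of the original files.

```r
# Sketch: time the row-by-row loop with system.time().
# m.small is a reduced matrix (assumption) so the example runs quickly.
m.small <- matrix(1, nrow = 2000, ncol = 10)
ret <- 0

timing <- system.time({
  for (i in 1 : nrow(m.small)) {
    for (j in 1 : ncol(m.small)) {
      ret <- ret + m.small[i, j]
    }
  }
})

ret                 # total of all elements
timing["elapsed"]   # wall-clock time in seconds
```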
    

  • Sum by for 2
    Next, create an R script file "loop_for_2.r" containing the following:

        
    # File name: loop_for_2.r
    
    my.loop <- 20
    m.dim <- list(nrow = 200000, ncol = 10)
    m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
    ret <- 0
    
    start <- Sys.time()
    for(k in 1 : my.loop){
      for (j in 1 : m.dim$ncol){
        for (i in 1 : m.dim$nrow){
          ret <- ret + m[i, j]
        }
      }
    }
    Sys.time() - start
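
The only difference between the two "for()" versions is the order in which the elements are visited. R stores a matrix column by column, so the second version walks through memory in storage order. A minimal sketch of that layout, using a small illustrative matrix "m.small" that is not part of the original files:

```r
# Sketch: R stores a matrix column by column (column-major order).
m.small <- matrix(1:6, nrow = 2, ncol = 3)

# Flattening the matrix lists column 1 first, then column 2, and so on.
flat <- as.vector(m.small)
flat

# Element [i, j] sits at linear position i + (j - 1) * nrow.
i <- 2
j <- 3
pos <- i + (j - 1) * nrow(m.small)
same <- m.small[i, j] == m.small[pos]
same
```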
    

  • Sum by apply
    Then, create an R script file "loop_apply.r" containing the following:

        
    # File name: loop_apply.r
    
    my.loop <- 20
    m.dim <- list(nrow = 200000, ncol = 10)
    m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
    ret <- 0
    
    start <- Sys.time()
    for(k in 1 : my.loop){
      ret <- ret + sum(apply(m, 1, sum))
    }
    Sys.time() - start
    

  • Sum by rowSums
    Then, create an R script file "loop_rowSums.r" containing the following:

        
    # File name: loop_rowSums.r
    
    my.loop <- 20
    m.dim <- list(nrow = 200000, ncol = 10)
    m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
    ret <- 0
    
    start <- Sys.time()
    for(k in 1 : my.loop){
      ret <- ret + sum(rowSums(m))
    }
    Sys.time() - start
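
As a quick sanity check, "apply()" and "rowSums()" should produce the same total, and so should the related built-ins "colSums()" and "sum()". The sketch below verifies this on a small illustrative matrix; the variable names are assumptions introduced for the example.

```r
# Sketch: four ways to total a small matrix; all should agree.
m.small <- matrix(1:6, nrow = 2, ncol = 3)

by.apply   <- sum(apply(m.small, 1, sum))  # row sums via apply()
by.rowSums <- sum(rowSums(m.small))        # row sums via rowSums()
by.colSums <- sum(colSums(m.small))        # column sums via colSums()
by.sum     <- sum(m.small)                 # direct total

totals <- c(by.apply, by.rowSums, by.colSums, by.sum)
totals
```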
    

  • Sum by dyn
    Create a Fortran source file "loop_dyn.f" containing the following:

        
    c File name: loop_dyn.f
    c Compile with g77 for dynamic loading:
    c SHELL> g77 -c loop_dyn.f ; g77 -shared -o loop_dyn.so loop_dyn.o
    
          subroutine dynsum(nrow, ncol, m, ret)
            integer i, j, nrow, ncol
            real*8 m(nrow, ncol), ret
    
            ret = 0
            do j = 1, ncol
              do i = 1, nrow
                ret = ret + m(i, j) 
              end do
            end do
    
            return
          end
    
    c The output is a shared library "loop_dyn.so" that can be called by R.
    

    Then, create an R script file "loop_dyn.r" containing the following:

        
    # File name: loop_dyn.r
    
    dyn.load("loop_dyn.so")
    # On Windows, load the DLL instead, for example:
    # dyn.load("C:/Windows/Desktop/loop_dyn.dll")
    
    my.loop <- 20
    m.dim <- list(nrow = 200000, ncol = 10)
    m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
    ret <- 0
    
    dynsum.f <- function(m) {
      # Pass a single 0 as "ret"; the Fortran routine writes the total into it.
      ret <- .Fortran("dynsum", nrow = nrow(m), ncol = ncol(m),
               m = as.double(m), ret = as.double(0))
      ret$ret
    }
    
    start <- Sys.time()
    for(k in 1 : my.loop){
      ret <- ret + dynsum.f(m)
    }
    Sys.time() - start
    
    dyn.unload("loop_dyn.so")
    # On Windows, unload the DLL instead, for example:
    # dyn.unload("C:/Windows/Desktop/loop_dyn.dll")
    

    To test on Windows, download the example "loop_dyn.dll" to "C:\Windows\Desktop\".
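
The library file name differs between platforms (".so" versus ".dll"). One possible way to avoid editing the script by hand is to build the name from ".Platform$dynlib.ext". This is only a sketch: it assumes the compiled library sits in the current working directory, and "lib.file" is an illustrative name.

```r
# Sketch: build the library file name for the current platform.
# .Platform$dynlib.ext is ".so" on Unix-alikes and ".dll" on Windows.
lib.file <- paste("loop_dyn", .Platform$dynlib.ext, sep = "")

# Load the library only if it exists in the working directory (assumption).
if (file.exists(lib.file)) {
  dyn.load(lib.file)
  # ... call .Fortran("dynsum", ...) here ...
  dyn.unload(lib.file)
} else {
  message("Shared library not found: ", lib.file)
}
```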

  • Computing time
    On a Pentium III 1.4 GHz PC, the measured computing times were as follows:

    Sum by (loop)   for 1   for 2   apply   rowSums   dyn
    Time (secs)       331     307     117         2    19
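
To make the comparison easier to read, the timings can be turned into speedup factors relative to the slowest method. The sketch below only does arithmetic on the numbers reported above; the names "secs" and "speedup" are illustrative. With these figures, "rowSums" comes out roughly 165 times faster than "for 1", and "dyn" roughly 17 times faster.

```r
# Timings (seconds) copied from the reported results.
secs <- c("for 1" = 331, "for 2" = 307, "apply" = 117,
          "rowSums" = 2, "dyn" = 19)

# Speedup of each method relative to the slowest one ("for 1").
speedup <- round(secs["for 1"] / secs, 1)
speedup
```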

  • Conclusion
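    Prefer R's built-in internal functions (e.g. "rowSums()") whenever one fits the task.
    Otherwise, call an external compiled function (Fortran or C).
    Use "apply" in place of an explicit "for" loop.
    See also "apply", "lapply", "tapply", "sapply".
    Traverse data column by column in R and Fortran, which store matrices in column-major order.
    Traverse data row by row in C, which stores arrays in row-major order.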
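
For readers unfamiliar with the "apply" family mentioned in the conclusion, here is a minimal sketch of each function on small illustrative data. All variable names are assumptions introduced for the example.

```r
m.small <- matrix(1:6, nrow = 2, ncol = 3)

# apply(): one sum per row (MARGIN = 1) or per column (MARGIN = 2).
row.tot <- apply(m.small, 1, sum)
col.tot <- apply(m.small, 2, sum)

# sapply(): loop over column indices and simplify the result to a vector.
col.tot2 <- sapply(1 : ncol(m.small), function(j) sum(m.small[, j]))

# lapply(): same idea over a list; the result stays a list.
list.tot <- lapply(list(a = 1:3, b = 4:6), sum)

# tapply(): sum values within groups defined by a grouping vector.
grp.tot <- tapply(c(1, 2, 3, 4), c("x", "y", "x", "y"), sum)

row.tot
col.tot
col.tot2
list.tot
grp.tot
```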




Created: Oct 06 2003
Last Revised: Dec 23 2009, 10:30 (CST Ames, IA, USA)
Author: Wei-Chen Chen
E-Mail: snoweye@iastate.edu
© Copyright by Wei-Chen Chen