# R Interview Questions and Answers

# Q.1. Explain data import in R.

A. R offers several ways to import data. To use the R Commander GUI, load it by typing library(Rcmdr) at the console. Data can then be imported in three ways:

Select the data set in the dialog box, or enter the name of the data set as required.

Enter the data directly in the R Commander editor via Data -> New Data Set. This works well only when the data set is not too large.

Import the data from a URL, a plain-text (ASCII) file, another statistical package, or the clipboard.


# Q.2. Explain how to communicate the outputs of data analysis using R language.

A. Combine the data, code, and analysis results in a single document using knitr, which supports reproducible research. This lets others verify the findings, build on them, and discuss them. Reproducible research also makes it easy to rerun an analysis with new data values or to apply it to a different problem.

# Q.3. What is R?

A. R is a programming language used for developing statistical software and for data analysis. It is also increasingly deployed for machine learning applications.

# Q.4. What are the disadvantages of R programming?

A. The disadvantages are:

Lack of a standard GUI.

Not well suited to big data, since R holds objects in memory by default.

Does not provide a spreadsheet view of data.

# Q.5. What is the use of the with() and by() functions in R?

A. The with() function evaluates an expression using the columns of a dataset.

Syntax: with(data, expression)

The by() function applies a function to each level of a factor (or each combination of levels of several factors).

Syntax: by(data, factorlist, function)
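A minimal sketch of both calls, assuming the built-in mtcars data set as illustrative data (it is not part of the original answer):

```r
# with(): evaluate an expression using the columns of mtcars directly
avg_mpg <- with(mtcars, mean(mpg))
print(avg_mpg)

# by(): split mtcars by the number of cylinders and apply a function
# to each subset (here, the mean miles-per-gallon of that subset)
mpg_by_cyl <- by(mtcars, mtcars$cyl, function(d) mean(d$mpg))
print(mpg_by_cyl)
```

with() saves repeating the data frame name, while by() returns one result per factor level.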

# Q.6. What is the use of the subset() and sample() functions in R?

A. subset() selects variables (columns) and observations (rows) that meet a condition. sample() draws a random sample of size n from a vector, and it can be used to pick random rows from a dataset.
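The following sketch shows one way the two functions might be used together. The mtcars data set, the weight threshold, and the sample size are assumptions chosen only for illustration:

```r
set.seed(1)  # make the random sample reproducible

# subset(): keep rows where wt > 5 and only the mpg and wt columns
heavy <- subset(mtcars, wt > 5, select = c(mpg, wt))
print(heavy)

# sample(): draw 5 distinct row indices, then use them to pick rows
rows <- sample(nrow(mtcars), size = 5)
sampled <- mtcars[rows, ]
print(sampled)
```

By default sample() draws without replacement; pass replace = TRUE to allow repeated rows.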

# Q.7. What are the advantages of R?

A. The advantages are:

It is well suited to managing and manipulating data.

No license restrictions.

Free and open-source software.

Strong graphical capabilities.

Runs on many operating systems and hardware platforms, on both 32-bit and 64-bit processors.

# Q.8. What is the function used for adding datasets in R?

A. The rbind() function appends the rows of one dataset to another. The two datasets must have the same columns.

Syntax: rbind(x1, x2, ...) where x1, x2 are vectors, matrices, or data frames.
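As a small illustration, the sketch below stacks two data frames. The data frames a and b and their columns are made up for the example:

```r
# Two data frames with the same column names and types
a <- data.frame(id = 1:2, score = c(90, 85))
b <- data.frame(id = 3:4, score = c(70, 95))

# rbind() appends the rows of b below the rows of a;
# for data frames, columns are matched by name
combined <- rbind(a, b)
print(combined)
```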

# Q.9. How many data structures does R have?

A. R has 5 basic data structures. Vectors, matrices, and arrays are homogeneous (every element has the same type), while lists and data frames are heterogeneous (elements or columns can have different types).
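A short sketch that creates one object of each kind; all variable names and values here are illustrative:

```r
v  <- c(1.5, 2.5, 3.5)                    # vector: one type, one dimension
m  <- matrix(1:6, nrow = 2)               # matrix: one type, two dimensions
a  <- array(1:24, dim = c(2, 3, 4))       # array: one type, n dimensions
l  <- list(name = "Ada", scores = c(90, 85), passed = TRUE)  # list: mixed types
df <- data.frame(id = 1:3, label = c("a", "b", "c"))         # data frame: mixed column types

str(l)   # each list element keeps its own type
str(df)  # each column keeps its own type
```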

# Q.10. How many sorting algorithms are available?

A. Five commonly cited sorting algorithms are:

Bubble Sort

Selection Sort

Merge Sort

Quick Sort

Bucket Sort

# Q.11. How do you create a new variable in R?

A. A new variable is created with the assignment operator '<-'.

E.g. mydata$sum <- mydata$x1 + mydata$x2

# Q.12. What are R packages?

A. Packages are collections of data, R functions, and compiled code in a well-defined format. The directory where packages are stored is called the library. One of the strengths of R is that it can be extended with user-written functions, which packages make easy to share.

# Q.13. Which function is used for merging data frames horizontally in R?

A. The merge() function joins two data frames horizontally on one or more common key columns.

E.g. merged <- merge(dataframe1, dataframe2, by = "ID")
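A runnable sketch of the same idea. The customers and orders data frames are hypothetical and exist only to show how rows are matched on ID:

```r
customers <- data.frame(ID = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
orders    <- data.frame(ID = c(2, 3, 4), amount = c(20, 35, 50))

# Default merge keeps only IDs present in both data frames (inner join)
inner <- merge(customers, orders, by = "ID")
print(inner)

# all.x = TRUE keeps every row of the first data frame (left join);
# customers without an order get NA in the amount column
left <- merge(customers, orders, by = "ID", all.x = TRUE)
print(left)
```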

# Q.14. What is fitdistr() function?

A. fitdistr() performs maximum-likelihood fitting of univariate distributions. It is defined in the MASS package.
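The sketch below fits a normal distribution to simulated data. The sample size, mean, and standard deviation are arbitrary values picked for the illustration:

```r
library(MASS)  # fitdistr() lives in MASS, which ships with standard R installations

set.seed(123)
x <- rnorm(500, mean = 10, sd = 2)   # simulated data with known parameters

# Maximum-likelihood fit of a normal distribution to x
fit <- fitdistr(x, densfun = "normal")
print(fit$estimate)   # estimated mean and sd
print(fit$sd)         # standard errors of the estimates
```

The estimates should land close to the values used to simulate the data.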

# Q.15. What is FactoMineR?

A. FactoMineR is a package for multivariate exploratory data analysis that handles both quantitative and qualitative variables. It also supports supplementary variables and observations.

# Q.16. Why does a widget not show up in the sidebar?

A. When using a widget, first check that your theme supports widgets; if it does, the sidebar should appear. If you still do not see the sidebar, the theme may be missing its “function.php” file or an equivalent. The same symptom can occur if you forgot to save the widget changes or to refresh a stale copy of the page.

# Q.17. How can you load a .csv file in R?

A. To load a .csv file, call the read.csv() function with the path to the file.

house <- read.csv("C:/Users/John/Desktop/house.csv")
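To try this without a file on disk, the sketch below first writes a small CSV to a temporary location and then reads it back. The column names and values are invented for the example:

```r
# Write a tiny CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
write.csv(data.frame(price = c(250000, 310000), rooms = c(3, 4)),
          path, row.names = FALSE)

# read.csv() assumes a header row and comma separators by default
house <- read.csv(path, stringsAsFactors = FALSE)
str(house)

unlink(path)  # remove the temporary file
```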

# Q.18. What are the different components of grammar of graphics?

A. Broadly speaking, the grammar of graphics has these components:

Data layer

Aesthetics layer

Geometry layer

Facet layer

Co-ordinate layer

Themes layer
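One way these layers might map onto ggplot2 code is sketched below. It assumes the ggplot2 package is installed (the block does nothing otherwise) and uses the built-in mtcars data set for illustration:

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +  # data layer + aesthetics layer
    geom_point() +                             # geometry layer
    facet_wrap(~ cyl) +                        # facet layer
    coord_cartesian(xlim = c(1, 6)) +          # coordinate layer
    theme_minimal()                            # themes layer

  print(p)
}
```

Each `+` adds one component, so a plot can be built up or changed one layer at a time.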

# Q.20. How do you install a package in R?

A. A package is installed with the install.packages() function, passing the package name as a string:

install.packages("package_name")
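A common pattern is to install a package only when it is missing. The sketch below wraps that in a helper; ensure_package is a hypothetical name introduced here, and the install step assumes a CRAN mirror is configured:

```r
# Hypothetical helper: install pkg from CRAN only if it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs network access and a configured CRAN mirror
  }
  requireNamespace(pkg, quietly = TRUE)  # TRUE when the package can be loaded
}

# "stats" ships with R, so this call never triggers an install
ensure_package("stats")
```

After installation, library(package_name) loads the package into the current session.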

# Q.21. What is R Markdown? What is the use of it?

A. R Markdown is a reporting tool provided by the rmarkdown package. With it, you can create high-quality reports that combine your R code with its output and narrative text. The output format can be:

HTML

PDF

Word

# Q.22. Name some packages in R, which can be used for data imputation?

A. These are some R packages that can be used for data imputation:

mice

Amelia

missForest

Hmisc

mi

imputeR

# Q.23. Name some functions available in “dplyr” package.

A. Functions in dplyr package:

filter

select

mutate

arrange

count
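The sketch below chains these five functions on the built-in mtcars data set. It assumes dplyr is installed (the block is skipped otherwise), and the derived column wt_kg is an illustrative name:

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  result <- mtcars %>%
    filter(cyl == 4) %>%               # keep rows that match a condition
    select(mpg, wt, gear) %>%          # keep only some columns
    mutate(wt_kg = wt * 453.592) %>%   # add a column (wt is in 1000 lb)
    arrange(desc(mpg))                 # sort rows, highest mpg first

  print(head(result, 3))
  print(count(mtcars, cyl))            # number of rows per cyl value
}
```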

# Q.24. Tell me something about Shiny.

A. Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage, embed them in R Markdown documents, or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.

# Q.25. What is the advantage of using the apply family of functions in R?

A. The apply() function lets us apply a function across the rows, columns, or individual entries of a matrix or data frame. The usage in R is as follows:

apply(X, MARGIN, FUN, ...)

where:

X is an array or matrix;

MARGIN is a variable that determines whether the function is applied over rows (MARGIN=1), columns (MARGIN=2), or both (MARGIN=c(1,2));

FUN is the function to be applied.

If MARGIN=1, the function receives each row of X as a vector argument, and apply() returns a vector of the results. Similarly, if MARGIN=2 the function acts on the columns of X. When MARGIN=c(1,2), the function is applied to every entry of X.

Advantage:

With apply() we can transform every row, column, or entry with a single command and no explicit loop. A data frame passed to apply() is first converted to a matrix.
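The three MARGIN settings can be sketched on a small matrix; the matrix m and the functions applied to it are chosen only for illustration:

```r
m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

row_sums <- apply(m, 1, sum)                        # one result per row
col_sums <- apply(m, 2, sum)                        # one result per column
squared  <- apply(m, c(1, 2), function(x) x^2)      # one result per entry

print(row_sums)
print(col_sums)
print(squared)
```

For simple cases like these, vectorised alternatives such as rowSums(), colSums(), or m^2 give the same results.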

# Q.26. What packages are used for data mining in R?

A. Some packages used for data mining in R:

data.table - provides fast reading of large files.

rpart and caret - for machine learning models.

arules - for association rule learning.

ggplot2 - provides various data visualization plots.

tm - for text mining.

forecast - provides functions for time series analysis.

# Q.27. What do you know about the rattle package in R?

A. Rattle is a popular GUI for data mining using R. It presents statistical and visual summaries of data, transforms data so that it can be readily modelled, builds both unsupervised and supervised machine learning models from the data, presents the performance of models graphically, and scores new datasets for deployment into production. A key feature is that all of your interactions through the graphical user interface are captured as an R script that can be readily executed in R independently of the Rattle interface.

# Q.28. How would you fit a linear model over a scatter-plot?

A. We can do that with the ggplot2 package. First make a scatter plot with geom_point(), then add the fitted line by placing a geom_smooth() layer with method = "lm" on top of it.
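A sketch of that approach on simulated data. The data frame df, its coefficients, and the noise level are assumptions for the example, and the plotting part runs only if ggplot2 is installed:

```r
# Simulated data: y depends linearly on x, plus random noise
set.seed(42)
df <- data.frame(x = 1:50)
df$y <- 3 + 2 * df$x + rnorm(50, sd = 5)

# The same straight-line fit that geom_smooth(method = "lm") draws
fit <- lm(y ~ x, data = df)
print(coef(fit))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x, y = y)) +
    geom_point() +                                            # scatter plot
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)   # fitted line
  print(p)
}
```

Setting se = FALSE hides the confidence band; leave it at the default to show it.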


# Q.29. What are the different import functions in R?

A. Data from different sources and in different formats can be imported into R. Some of the available import functions are:

read.csv() -> for reading .csv files (base R)

read_sas() -> for reading .sas7bdat files (haven package)

read_excel() -> for reading Excel sheets (readxl package)

read_sav() -> for reading SPSS data (haven package)

# Q.30. Name some functions which can be used for debugging in R?

A. These are some functions which can be used for debugging in R:

traceback()

debug()

browser()

trace()

recover()
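Most of these tools are interactive, so the sketch below sticks to calls that also work in a script. The function safe_divide is a hypothetical example, and tryCatch() is used here only to capture the error without stopping the script:

```r
# Hypothetical function with an error path worth debugging
safe_divide <- function(a, b) {
  if (b == 0) stop("division by zero")
  a / b
}

# debug() flags a function so the next call steps through it line by line
# in an interactive session; undebug() removes the flag again
debug(safe_divide)
print(isdebugged(safe_divide))
undebug(safe_divide)

# Capture the error message instead of halting
result <- tryCatch(safe_divide(1, 0), error = function(e) conditionMessage(e))
print(result)

# In an interactive session, calling traceback() right after an uncaught
# error prints the call stack, and placing browser() inside a function
# pauses execution at that point.
```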

# Q.31. What are the objects you use most frequently?

A. This question is meant to gather a sense of your experiences in R. Simply think about some recent work you’ve done in R and explain the data objects you use most often. If you use arrays frequently, explain why and how you’ve used them.

# Q.32. What are some of your favorite functions in R?

A. As a user of R, you should be able to come up with some functions on the spot and describe them. Functions that save time and, as a result, money will always be something an interviewer likes to hear about.

# Q.33. What are 3 sorting algorithms available in R?

A. R's sort() function orders a vector or factor. Three of the algorithms it can use, selected through the method argument, are described below.

Radix: Usually the most performant algorithm, this is a non-comparative sorting algorithm that avoids overhead. It is stable, and it is the default algorithm for integer vectors and factors.

Quick Sort: This method “uses Singleton (1969)’s implementation of Hoare’s Quicksort method and is only available when x is numeric (double or integer) and partial is NULL,” according to R Documentation. It is not considered a stable sort.

Shell: This method “uses Shellsort (an O(n^(4/3)) variant from Sedgewick (1986)),” according to R Documentation.
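The algorithm can be chosen through the method argument, as in this short sketch (the input vector is arbitrary):

```r
x <- c(5L, 3L, 9L, 1L, 3L)

by_radix <- sort(x, method = "radix")
by_quick <- sort(x, method = "quick")   # numeric input only
by_shell <- sort(x, method = "shell")

# All three methods produce the same ordering for this vector
print(by_radix)
print(by_quick)
print(by_shell)
```

The methods differ in speed and stability, not in the sorted values they return.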

# Q.34. Why is R useful for data science?

A. R turns hours of graphically intensive work into minutes and a few keystrokes. In practice, you are unlikely to encounter R outside data science or an adjacent field. It is great for linear modeling, nonlinear modeling, time-series analysis, plotting, clustering, and much more. Simply put, R is designed for data manipulation and visualization, so it is a natural fit for data science.

# Q.35. What is reshaping of data in R?

A. In R the data objects can be converted from one form to another. For example we can create a data frame by merging many lists. This involves a series of R commands to bring the data into the new format. This is called data reshaping.
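As a sketch of the idea, the code below builds a data frame from several lists and then reshapes it from wide to long form with base R's reshape(). The lists and column names are invented for the example:

```r
# Three lists holding one value per person
ids     <- list(1, 2, 3)
heights <- list(170, 165, 180)
weights <- list(65, 58, 82)

# Merge the lists into a "wide" data frame: one row per person
wide <- data.frame(id     = unlist(ids),
                   height = unlist(heights),
                   weight = unlist(weights))
print(wide)

# Reshape to "long" form: one row per person and measurement
long <- reshape(wide, direction = "long",
                varying = c("height", "weight"),  # columns to stack
                v.names = "value",                # column holding the stacked values
                timevar = "measure",              # column naming the source column
                times   = c("height", "weight"),
                idvar   = "id")
print(long)
```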
