### Introduction

Excel is probably the most popular environment for data and statistical analysis but its functionality is quite limited when compared to R or other statistical software. In this post I would like to present a new way of accessing R functions in Excel which is easier and more productive than any other approach so far. I will start with showing a simple example of fitting a multiple linear model to a sample R data and then explain how it works.

### Linear model with R lm() function

If you have Excel 2007 or later and R, e.g. Microsoft R Open, you can try it yourself. Simply download this spreadsheet: RNet.xlsm, right click to Unblock it and follow steps below.

In this case scenario I will try to fit a linear model to R **mtcars** dataset. This dataset is of course available locally if you have R installed on your machine but I will access it from a remote location here.

My **RNet.xlsm** spreadsheet has a few User Defined Functions which start with **RNet**. One of them is **RNet.readTable** which maps to a similar function in R. I will use it to download the data into Excel, passing as argument the remote location of my dataset:

As you can see the UDF does not actually return the data directly. Instead, it creates a unique string identifier for R DataFrame object. So how is it useful? We can use this object handle as an argument in other UDFs. Let’s start with showing the data:

I have passed the data frame object handle to **RNet.asXlTable()** UDF. This UDF can return the whole data frame without using Excel array formula which is very convenient in the usual case when we do not know the size of the data upfront.

You might be wondering now what will happen if we point to a different dataset? Let’s try that:

I have changed the path from **mtcars** to **airquality** which is just another R sample dataset. As you can see a new object handle has been created and Excel has automatically cleared the previous data and updated it to show the new data frame. So in practice you can think of **RNet.asXlTable** as being an array function with automatic resizing.

Now, how do we add a linear model? In R we would use **lm()** function. In the spreadsheet there is a **RNet.lm()** UDF which maps to the R counterpart. It needs a string formula which defines the model and a data frame. Let’s revert to **mtcars** dataset and try it:

We get a new object handle, this time of type **SymbolicExpression**. This is how R objects are usually represented in .NET and returned to Excel. So how can we use this handle? First, we can peek inside:

I have used the **RNet.asXlTable()** UDF again. You can pass any **SymbolicExpression** object to it and it will try to show it in a table format. In this case our model is an R list and we can view all list members and their type. We can also drill down and show a specific list member, e.g. “coefficients”:

And so we get a whole R based model with just a few formulas. We can now use Excel recalculation engine to fit different models by just changing the formula in B4:

As you can see the model object handle has been recreated and a new set of coefficients has been automatically shown.

R has of course many more functions. We could expose all or some of them by creating a UDF for each but there is also another way of doing that. For any R function we can create a function object handle and then apply it to input arguments with special UDFs:

**RNet.eval()** is a special UDF which can parse and evaluate R function, including R function inline definition, and return it as an object handle to Excel. In this case I get an object which represents R **lm()** function. We can now apply it and recreate our linear model:

I have simply passed my formula and data frame as named arguments to **RNet.invokeNamed()** UDF. The first argument needs to be a function handle. Alternatively, I can pass the arguments by position:

I have used **RNet.invoke()** UDF which expects the arguments to be passed by position. The function handle is still passed as the first argument. We can now verify that the model has actually been created:

The UDFs in my RNet spreadsheet are embedded in the spreadsheet and will not be available once I close it. If you want to use them in other spreadsheets then you can simply activate it as Excel Customization:

### How does it work?

The spreadsheet is powered by **NeXL Connector** and **F# R Type Provider**. NeXL is available as a Nuget package and allows me to embed .NET functions as UDFs in Excel spreadsheets.It is free to use, including commercial use (CC 4.0 License), but requires permission to redistribute. With F# R Type Provider I can easily access R functionality from .NET.

All UDFs in RNet spreadsheet are defined in an F# project. The project is open source and you can find it on Github. You can clone and build it on your local machine.

I have used VS Code and F# Ionide plugin to create the project. Any R distribution should be fine, I have used Microsoft R Open.

The UDFs are defined in a module **RNet**. The module is marked with [<XlQualified(true)>] attribute so that their name in Excel always starts with **RNet** for easy discovery.

Most UDFs are very simple and call R functions directly, e.g.:

let lm(formula : string, dataFrame : DataFrame) = R.lm(formula = formula, data = dataFrame)

F# R Type Provider exposes R function **lm()** and I simply pass input arguments to it. It returns a **SymbolicExpression** object which is a .NET representation of the R model object. Note that the 2nd argument is a DataFrame so in Excel we first have to create an object handle of that type. This is of course done in **readTable()** function:

let readTable(file : string, separator : string option, headers : bool option, rowNamesCol : int option) = let separator = defaultArg separator "," let headers = defaultArg headers true let rowNamesCol = defaultArg rowNamesCol 1 let prms = namedParams ["file", box file; "sep", box separator; "header", box headers; "row.names", box rowNamesCol] let frame = R.read_table(prms) match frame with | DataFrame(df) -> df | _ -> raise (new InvalidOperationException("Not a data frame"))

Again, this function simply calls R **read_table()**. Note there are some optional arguments. If you do not specify those in Excel then None is automatically passed to the function. The R function is called with named parameters this time so I create them with a convenience function **namedParams**. It returns a DataFrame as expected.

There are a few similar functions which map directly to R, including the higher order functions for applying a function handle to arguments: **invoke()** and **invokeNamed()**.

The most interesting is the function **asXlTable()**. As you can see it expects a SymbolicExpression and optionally a list member. Using F# active patterns we can then process the object and convert it to **XlTable** object. **XlTable** represents a table of data which can be shown in an Excel range and is defined in NeXL. It is a special type which signals to NeXL runtime that the result is a table and should be sent to a range below and to the right of the calling cell. A comment is also automatically added to the calling cell, e.g. **XlTable 2 x 3**.

Finally, the project is built with Fake. The script is in **build.fsx**. The “Build” target will build the project so that we get a .NET assembly with all UDFs inside. Then “Embed” target will package NeXL Runtime, UDF assembly and RProvider assemblies into a macro enabled Excel spreadsheet. This spreadsheet is self contained and can be used on any machine with Excel 2007 or later, .NET and R.