Excel is probably the most popular environment for data and statistical analysis, but its functionality is quite limited compared to R or other statistical software. In this post I would like to present a new way of accessing R functions from Excel which I find easier and more productive than the approaches I have seen so far. I will start with a simple example of fitting a multiple linear regression model to a sample R dataset and then explain how it works.
Linear model with R lm() function
If you have Excel 2007 or later and R, e.g. Microsoft R Open, you can try it yourself. Simply download this spreadsheet: RNet.xlsm, right click to Unblock it and follow the steps below.
In this scenario I will fit a linear model to R's mtcars dataset. This dataset is of course available locally if you have R installed on your machine, but I will access it from a remote location here.
My RNet.xlsm spreadsheet has a few User Defined Functions which start with RNet. One of them is RNet.readTable which maps to a similar function in R. I will use it to download the data into Excel, passing as argument the remote location of my dataset:
As you can see, the UDF does not actually return the data directly. Instead, it creates a unique string identifier for an R DataFrame object. So how is it useful? We can use this object handle as an argument in other UDFs. Let’s start with showing the data:
I have passed the data frame object handle to the RNet.asXlTable() UDF. This UDF can return the whole data frame without using an Excel array formula, which is very convenient in the usual case when we do not know the size of the data upfront.
You might be wondering now what will happen if we point to a different dataset? Let’s try that:
I have changed the path from mtcars to airquality which is just another R sample dataset. As you can see a new object handle has been created and Excel has automatically cleared the previous data and updated it to show the new data frame. So in practice you can think of RNet.asXlTable as being an array function with automatic resizing.
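The object handles used in the steps above can be pictured as keys into a registry of live objects. The following is only a minimal sketch of that idea, not the NeXL implementation: the `HandleStore` module, its function names and the `"DataFrame:1"` handle format are all hypothetical.

```fsharp
open System.Collections.Generic

// Hypothetical handle registry, for illustration only (not NeXL's code).
module HandleStore =
    // Maps string handles to the objects they stand for.
    let private objects = Dictionary<string, obj>()

    /// Stores a value and returns a string handle such as "DataFrame:1".
    let register (typeName : string) (value : obj) : string =
        let handle = sprintf "%s:%d" typeName (objects.Count + 1)
        objects.[handle] <- value
        handle

    /// Looks a handle up again; returns None if the handle is unknown.
    let tryResolve (handle : string) : obj option =
        match objects.TryGetValue handle with
        | true, value -> Some value
        | _ -> None

// A cell would show only the handle; other functions resolve it on demand.
let handle = HandleStore.register "DataFrame" (box [| 21.0; 22.8 |])
printfn "%s" handle
```

Under this picture, changing the input of a formula simply registers a new object and returns a new handle, which is why dependent cells recalculate.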
Now, how do we add a linear model? In R we would use the lm() function. In the spreadsheet there is an RNet.lm() UDF which maps to its R counterpart. It needs a string formula which defines the model and a data frame. Let’s revert to the mtcars dataset and try it:
We get a new object handle, this time of type SymbolicExpression. This is how R objects are usually represented in .NET and returned to Excel. So how can we use this handle? First, we can peek inside:
I have used the RNet.asXlTable() UDF again. You can pass any SymbolicExpression object to it and it will try to show it in a table format. In this case our model is an R list and we can view all list members and their type. We can also drill down and show a specific list member, e.g. “coefficients”:
And so we get a whole R-based model with just a few formulas. We can now use Excel's recalculation engine to fit different models by just changing the formula in B4:
As you can see the model object handle has been recreated and a new set of coefficients has been automatically shown.
R has of course many more functions. We could expose all or some of them by creating a UDF for each but there is also another way of doing that. For any R function we can create a function object handle and then apply it to input arguments with special UDFs:
RNet.eval() is a special UDF which can parse and evaluate an R function, including an inline R function definition, and return it to Excel as an object handle. In this case I get an object which represents the R lm() function. We can now apply it and recreate our linear model:
I have simply passed my formula and data frame as named arguments to RNet.invokeNamed() UDF. The first argument needs to be a function handle. Alternatively, I can pass the arguments by position:
I have used RNet.invoke() UDF which expects the arguments to be passed by position. The function handle is still passed as the first argument. We can now verify that the model has actually been created:
The UDFs in my RNet spreadsheet are embedded in the spreadsheet and will not be available once I close it. If you want to use them in other spreadsheets, you can simply activate it as an Excel Customization:
How does it work?
The spreadsheet is powered by NeXL Connector and F# R Type Provider. NeXL is available as a NuGet package and allows me to embed .NET functions as UDFs in Excel spreadsheets. It is free to use, including commercial use (CC 4.0 License), but requires permission to redistribute. With F# R Type Provider I can easily access R functionality from .NET.
All UDFs in the RNet spreadsheet are defined in an F# project. The project is open source and you can find it on GitHub. You can clone and build it on your local machine.
The UDFs are defined in a module RNet. The module is marked with the [<XlQualified(true)>] attribute so that their names in Excel always start with RNet for easy discovery.
Most UDFs are very simple and call R functions directly, e.g.:
let lm(formula : string, dataFrame : DataFrame) = R.lm(formula = formula, data = dataFrame)
F# R Type Provider exposes R function lm() and I simply pass input arguments to it. It returns a SymbolicExpression object which is a .NET representation of the R model object. Note that the 2nd argument is a DataFrame so in Excel we first have to create an object handle of that type. This is of course done in readTable() function:
let readTable(file : string, separator : string option, headers : bool option, rowNamesCol : int option) =
    let separator = defaultArg separator ","
    let headers = defaultArg headers true
    let rowNamesCol = defaultArg rowNamesCol 1
    let prms = namedParams ["file", box file; "sep", box separator; "header", box headers; "row.names", box rowNamesCol]
    let frame = R.read_table(prms)
    match frame with
    | DataFrame(df) -> df
    | _ -> raise (new InvalidOperationException("Not a data frame"))
Again, this function simply calls R's read.table(), which the type provider exposes as R.read_table(). Note that there are some optional arguments. If you do not specify those in Excel, then None is automatically passed to the function and the default value is used instead. The R function is called with named parameters this time, so I create them with the convenience function namedParams. The function returns a DataFrame as expected, and raises an exception if R returns anything else.
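To make the handling of optional arguments concrete, here is a small standalone sketch of the same `defaultArg` pattern. The function `describeRead` and the file name are hypothetical and are not part of the RNet project; the sketch does not call R at all and only shows how an omitted (`None`) argument falls back to its default.

```fsharp
// Hypothetical helper, for illustration only: mirrors the optional
// arguments of readTable without touching R or NeXL.
let describeRead (file : string, separator : string option, headers : bool option) =
    // defaultArg returns the wrapped value for Some, or the fallback for None.
    let separator = defaultArg separator ","
    let headers = defaultArg headers true
    sprintf "file=%s sep=%s header=%b" file separator headers

// Arguments left empty arrive as None and pick up the defaults.
let withDefaults = describeRead ("mtcars.csv", None, None)
// Explicit values override the defaults.
let overridden = describeRead ("mtcars.csv", Some ";", Some false)

printfn "%s" withDefaults   // file=mtcars.csv sep=, header=true
printfn "%s" overridden     // file=mtcars.csv sep=; header=false
```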
There are a few similar functions which map directly to R, including the higher order functions for applying a function handle to arguments: invoke() and invokeNamed().
The most interesting is the function asXlTable(). As you can see it expects a SymbolicExpression and optionally a list member. Using F# active patterns we can then process the object and convert it to XlTable object. XlTable represents a table of data which can be shown in an Excel range and is defined in NeXL. It is a special type which signals to NeXL runtime that the result is a table and should be sent to a range below and to the right of the calling cell. A comment is also automatically added to the calling cell, e.g. XlTable 2 x 3.
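The active pattern approach can be sketched without R or NeXL. In the example below, `RValue` is a hypothetical stand-in for SymbolicExpression, and `toRows` is a hypothetical stand-in for the conversion to XlTable; the real project matches on actual R objects and its patterns may look quite different.

```fsharp
// Hypothetical, simplified model of an R value (assumption, not the real type).
type RValue =
    | Numbers of float list
    | Named of (string * RValue) list

// Partial active patterns that classify a value by its shape.
let (|Vector|_|) value =
    match value with
    | Numbers xs -> Some xs
    | _ -> None

let (|NamedList|_|) value =
    match value with
    | Named items -> Some items
    | _ -> None

/// Converts a value to rows of cell text: one row per vector element,
/// or one (name, kind) row per member of a named list.
let toRows (value : RValue) : string list list =
    match value with
    | Vector xs -> xs |> List.map (fun x -> [ string x ])
    | NamedList items ->
        items
        |> List.map (fun (name, member') ->
            let kind =
                match member' with
                | Vector _ -> "vector"
                | _ -> "list"
            [ name; kind ])
    | _ -> []

// A toy "model" with two members, similar in spirit to peeking inside a list.
let model = Named [ "coefficients", Numbers [ 1.5; 2.5 ]; "call", Named [] ]
let rows = toRows model
```

The benefit of this style is that each shape of R object gets its own named pattern, so the conversion function reads as a list of cases rather than a chain of type checks.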
Finally, the project is built with FAKE. The script is in build.fsx. The “Build” target builds the project so that we get a .NET assembly with all UDFs inside. The “Embed” target then packages the NeXL Runtime, the UDF assembly and the RProvider assemblies into a macro-enabled Excel spreadsheet. This spreadsheet is self-contained and can be used on any machine with Excel 2007 or later, .NET and R.