Async Task with Progress Report and Cancellation

When building a WinForms application, I ran into a problem: I needed to run a long calculation. Because the process can take very long, I wanted to pop up a new form with a progress bar and a cancel button.

Both progress reporting and cancellation are frequent questions on Stack Overflow, but the answers usually provide only code snippets without showing the whole picture. By putting several pieces together, I created a complete example to demonstrate a simple solution. It uses IProgress&lt;T&gt; for progress reporting and CancellationToken for cancellation.

The example WinForms application has a main form with three buttons that start the Fibonacci calculation with different setups. When you click one of the buttons, the async calculation starts and a new form pops up to show progress or let the user cancel the calculation.

If the third button is clicked, both the progress bar and the cancel button appear on the new window.

The source code can be accessed on GitHub.

Better Organization of RMarkdown Outputs

By default, the Knit button renders RMarkdown outputs to the same folder as the .Rmd file. The RMarkdown Cookbook demonstrates that the behavior of the Knit button can be changed by defining a custom knit function in the YAML frontmatter.

Based on that technique, this post will show you how to improve the organization of the RMarkdown outputs in your project.

Suppose your project is organized with the following folder structure:
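For illustration, a layout along these lines (a hypothetical structure; the folder names match the "R Code" and "R Notebook" folders discussed in this post):

```
project/
├── Include.R
├── R Code/
│   ├── analysis1.Rmd
│   └── subproject/
│       └── analysis2.Rmd
└── R Notebook/
```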

Because R Notebook / RMarkdown outputs are saved in the same folder as the source code files by default, the R code folder becomes cluttered with too many files. Also, having every output under the same name is not good for versioning; sometimes we want the outputs to be dated. So ideally, we want the files to be organized like this:
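For example, a target layout could look like this (hypothetical file names and dates, for illustration):

```
project/
├── Include.R
├── R Code/
│   ├── analysis1.Rmd
│   └── subproject/
│       └── analysis2.Rmd
└── R Notebook/
    ├── analysis1_2020-01-15.html
    └── subproject/
        └── analysis2_2020-01-15.html
```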

To achieve this, we need to do three things to the RMarkdown output:

  • Redirect all the outputs to the “R Notebook” folder.
  • Replicate the subfolders from the “R Code” folder to the “R Notebook” folder.
  • Append date to the end of the filename.

We can get all these done in a few lines:
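A sketch of such a function, assuming source files live under an "R Code" folder and outputs should go to a sibling "R Notebook" folder (the function name custom_knit and the folder names are assumptions):

```r
# Custom knit function: redirect output to "R Notebook", replicate
# subfolders, and append today's date to the file name.
custom_knit <- function(input, ...) {
  # Swap "R Code" for "R Notebook" in the path, keeping any subfolders.
  out_dir <- dirname(sub("R Code", "R Notebook", input, fixed = TRUE))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  # Append the date to the end of the file name.
  out_name <- paste0(
    tools::file_path_sans_ext(basename(input)),
    "_", format(Sys.Date(), "%Y-%m-%d"), ".html"
  )

  rmarkdown::render(
    input,
    output_dir  = out_dir,
    output_file = out_name,
    envir       = globalenv()
  )
}
```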

To use this custom function in knit, just put it in “Include.R” and then call it from the YAML frontmatter:
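For example, if the custom function is named custom_knit (a hypothetical name), the frontmatter could look like this:

```yaml
---
title: "My Analysis"
knit: (function(input, ...) {
    source("Include.R")
    custom_knit(input, ...)
  })
---
```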

Export R data to SAS

The Problem

Importing data from SAS to R is easy: the haven library has the read_sas() function to import *.sas7bdat files into an R data.frame. However, the *.sas7bdat files created by the write_sas() function mostly cannot be read by SAS.

It seems the alternative way of transferring data from R to SAS should be easy: just find a common data format supported by both R and SAS. For example, R can export data to CSV, and SAS can import CSV. Wouldn’t that be easy?

Actually, no. Accurately importing CSV in SAS is never an easy task. SAS has to guess the type of data for each column, and weird data cells (quotes, commas, special characters, etc.) screw up the importing process all the time. How about other data formats?

Excel? It could be even worse than CSV.

How about a SQL server (MySQL, MS SQL, etc.) then? Do you really want to set up a SQL server just for this?

If SAS supported the CSV format with YAML frontmatter, that would certainly help a lot, but it doesn't.

The Solution

So, the only sane solution is to run R code in SAS: let the R code read the R data, and then use the ImportDataSetFromR() function to transfer the R data.frame to a SAS dataset. The SAS code would look like this:
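A minimal sketch of such a SAS script (the data path and dataset name are assumptions for illustration):

```sas
proc iml;
  submit / R;
    df <- readRDS("C:/path/to/mydata.rds")
  endsubmit;
  /* Transfer the R data.frame "df" into the SAS dataset work.sas_data */
  call ImportDataSetFromR("work.sas_data", "df");
quit;
```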

If you use the CSVY format in R, the 3rd line would be something like this:
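For example, assuming the data is stored as a CSVY file (hypothetical path), the line reading the data inside the submit / R block would become:

```r
# Read a CSV file with a YAML header describing the columns.
df <- data.table::fread("C:/path/to/mydata.csv", yaml = TRUE)
```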

This is a fairly simple solution, and unlike haven::write_sas(), it has a 100% success rate.

Yes, this solution requires SAS to be installed on the machine. But I guess most people who need to use SAS data already have SAS installed.

The R Workflow Solution

One thing I don’t like about the above solution is that it requires running a SAS script, which disrupts my workflow.

When I need to export R data to SAS, it is because I am working on an R project, and at the end of the process I need to export my R data products to share with colleagues who only use SAS.

I could open SAS and run the above SAS script, but I don’t like that. Since I am already using R, why not run everything in R? It is certainly doable, because we can run a SAS script from R.

What I need first is a SAS script containing the code from the above solution, named “ImportFromR.sas”. Then I call this script from R using the following code:
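A sketch of the call (the SAS installation path and the script path are assumptions for illustration):

```r
# Run the SAS script from R; -RLANG enables the R integration.
system('"C:\\Program Files\\SASHome\\SASFoundation\\9.4\\sas.exe" -SYSIN "C:\\path\\to\\ImportFromR.sas" -RLANG')
```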

Two notes:

  • “-RLANG” is added to the SAS command line because I don’t set it in the SAS config file.
  • “R_HOME” needs to be assigned in the SAS config file, the SAS autoexec file, or the “ImportFromR.sas” file itself. See Setup SAS to Run R Code inside SAS.

In addition, to make the entire process adaptable, you can write an R function which creates the SAS script “ImportFromR.sas” (so that it contains the real path of the R data) and then calls this SAS script. The following pseudocode demonstrates the general idea:
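A sketch of such a function (the function name, the SAS installation path, and the use of an .rds file are all assumptions for illustration):

```r
# Pseudocode sketch: write "ImportFromR.sas" for a given R data file,
# then run it through SAS.
export_to_sas <- function(rds_path, sas_dataset) {
  # Generate the SAS script with the real path of the R data.
  sas_code <- paste0(
    "proc iml;\n",
    "  submit / R;\n",
    '    df <- readRDS("', rds_path, '")\n',
    "  endsubmit;\n",
    '  call ImportDataSetFromR("', sas_dataset, '", "df");\n',
    "quit;\n"
  )
  writeLines(sas_code, "ImportFromR.sas")

  # Call SAS to run the generated script.
  sas_exe <- "C:\\Program Files\\SASHome\\SASFoundation\\9.4\\sas.exe"
  system(paste0('"', sas_exe, '" -SYSIN "ImportFromR.sas" -RLANG'))
}
```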

Use CSVY Format for Data Storage

I have been using fread() and fwrite() from the data.table package for years. However, I recently noticed that a change in 1.11.0 broke my code. This is not a recent change; I must not have updated my R packages for a while, or I haven’t used the related code for a while, or I just hadn’t noticed the change. The change is (quote):

Numeric data that has been quoted is now detected and read as numeric.

Quoted numbers used to be read as characters because they are quoted. Now, for whatever reason, data.table has decided to read quoted numbers as numbers, even when they are quoted.

The old code still runs; I just don’t get the data I expect. I have data using “0001, 0002, 0003, …” in the id column, which is now read as “1, 2, 3, …”. This change does not generate an error message immediately; the error only surfaces ten steps later down the line, where I need to do character operations on the id column.

At first, I was angry at data.table for making this change. After taking a deep breath, I agreed that the root of the problem is not data.table; rather, it is the lack of meta information in the CSV format. To fundamentally solve this problem, meaning to let R programs always read CSV files as expected, the solution is to embed column definitions inside the CSV files. This solution, which I didn’t know about before, actually exists.

The CSVY format adds YAML frontmatter to CSV files. Besides other descriptive information, the YAML frontmatter includes column definitions like this:
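A CSVY file might look something like this (a hypothetical example; the exact frontmatter fields vary by writer):

```
---
fields:
  - name: id
    type: character
  - name: value
    type: numeric
---
id,value
0001,3.14
0002,2.72
```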

With this information saved inside the CSV file, it ensures the CSV file can be read as expected.

Using the CSVY format with R data.table is straightforward. Both fread() and fwrite() have had the boolean parameter yaml since 1.12.4, so we can use fwrite(..., yaml = TRUE) to save the CSVY file and fread(..., yaml = TRUE) to load it. This feature provides a long-term solution for attaching column definitions to CSV files.
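A round-trip sketch of the id-column problem described above (file name is an assumption):

```r
library(data.table)

dt <- data.table(id = c("0001", "0002", "0003"), value = c(1.5, 2.5, 3.5))

# Write with a YAML header describing the columns.
fwrite(dt, "mydata.csv", yaml = TRUE)

# Read it back; the yaml header tells fread that id is character,
# so the leading zeros should survive.
dt2 <- fread("mydata.csv", yaml = TRUE)
class(dt2$id)
```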

Backup & Sync R Libraries

In the past, every time I set up a new machine, I had to reinstall my R libraries. So, like many others, I exported the list of R libraries I use from the old machine and then installed those libraries on the new machine.

What is more annoying is that, because I use R on multiple machines, from time to time I found that after installing and using a new package on machine A, code would break on machine B, and then I would realize that I needed to install the new package on machine B.

I recently found that these annoyances can be easily solved with a cloud service.

R packages, by default, are saved in your “Documents” folder, under Documents\R\win-library\{R-version}. And some cloud drive apps like OneDrive, Dropbox, and many others, can synchronize your “Documents” folder to the cloud, and sync between multiple machines. 

So all you need to do is let your cloud drive synchronize your “Documents” folder. Taking OneDrive as an example: go to Settings, click “Backup”, then “Manage backup”, and then choose the “Documents” folder.

Unless you have changed the default R library folder, your R libraries should now be backed up to the cloud and synchronized among your computers.

If your machines have different R versions installed, this shouldn’t be a problem, because the R library folder is organized by R version.
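To confirm where your packages actually live (and hence what gets synced), you can check the library paths from R; on Windows the first entry is typically the per-user folder under “Documents”:

```r
# List the folders R installs packages into; the first entry is
# usually something like "C:/Users/you/Documents/R/win-library/3.6".
.libPaths()
```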

Run SAS Script in R

Unlike SAS, which provides dedicated syntax to run an R session alongside the SAS session, R does not have a built-in capability to run a SAS session.

However, R has the system() function to invoke any command from the OS, and SAS allows us to run a SAS script from the command line. Putting the two together, we can use R to call the SAS command to run a SAS script.

The command line syntax to call SAS to run a SAS script is:
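A sketch of the command (the SAS installation path and script path are assumptions; -SYSIN specifies the script to run):

```shell
"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -SYSIN "C:\path\to\script.sas"
```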

All we need to do is call the above command from the R system() function. The R code will look like:
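A sketch of the call (paths are assumptions for illustration):

```r
# Single quotes around the whole command, doubled backslashes for R.
system('"C:\\Program Files\\SASHome\\SASFoundation\\9.4\\sas.exe" -SYSIN "C:\\path\\to\\script.sas"')
```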

Notice two small changes:

  1. The whole command is wrapped in single quotes because the command itself uses double quotes; the double quotes are needed because the paths may contain spaces.
  2. Backslashes \ are escaped with additional backslashes for the R syntax.

The above code is hard to modify and extend. We can make things easier by splitting the command into pieces and pasting them back together.
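For example (the paths are assumptions for illustration):

```r
sas_exe    <- "C:\\Program Files\\SASHome\\SASFoundation\\9.4\\sas.exe"
sas_script <- "C:\\path\\to\\script.sas"

# Rebuild the quoted command line and run it.
sas_cmd <- paste0('"', sas_exe, '" -SYSIN "', sas_script, '"')
system(sas_cmd)
```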

Using the above syntax, we can add more to the SAS command:
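For example, options such as -RLANG (enable the R integration) or -NOSPLASH (suppress the splash screen) can be appended; the paths here are assumptions:

```r
sas_exe    <- "C:\\Program Files\\SASHome\\SASFoundation\\9.4\\sas.exe"
sas_script <- "C:\\path\\to\\script.sas"
sas_opts   <- "-RLANG -NOSPLASH"

# Append extra SAS options after the script path.
sas_cmd <- paste0('"', sas_exe, '" -SYSIN "', sas_script, '" ', sas_opts)
system(sas_cmd)
```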

Note: the SAS installation path can differ between machines; please check it if the above code does not run on yours.

Run R Code in SAS

After setting up SAS to run R code, we should be able to run R code inside SAS. This is achieved by running submit / R inside proc iml:
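A minimal sketch (the R code inside the block is just an example):

```sas
proc iml;
  submit / R;
    x <- 1:5
    print(mean(x))
  endsubmit;
quit;
```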

However, the SAS session and the R session do not automatically share data between each other.

The SAS function to send data from the SAS session to the R session is ExportDataSetToR(SASDatasetName, RDataframeName). For example, the following code sends the “SASHELP.class” SAS dataset to the R session, where it becomes a dataframe named “df”. The R code then prints the dataframe in the R session, and the R output is captured and printed in the SAS session.
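A sketch of that example:

```sas
proc iml;
  /* Send the SAS dataset to R as a dataframe named "df" */
  call ExportDataSetToR("Sashelp.Class", "df");
  submit / R;
    print(df)
  endsubmit;
quit;
```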

The SAS function to send data from the R session to the SAS session is ImportDataSetFromR(SASDatasetName, RDataframeName). For example, the following code loads the built-in R dataset “mtcars” in the R session and then sends it to the SAS session, where it becomes the SAS dataset “sas_data” in the Work library.
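A sketch of that example:

```sas
proc iml;
  submit / R;
    data(mtcars)
  endsubmit;
  /* Bring the R dataframe back as the SAS dataset work.sas_data */
  call ImportDataSetFromR("work.sas_data", "mtcars");
quit;
```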

Putting these two pieces together, a simple workflow can look like this:
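A sketch of such a workflow (the BMI calculation and dataset names are assumptions for illustration):

```sas
proc iml;
  /* 1. Send a SAS dataset to the R session */
  call ExportDataSetToR("Sashelp.Class", "df");

  /* 2. Work on the data in R */
  submit / R;
    df$BMI <- 703 * df$Weight / df$Height^2
  endsubmit;

  /* 3. Bring the result back into SAS */
  call ImportDataSetFromR("work.class_bmi", "df");
quit;
```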

Setup SAS to Run R Code inside SAS

If you have a modern version of SAS and an up-to-date version of R on Windows, you are most likely able to run R code inside SAS. The setup takes only the following two steps:

First, tell SAS to launch with the R functionality turned on. You can achieve this by either:

  • a) add -RLANG to the SAS command line, or
  • b) add -RLANG to the end of the SAS config file (sasv9.cfg).

Second, tell SAS where to find R. You can achieve this by either:

  • a) add options set=R_HOME='C:\Program Files\R\R-3.6.2'; to your autoexec.sas, or
  • b) add -SET R_HOME "C:\Program Files\R\R-3.6.2" to the end of the SAS config file (sasv9.cfg).

Please change the R path according to the R version installed on your machine.

Now, you can run proc options option=RLANG; run; to check whether the R functionality has been turned on in SAS.

In summary, if you prefer not to change the SAS config file (sasv9.cfg), you only need to do the following two things:

  1. add -RLANG to the SAS command line,
  2. add the following two lines to your autoexec.sas:
    • options set=R_HOME='C:\Program Files\R\R-3.6.2'; (change the R path according to the R version installed on your machine), and
    • proc options option=RLANG; run; so that you can always see whether SAS is ready to run R when SAS is launched.