Monday, April 28, 2014

The Deep Dive: SAS vs. R

Last month I conducted a quick “flash survey” to quantify the preferences of the Burtch Works network, asking: Which do you prefer to use, R or SAS? I posted the initial results on my blog a few weeks ago, and as promised during the webinar for our Data Scientist Salary Study, we've finished our deeper-dive analysis of the data from over 1,000 respondents. Whether you think the results are predictable or surprising, I’d love to hear your thoughts in the comments below, as the conversation has been pretty lively so far. Without further ado, here are the greatly anticipated results!


As many theorized, respondents with five or fewer years of experience were the most likely to favor R.

Although SAS was favored at most education levels, among PhD respondents R was almost even with SAS. Among PhD respondents with five or fewer years of experience, R was decidedly more popular than SAS.


















In most regions of the United States, SAS was the preferred tool; on the West Coast, however, R was favored over SAS.






















SAS was preferred over R in every industry except Tech, Telecom, and Gaming. Retail, Pharma/Healthcare, and Financial Services had the strongest preference for SAS.

23 comments:

Dave Wilson said...

Hi Linda:

Very interesting results. It looks to me that younger, less experienced individuals tend to favor R about the same as SAS. Otherwise, SAS still rules.

Again, I think this boils down to the steep learning curve that both products have and the unwillingness to devote the extensive time needed to learn another statistical package.

I do not know R, but as you know, the beauty of SAS is the data management and manipulation capability that the product also offers prior to any analysis that has to be undertaken.

Unknown said...

I have no idea about SAS, but R has every package and every possible machine learning technique.

I prefer R, but at times I choose Python.

I heard that SAS will soon integrate with R. If that is the case, R is the best :)

Anonymous said...


David,

When people went through university courses in statistics a decade ago, they were probably taught statistical computing in SAS. Today, most top universities teach R in their introductory statistics courses. So your idea that R users are simply "younger" and less experienced is a little more complex than what you state.

Also, I'd be interested in the distinction between those who do complex analysis and those who don't, and which tool each of those groups prefers. My guess is that if you're doing basic analysis (correlations, summarizing data, etc.) you'll be more prone to use SAS, while people fitting generalized linear models, classification models, and so forth are using R.

Unknown said...

As I am currently studying both SAS and R, one of the first things that jumped out at me was that SAS is much more of a reporting tool than R while R is much more suited to quick, ad hoc analysis.

I like both, but as a manager I would make different tool choices for my staff in different situations.

It might be nice to segment those users who use the tools primarily for reporting versus those who use them for hands-on analytics.

Anonymous said...

Ah, but was Excel used to generate those graphs?

Anonymous said...

With respect to saikumar allaka's comment, SAS has added an R node to Enterprise Miner, so you can compare models within SAS, as well as some R capabilities in SAS Model Manager to help improve the model deployment and model scoring process. And SAS has recently announced FREE software for higher ed students with an easy-to-deploy multi-platform VM (so they can install it on their Mac/PC/Linux machine).

Bharath (BV) said...

We also have to take into account the cost of licensing SAS vs. R, which is free and open source.

Anonymous said...

Agree w/anonymous' comment above re: education. Be careful about concluding which is better by majority. You can't conclude that those with more experience will "graduate" to SAS from R. Rather, R is more commonly taught than SAS now. See the email vs. social media argument for communication in the workplace. Email is dying, but those who started their careers on email tend to hold on (like me)...

Anonymous said...

SAS is legacy software in most large corporations, which incidentally happen to be the only organizations who can afford the license. That would be my guess why most people still use SAS.

Personally, I would take R and a good SQL implementation any day over SAS.

Willie Liao said...

@Dave Wilson

I interpret the years of experience bar chart as "Business Analysts use SAS and Data Scientists use R". Or you could look at the industry chart and say R users are "more experienced" in other open source tools like Python, Julia, Vowpal Wabbit, Hadoop, Hive, Pig, Mahout, Spark.

In any case, it's all open to interpretation since we can all agree that this "deep dive" is fairly shallow as far as analysis goes. Probably neither R nor SAS was used. :)

Steve Iaquaniello said...

How many people have switched from SAS to R and when? Using myself as an example, I was a 100% SAS user for almost 10 years, but almost 3 years ago the company I was with at the time decided to switch from SAS to R for purely financial reasons. I had some experience with R from my academic work, so I made the switch without too much pain. Subsequently, I've worked for two other companies that only use R, and wouldn't even consider paying for a SAS license. It seems like companies are concluding that SAS and R are comparable enough that SAS isn't worth the additional cost. And in the marketing/consulting world, it's nice to be able to use R with clients and not have to charge additional fees for SAS licenses. I'm curious if this is a trend, or if I'm an outlier?

Michel said...

I guess SAS will be out of business soon if they don't reinvent themselves.

The younger, more tech-savvy, and more educated you are, the more likely you are to prefer R.

Anonymous said...

I feel that individuals who actually know both are the minority. Neither is easy to pick up, and they're usually comparable in results, so why bother? Those on R's side like the lower cost, and those on SAS's side like the longevity and stability. It's really a toss-up, but seeing R jump out to take away some of SAS's near monopoly is quite interesting.

Unknown said...

I suspect that more senior people prefer SAS because that is what was widely available when they started. However, the fact that R is free, has a large library of packages available, and is taught at universities likely explains the difference between age groups.

Jonathan Poeder said...

Good luck meeting that $MM SLA when a critical bug is discovered in your 'community supported' R package. Not sure the client is going to give a crap about open source philosophy.

Zitrun said...

This sounds about right. R was not a major player until recently. It is largely preferred for analytics in big data environments. As a long-term SAS user who is walking away from SAS, I sincerely hope that SAS will become realistic in its pricing and improve its product so it doesn't miss the big data feast.

Anonymous said...

I have used both SAS and R. Personally, I prefer SAS because it's stable, reliable and great for data management. R is good for ad hoc analysis. Also, many people I know who use R tend to apply complex algorithms that they know little about, because R offers packages that let them do so. I do believe some are real experts in those complex algorithms. However, many are not. Having less than 5 years of experience does say a lot about such a user group.

Boyko Ivanov said...

I totally agree with the above comments that the association between years of experience and a preference for SAS could probably be explained in terms of the learning curve. Also, yes, nowadays R is the tool preferred at the universities. The cost, yes, is important.

But there is also another thing: The whole world community contributes to the development of various packages in R. R covers way more fields than SAS, and, this trend is explosive. So, in my opinion, SAS will lose the battle unless they make SAS free and go out of business!?

Unknown said...

R is now the lingua franca of academic statistics, just as SAS was 30 years ago. Those who like SAS the software but dislike SAS the company should take a look at WPS.

http://openbi.com/content/what-sas-doesnt-want-you-read-wps-31-sas-r-and-medicare

Anonymous said...

Just curious if this study took into account that R is essentially an open source derivative of S ...

Anonymous said...

The argument about SAS vs R is misguided. SAS is steadily increasing revenue year over year. Why? Because they have rolled out many complete industry solutions, full-bore applications, and the big data visualization package called Visual Analytics, which has a number of architecture configurations. SAS has also branched out into a variety of successful desktop software tools such as SAS SimStudio and JMP. They have also built a number of enterprise deployment and model management solutions.

The thoughts I have seen seem focused on R vs Base SAS/SAS Graph/SAS Stat. I am guessing that most here left SAS OR out because they are ignorant about optimization and optimization solvers (mathematical, not statistical, approaches).

Some of you here seem to be behind the times in terms of business analytics in the marketplace. As for the data science fad, it is a fad that is fading away, since domain expertise is required to generate insight and action. This doesn't come out of the Silicon Valley bubble for most corporations. Look at GE's failure to be successful with their open source data science investment in San Ramon. It is an absolutely colossal failure, and is being marketed with lipstick to save face.

Bob Muenchen said...

For many different measures of SAS and R market share and growth trends, see my article, The Popularity of Data Analysis Software (http://bit.ly/statpop). For a deep comparison between the two, you may be interested in reading my book, R for SAS and SPSS Users.

A couple of comments above suggest that SAS has data management capabilities that R lacks. That's not true as I demonstrate in my data management workshop (http://bit.ly/ManagingDataR). I'll be doing a shortened version of that at the useR 2014 conference at the end of June (http://user2014.stat.ucla.edu/).

However, if the data do not fit into memory, then SAS has an advantage in how much data it can handle. Such data are often in databases though, and may be better prepared for analysis by SQL.

Anonymous said...

Bob, I have your book. Love it! I disagree, though, with your statement here that if you don't have enough memory you have to use SAS. First, R has memory management techniques that work as well as in C and C++. And second, R is very MapReduce-able and cloud-compatible. That said, I too prefer using SQL as a preprocessing tool outside of R.