Data Science with Python or R?

Data Science with Python or R – Which is better?

Harvard Business Review called data scientist “the sexiest job of the 21st century”. According to a recent LinkedIn study, Data Science and Machine Learning engineering are among the most in-demand job roles. NASSCOM has predicted that India’s analytics market will be worth over $2 billion, creating over 200,000 job openings in India alone. With demand this high, it is natural for students to focus on this technology. But then a question pops up: where do I start, and what should my base be, Python or R?

Before we begin the comparison, let us make it clear that we believe both do an excellent job. The end results of Python and R for Data Science are the same. It comes down to personal choice, much like “Which phone should I buy: Samsung, Apple or OnePlus?”. They all do pretty much the same job, and each has its advantages and disadvantages. In this post, we compare Python with R in the field of Data Science.

Let’s first understand what Data Science is

In the figure, there is one prominent word that you notice first, while the rest stays in the background. That is exactly my definition of Data Science. Confused? Let us take an example. Suppose you have decided to buy a new car for your family. Where do you start?

Note: In this one example, you will see so many uses of Data Science.

You will obviously first fix a budget in your mind and then go online to look for a car within it, say, “best car within 15 lakhs in india”. You come across many websites on Google’s search engine results page. Here comes the first decision you make online: “Which website should I choose?”. Assume you click on the first link on the results page. You browse through the website but don’t like what it has to offer, so you close the tab. Now you visit the second link from the same results page. To your surprise, you see ads from the first website appearing in the sidebar of the second website. How do you think this happened?

Let’s assume you like the second website because it shows you user ratings and recommendations from other users with similar likes, and it lets you compare 5 cars of different brands at the same time. How do you think this is happening?

Simply put, it’s Data Science: a combination of Computer Science, Machine Learning, Mathematics and Statistics, Data Processing and Mining, Engineering and Technology, Complex Data Analysis and Visualization, along with some domain knowledge.

Python or R?

Which is better? Both R and Python are fantastic tools. Under some circumstances R is better, and under others Python is. But which is the right tool for you? This is an important question because, at the end of the day, we need to make data more accessible to more people. We are living in a data-driven world.

Let’s start with their background. Both R and Python are open source, meaning both are free to use. They can be easily downloaded from the internet and are supported by thousands of libraries that make our life easier and automate our work. They are much cheaper than SAS, MATLAB and IBM’s SPSS, not just in the initial cost but also in the cost of the additional libraries you end up purchasing because they are essential to your work.

The Front End

With Python, once you have completed your analysis, you can convert it into an application using a web framework such as Django or Flask. These applications are very flexible and can be made very user friendly. But this is not so easy, because you also need to know a few other languages and technologies, such as HTML, CSS, JavaScript and responsive design. This makes it a little complicated and time consuming, and in practice mostly professionals make the best use of Python here.

With R, on the other hand, you can build quick, reactive web applications very easily using shiny, shinydashboard or flexdashboard. You can also write R Markdown documents with your R code embedded in them. You don’t have to know web development or depend on a web developer, which saves additional expense. This makes R a preferred choice for the front end. It does have a drawback, though: it is less flexible when it comes to changes or user experience, and its options are limited in comparison to its rival.
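To make the idea of turning an analysis into a web application more concrete, here is a minimal sketch that uses only Python’s standard library. It exposes a small result through WSGI, the interface that Flask and Django also build on. In practice a framework would add routing, templates and much more. The ratings data, the `/summary` path and all function names are hypothetical and chosen only for illustration.

```python
import json
from statistics import mean

# Hypothetical analysis input: user ratings per car model.
RATINGS = {"Model A": [4, 5, 3], "Model B": [2, 3, 4]}


def summarize(ratings):
    """Return the mean rating for each model, rounded to two decimals."""
    return {model: round(mean(scores), 2) for model, scores in ratings.items()}


def app(environ, start_response):
    """A tiny WSGI application that serves the summary as JSON."""
    if environ.get("PATH_INFO", "/") != "/summary":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = json.dumps(summarize(RATINGS)).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(host="localhost", port=8000):
    """Run the app on a local development server (blocks until interrupted)."""
    from wsgiref.simple_server import make_server

    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` and opening `http://localhost:8000/summary` in a browser would show the JSON summary. Everything a user would actually see, such as the page layout and styling, still has to be written separately, which is the extra effort described above.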

With Machine Learning

Python has a stronger grip when it comes to Machine Learning. Although R supported Machine Learning libraries long before Python did, most of these libraries were written by individuals, so they are fragmented. Python, on the other hand, has the scikit-learn (sklearn) library, which offers most of the algorithms a typical machine learning model needs behind a single, consistent interface. It is a cohesive library on which newer models are also being developed. This is a key point, as Machine Learning has become a major field of study in the last 5 years. Python currently takes the lead here, but the R community isn’t standing still. They are constantly working on porting these libraries and making advances with newer libraries too.
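The cohesion of sklearn is easiest to see in code. The sketch below assumes scikit-learn is installed and uses its bundled iris data set as a stand-in for real data. The choice of the two models, the split size and the variable names are ours, picked only for illustration. Two quite different algorithms are trained and scored through the same `fit` and `score` calls.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in data set and hold back a quarter of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two different algorithms, one shared interface.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # train on the training split
    scores[name] = model.score(X_test, y_test)   # accuracy on the held-out split
    print(f"{name}: {scores[name]:.2f}")
```

Swapping in another classifier usually means changing only the entry in `models`, since the rest of the loop stays the same.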

Big data

Another obvious option is to generate our own data set using a programming language. Many languages make this cumbersome. Python, on the other hand, is very efficient at communicating with servers through APIs. It has many libraries that support fetching and storing data for analysis and prediction. This makes it easier to go through websites that deal with live data, fetch the necessary information and store it in a database. This information can later be analysed using R and, of course, using Python. Python can also be used on cloud computing infrastructure such as Google Cloud, Microsoft Azure and Amazon EC2.
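As a rough illustration of the fetch-and-store workflow, the sketch below uses only the standard library. The record layout (`model`, `price_lakh`), the table name and the helper names are assumptions made for this example, and the sample payload stands in for a real API response so that the code runs offline.

```python
import json
import sqlite3
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a JSON document from a web API and decode it."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def store_prices(conn, records):
    """Insert records shaped like {"model": ..., "price_lakh": ...} into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (model TEXT, price_lakh REAL)"
    )
    conn.executemany(
        "INSERT INTO prices (model, price_lakh) VALUES (:model, :price_lakh)",
        records,
    )
    conn.commit()
    # Report how many rows the table now holds.
    return conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]


# Offline stand-in for what fetch_json(<some API URL>) might return.
sample = json.loads(
    '[{"model": "Model A", "price_lakh": 12.5},'
    ' {"model": "Model B", "price_lakh": 14.9}]'
)

conn = sqlite3.connect(":memory:")  # use a file path to keep the data
stored = store_prices(conn, sample)
print(f"stored {stored} rows")
```

With a real service you would pass the result of `fetch_json` to `store_prices` instead of `sample`, and the stored table could then be analysed from either Python or R.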

Production move

The next metric is how feasible it is to move code to production. Python is more of a tool for software developers, who are better placed to move code into production. R, on the other hand, is mostly used by researchers and statisticians, who often have little need to move code into production and mostly use it in the form of scripts. Having said that, it is possible to move R code into production using Docker containers.

Ease of use

Python and R are both easy to use. Python is more structured than R, which makes it easy to comprehend. R, on the other hand, is less structured and very forgiving, meaning there is no single way of writing a script. You can write it in a number of ways and still have it work perfectly. This makes it easy for beginners, but what they write may not follow standard coding practice, which could land you in trouble while working in the industry.

Documentation help

Python and R both have detailed and extensive documentation that is constantly updated, so quite frankly we feel both are very good when it comes to documentation help. R has a slight edge here because its documentation has a lot of examples to relate to, which makes it easier for beginners to understand. You can directly copy and use these examples for practice. Python’s documentation, in our experience, offers fewer examples of this kind to relate to and practice with.

With Database

SQL and databases are probably among the most important aspects of Data Science. It’s all about data, and we need to be good with these concepts in order to fetch the data and work with it. That means you need to learn SQL. But what if we say that with R you need not learn SQL explicitly? R has database connectors that allow you to write regular R code and access databases directly. With Python, you will usually have to learn SQL and how to run it using Python commands. Learning SQL may not be strictly necessary at this point, but we still feel it can never be a waste of time or money.
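Here is a small sketch of what running SQL from Python looks like, using the standard library’s `sqlite3` module and an in-memory database. The `cars` table, its rows and the `models_within_budget` helper are made up for illustration and echo the car-buying example from earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (brand TEXT, model TEXT, price_lakh REAL)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        ("Brand X", "X1", 9.5),
        ("Brand X", "X2", 14.0),
        ("Brand Y", "Y1", 16.5),
        ("Brand Y", "Y2", 12.0),
    ],
)


def models_within_budget(conn, budget_lakh):
    """Return (brand, model, price) rows at or under the budget, cheapest first."""
    cursor = conn.execute(
        "SELECT brand, model, price_lakh FROM cars "
        "WHERE price_lakh <= ? ORDER BY price_lakh",
        (budget_lakh,),  # the ? placeholder keeps the value out of the SQL text
    )
    return cursor.fetchall()


affordable = models_within_budget(conn, 15)
for brand, model, price in affordable:
    print(brand, model, price)
```

The query itself is plain SQL. Python only sends it to the database and receives the rows back as tuples, which is why knowing SQL pays off here.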

Visualizations

Most people think that R does much better when it comes to visualizations, as it integrates directly with powerful tools like Tableau and Power BI. But this is not 100% true. Python can create powerful visualizations using matplotlib, seaborn, Bokeh, pygal and plotly, and it can be used with Tableau as well. But of course, it is not always as easy as it can be with its counterpart.

Conclusion

Whether you do Data Science with Python or R is purely your choice. If you’re planning to be a software developer and work in teams, then Python seems to be the better choice. But if you’re a researcher or an analyst, or if you work in business development rather than software development, then you should consider R. R is more for people who are looking for a result and are not really concerned about how good it looks, or who don’t want to learn a lot of other tools just to create a simple front end. Python, on the other hand, is a tool that doesn’t like loose ends. It’s very flexible, integrates with a wide range of other technologies and is easy to learn. It has better Machine Learning library support and does pretty well when it comes to visualizations too.

You can learn both Data Science using Python and Data Science using R from us.
