As part of our Theme of the Month, 'Leading Tools And Techniques Used By Analytics And AI Practitioners', we bring you a conversation with Sonal Pingle, Lead Analyst – Product and Technology at Quantium, a data science company based in Australia.
With more than seven years of experience in data analytics and machine learning, Pingle works with the Product team at Quantium, serving clients across Australia, India, South Africa and the US in delivering data science solutions. She holds an MBA in Marketing and Communications and has strong expertise in the retail, media and insurance industries.
In this article, Pingle shares her insights into the top tools and techniques currently used in the industry.
What are the most commonly used tools in analytics, artificial intelligence and data science?
We use a variety of tools at Quantium. Some of the major ones for analytics work include Scala, R, Teradata and Python. We also use MicroStrategy and Tableau for visualisation.
What is the most productive tool that you have come across?
I personally work quite a lot in the big data analytics space, so I find Scala to be very useful. It works very efficiently in a big data environment.
Do you prefer tools that are open source or paid? Please elaborate on the benefits of the open source and paid tools that you prefer.
At Quantium, we use a lot of open-source tools. Most of the tools I mentioned earlier, such as R, Python and Scala, are open source. They are easy to use since they are widely documented and have well-developed libraries. Some teams at Quantium do use paid software such as Teradata, SQL Server and MapR, mainly for the service and support provided with that software.
Is open source considered an important attribute when choosing the tool of your choice?
Not really; it is more about what you need and whether the open-source tool provides it. If it can't, we opt for a paid tool.
What are the most common issues you face while dealing with data? How is selecting the right tool critical for problem-solving?
Since I work in the big data space, long query run times are often a big pain point. So, selecting the right tools and languages, and writing optimised, efficient queries, helps mitigate that issue, as the sketch below illustrates.
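To illustrate the kind of query optimisation Pingle refers to, here is a minimal Spark sketch in Scala. The dataset path and column names are hypothetical, and the techniques shown (column pruning, early filtering, caching of reused results) are general Spark practices rather than Quantium's specific approach.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EfficientQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("efficient-query-sketch")
      .getOrCreate()

    // Hypothetical Parquet dataset; the path and schema are illustrative only.
    val sales = spark.read.parquet("/data/sales")

    // Select only the columns needed and filter as early as possible, so Spark
    // can push the predicate and column pruning down to the Parquet reader
    // instead of scanning the full table.
    val recent = sales
      .select("store_id", "sale_date", "amount")
      .filter(col("sale_date") >= lit("2019-01-01"))

    // Cache a result that several downstream queries reuse, avoiding
    // repeated scans of the source data.
    recent.cache()

    val byStore = recent.groupBy("store_id")
      .agg(sum("amount").as("total_amount"))

    byStore.show(10)
    spark.stop()
  }
}
```

Writing the query against a columnar format and filtering before aggregating is often the difference between minutes and hours on a large cluster.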
How do you select tools for a given task?
This depends mainly on where the data sits and what the most convenient option for working on it is. My personal preference is to do as much work as possible on the big data cluster.
What are the most user-friendly languages and tools that you have come across?
Languages – Scala, Python
Tools – Jupyter notebooks, Zeppelin notebooks
What does an ideal data scientist toolkit look like?
Languages – Scala, R, Python, SQL
Tools – Jupyter or Zeppelin notebooks, H2O
Big data cluster or cloud access
What is the most preferred language used by the team?
Scala for big data querying.
Can you give us the percentage of data scientists and developers that use a particular language or data visualisation tool?
For data scientists at Quantium:
Scala – 30%
R – 30%
SQL and/or Teradata – 30%
Python – 10%
What is the most preferred cloud provider — AWS, Google or Azure?
We use a mix of Google and Azure depending on where the client data sits.
What are some of the tools used for scaling data science workloads? For example, Docker is gaining popularity vis-a-vis Spark.
Apache Spark is widely adopted in Quantium.
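Spark scales a workload by partitioning the data across executors on a cluster. As a hedged illustration, the Scala sketch below scores a hypothetical feature table in parallel; the paths, column names and toy logistic score are invented for the example, and executor settings would normally be supplied via spark-submit rather than hard-coded.

```scala
import org.apache.spark.sql.SparkSession

object ScalingSketch {
  def main(args: Array[String]): Unit = {
    // Executor counts and memory here are illustrative only.
    val spark = SparkSession.builder()
      .appName("scaling-sketch")
      .config("spark.executor.instances", "8")
      .config("spark.executor.memory", "8g")
      .getOrCreate()

    import spark.implicits._

    // Repartitioning spreads the rows across the cluster; each of the
    // 200 partitions is processed independently by an executor core.
    val features = spark.read.parquet("/data/features")
      .repartition(200)

    // A toy per-row scoring function (logistic transform of a single
    // hypothetical feature column "x"), applied in parallel.
    val scored = features.map { row =>
      val x = row.getAs[Double]("x")
      (row.getAs[String]("id"), 1.0 / (1.0 + math.exp(-x)))
    }.toDF("id", "score")

    scored.write.mode("overwrite").parquet("/data/scores")
    spark.stop()
  }
}
```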
What are some of the proprietary tools developed in-house by the company?
At Quantium, we have developed extensive analytics libraries on top of R, Scala, Python and PySpark. This really helps analysts leverage previous work and industry best practices when solving a problem.
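Quantium's libraries are proprietary and not shown here. Purely to illustrate the pattern Pingle describes, a thin in-house wrapper over Spark in Scala might encapsulate a common transformation so that every analyst reuses the same reviewed logic; the object and method names below (AnalyticsLib, weeklyAggregate) are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical in-house helper library: the names are invented for
// illustration and do not reflect Quantium's actual code.
object AnalyticsLib {

  /** Aggregate a transactions DataFrame to weekly totals per entity.
    * Encapsulating the date truncation and aggregation in one place
    * means each analyst applies the same, already-reviewed logic
    * instead of re-deriving it for every project.
    */
  def weeklyAggregate(df: DataFrame,
                      keyCol: String,
                      dateCol: String,
                      valueCol: String): DataFrame = {
    df.withColumn("week", date_trunc("week", col(dateCol)))
      .groupBy(col(keyCol), col("week"))
      .agg(sum(valueCol).as(s"weekly_$valueCol"))
  }
}
```

An analyst would then call, say, AnalyticsLib.weeklyAggregate(transactions, "customer_id", "txn_date", "amount") rather than rewriting the aggregation each time.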