Perform Big Data Analytics with Cloudera Impala and InfoCaptor

InfoCaptor now officially works with Cloudera's Hadoop distribution and specifically with Hive and Impala.

Once you are connected to either Impala or Hive, you can perform drag-and-drop visualization just like with any other data source.

InfoCaptor adds native Impala and Hive functionality within the Visualizer so you can leverage Date/time functions for date hierarchy visualizations.
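To make the idea of a date hierarchy concrete, here is a minimal sketch in plain Python. It is not InfoCaptor's or Impala's API; the function name `date_hierarchy` and the label formats are assumptions chosen for illustration. It only shows the kind of year, quarter and month levels a date column gets broken into.

```python
from datetime import date


def date_hierarchy(d: date) -> tuple:
    """Return (year, quarter label, year-month label) for a date.

    Hypothetical helper for illustration only; the label formats
    are assumptions, not anything InfoCaptor or Impala prescribes.
    """
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return d.year, f"Q{quarter}", d.strftime("%Y-%m")


print(date_hierarchy(date(2014, 11, 3)))  # (2014, 'Q4', '2014-11')
```

Grouping a measure by the first element, then the first two, then all three gives the drill-down levels of the hierarchy.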

Read the details about how InfoCaptor does big data with Impala and Hive

VC Investment Data visualization and analytics

Using the data from pwcmoneytree.com and easy-to-use dashboard software, we perform analytics on a huge dataset that spans 20 years of venture capital investment data from 1995 onward. Data that goes this far back should give us enough to extract the necessary analytical juice out of it.

 

VC investment by industry

Change in investment pattern between 2000 and 2014

The year 2000 was definitely the peak of VC investment craziness. A whopping 105 billion was pumped into startups to bring them quickly to IPO. Ever since the crash of 2000... Continue reading the original article

Trends in Big Data, Hadoop, Business Intelligence, Analytics and Dashboards

How has the interest in Big Data, Hadoop, Business Intelligence, Analytics and Dashboards changed over the years?

One easy way to gauge the interest is to measure how much news is generated for each related term, and Google Trends allows you to do that very easily.

Plugging all of the above terms into Google Trends and analyzing the results further leads to the following visualizations.

Aggregating the results by year

Image

 

It is amazing to see that the stream representing Dashboards has remained constant throughout the years.

The streams for Analytics and Business Intelligence in general exhibit a similar trend.

The Analytics stream is kind of widening its mouth as we move forward, and that is being helped by terms such as Hadoop, Big Data and Analytics being used almost together.
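The yearly aggregation behind the stream chart can be sketched roughly as follows. This is only an illustration in Python, not how InfoCaptor does it: the rows are made-up values rather than Google Trends figures, the name `total_by_year` is hypothetical, and summing the weekly index per year is an assumption (an average would work just as well).

```python
from collections import defaultdict
from datetime import date

# Hypothetical weekly rows: (week start, term, interest index).
# The numbers are invented for illustration, not Google Trends data.
weekly = [
    (date(2012, 1, 1), "Big Data", 20),
    (date(2012, 1, 8), "Big Data", 24),
    (date(2013, 1, 6), "Big Data", 50),
    (date(2013, 1, 6), "Dashboards", 10),
]


def total_by_year(rows):
    """Sum the weekly index per (year, term) pair."""
    totals = defaultdict(int)
    for week, term, value in rows:
        totals[(week.year, term)] += value
    return dict(totals)


yearly = total_by_year(weekly)
print(yearly[(2012, "Big Data")])  # 44
```

Each (year, term) total then becomes the thickness of that term's stream for that year.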

Now check the line chart below

Image

 

It looks like the trend for Dashboards defines the lower bound and the trend for Business Intelligence defines the upper bound. The trend for Hadoop started around the first quarter of 2007. The trend for Big Data started around the third quarter of 2008, and both have been increasing rapidly ever since. It remains to be seen whether they will cross "Business Intelligence" in popularity or kind of merge and find a stable position somewhere in the middle.

Before Big Data and Hadoop came into the picture, the term "Analytics" held stable ground closer to Dashboards, but now the trend for Analytics seems to be following Big Data and Hadoop.

Let us take a deeper look into each week since 2004

Image

 

Look at the downward spikes occurring around Christmas time. Nobody wants to hear about Big Data or Dashboards during the holidays.

And finally, here is a quarterly cyclical view

Image

Click here to view the full interactive Visualizations

Auto Sales Data Visualization by Manufacturer

Data: Edmunds

Image

 

Top Manufacturer

Image

 

Quarterly breakup of units sold by manufacturer

Image

 

View the interactive visualizations

Holiday Sales by category

Image

Visualization on How the undergraduate tuition has increased over the years

Average undergraduate tuition and fees and room and board rates

Source: http://nces.ed.gov/

Image

These figures are inflation adjusted. Look at how much the tuition fees alone have increased compared to the Dorm and Board rates.

Now compare the rate increase for the 2-year program

Image

So for the 2-year program, the Board rates have remained at about the same level, unlike the Dorm rates.

Now check out the interesting graph for the 4-year program below

Image

 

Comparing the slope of the 2-year Board rates to that of the 4-year Board rates, the 4-year program shows a significant increase

Image

If the price of meals is the same for both programs, then the 4-year and 2-year programs should have the same slope. So why is the 4-year slope different from the 2-year slope?
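One way to put a number on "slope" is an ordinary least-squares fit of rate against year. The sketch below is in Python and uses placeholder values only: the `board_2yr` and `board_4yr` lists are invented for illustration and are not the NCES figures, and the helper name `slope` is hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs (change in y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Placeholder values for illustration only, NOT the NCES data.
years = [2000, 2005, 2010]
board_2yr = [3000, 3050, 3100]
board_4yr = [3000, 3400, 3800]

print(slope(years, board_2yr))  # 10.0 per year
print(slope(years, board_4yr))  # 80.0 per year
```

With the real data table loaded into the two lists, the ratio of the two slopes says how much faster the 4-year Board rate has been growing.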

Now, let us look at the Dorm rates

Image

 

And finally the 4 year vs 2 year Tuition rates

Image

Here is the data table for the above visualization

USA War Casualties

iCasualties.org maintains a documented list of all fatalities for the Iraq and Afghanistan wars.

Analysing the dataset for Afghanistan, we summarize the results by year

NOTE: This contains only Afghanistan metrics. We will later update the visuals to reflect the Iraq war.

 

Image

USA war fatalities by year

We are approaching the levels of 2002, and we hope that we don't have to suffer another war.

Here is another view by year and month

 

 

The dataset contains the age of each person who died in the war, so we summarize by age

Image

War Deaths by Age

Checking it against the year

Image

Why are there so many deaths between ages 20 and 30 for the year 2014?

Image

Where did most of the deaths occur?

Image

 

Where were the soldiers from?

Image

Deaths by Rank

 

 

Cause of Death

Attack Types

Image

Image

Image

Helicopter crash is one of the top causes of death in non-hostile situations

Fastest growing and rapidly declining job industry

Data source: http://www.bls.gov/emp/tables.htm#occtables

 

Fastest growing job industry

Image

Original Visualization

Most rapidly declining job industry

Image

Rapidly declining jobs link

Top 100 analytics companies ranked and scored by Mattermark

Let us move on from Grass Eating Sauropods and talk about who's who in the analytics space.

For every dime there are a dozen analytic companies. Everybody who provides a freaking dashboard is an analytic company. Anybody that merely mentions Google, Facebook, Hadoop etc. in the same sentence is somehow into BigData. Haven't you stumbled across company pages where they claim to be experts in analytics and big data but want you to schedule a call with them? They don't have any products or solutions to showcase, yet they are Big Data/analytics folks.

So to make things easy, Mattermark released this highly curated list of 100 analytic companies. No offense to BigData, but small datasets like these are always juicy.

Image

 

Mattermark ranks each company using its own algorithm and calls the result the "Mattermark Score". After loading the list up, we came up with these visualizations

For each funding stage, the visualization lists the companies by Mattermark score.

Some interesting questions

1. How many companies by funding stage?

Image

2. What is the funding by location and stage?

 

 

Another interesting visual comes from plotting the score against the total funding.

Image

 

We thought the above visual would tell us what kind of logic Mattermark used to rank the companies. As suspected, we apparently cannot reverse engineer it without some additional information about the companies.
