Big Data and Hadoop: Drivers of the Future

Big Data and Hadoop are the buzzwords nowadays. Every service provider is positioning itself as a champion in the Big Data and Hadoop space. Organizations are finding it very difficult to understand the technology and turn it into a workable solution because of its “infancy stage” and the “complexity” involved.

Let’s start by answering a fundamental question: “Does my organization really need Big Data solutions?” To answer this, one needs to understand the organization’s current data and analytical practices and critically evaluate its future requirements. Does the organization find its current data, analytical processes and systems too inefficient to deliver results in a fast and actionable manner? Or will they become inefficient and insufficient in the near future because of fast-growing requirements and new data challenges such as web logs, social media, and audio/video data? If the answer to either question is yes, then the organization needs to start thinking seriously about a Big Data strategy and road map.

The first baby step towards drawing up a Big Data strategy is to understand it from the IT/data and analytical points of view. Organizations need to decide whether the strategy is an efficiency booster, a path to new capabilities and discoveries, or both. At the foundation level, either goal will require investment in IT and skill sets. The investment in IT may be controlled by using Apache Hadoop and other open source platforms, but training and skill development will surely be an ongoing journey. To achieve the latter goal (new capabilities and discoveries), organizations need to develop the Big Data strategy not only from the IT point of view but mainly from the business and analytical points of view. It’s like a baby learning not only how to walk but also where to head…

If we look at the market landscape of Big Data and Hadoop, there are numerous players, and they provide solutions to different aspects of Big Data deployment. These players can be grouped into a few broad categories.

  1. The first category is players who focus on technology development, such as HBase, Hive and Hadoop HDFS.
  2. The second category is service companies that provide IT solutions, such as setting up the Hadoop platform, and possibly help with integration between legacy systems and Hadoop.
  3. The third category is companies that develop frameworks and applications that run on top of Hadoop.
  4. The fourth category is BI outsourcing companies that provide value by running BI jobs with a shorter turnaround time.
  5. The highest value comes from a fifth category of companies that use analytics and data mining to generate new insights from huge amounts of data, something hitherto not feasible on legacy RDBMS systems.

Affine Analytics provides solutions in two of the above categories. The first is running BI atop Hadoop; the second is performing predictive analytics on data that was too large to manage in the past, and discovering hidden patterns in it. We use Big Data to take the accuracy of predictive analytics to the next level by deploying machine learning and other advanced techniques made feasible by Big Data.

At Affine we are developing Big Data capability in literally a big way. Besides having practically every analyst trained on the Big Data platform, we also have an in-house Hadoop Analytical Lab, “Hal” (a Hindi word meaning solution), which increases the efficiency of our operations multifold and gives us the capability to mine unstructured and semi-structured data to generate new insights that help our esteemed clients’ businesses make faster and better decisions.

Affine is currently working on telecom CDR data to generate insights that were not known earlier and to improve the performance of existing strategies. This requires mining huge amounts of data using Big Data platforms like Hadoop and Hive. Affine is trying to use call data records, or voice/data transaction data, to better manage churn, come up with better strategies to increase ARPU, increase usage of VAS, and so on.

Ashish Maheshwari

Director – Client Delivery, Affine Analytics

For feedback and comments I can be reached at ashish.maheshwari@affineanalytics.com


Affine Transformations 101: The Analytics Scientist Spiderweb

Analytics is the buzzword these days. Businesses are increasingly realizing the need to use analytics or, for some, even the need to be seen using analytics. A recent article in the Harvard Business Review identifies data scientists, or analyticians, as having the sexiest job of the 21st century.

But what does it take to become a good analytician? How is their DNA different from the rest? Do they eat differently / follow a separate exercise regimen?

At Affine Analytics, we believe we have identified the secret ingredients to creating successful analyticians[1].

[Image: the analytics scientist spiderweb]

Business Knowledge – As George Clooney rightly said in “Up in the Air”, “Before you try to revolutionize my business, I’d like to know that you know my business.” One should not approach a problem without proper knowledge of the business. It is of the utmost importance to appreciate the “why and how” of a business problem and to do proper groundwork before embarking on an approach. Every business is unique in its own way and needs to be understood thoroughly before attempting to solve the problem. Do not blindly carry the hypotheses learnt from one problem over to another. The more you know, the easier it is!

 

Problem Engineering – Don’t engineer a problem, but engineer a solution to an existing business problem.

There exists a misconception that analytical solutions necessarily require applying new and advanced statistical procedures. That is not always true. What is always needed is the ability to take a business problem in its rawest form, break it down into logical pieces and then view and solve each of the pieces in a systematic manner.

Common sense coupled with an understanding of the business context is the first and foremost requirement to get started on the road to becoming an impactful analytics scientist.

The next logical step in solution engineering is to find creative solution approaches. To do that, you need to keep in mind the end goal of what is required to be achieved. Don’t have the door-to-door salesman approach selling your standard product. Find what the end goal is and create a solution that achieves it. Sound knowledge of statistical techniques is needed here. Superficial knowledge can help you in coffee table discussions, but doesn’t work here. A deep understanding of the pros / cons of each method can get you to the optimal solution approach.

Innovation is also a key driver here.  It is required not just at the apex of the hierarchical pyramid, but at the lowest level. Innovation can range from creating an automated business suite for a retailer to a completely different way of creating a variable or a metric.

 

Curiosity & Skepticism – Necessity is the mother of invention, said Plato, but it is more curiosity than necessity. Of late, new things are born just because people are inquisitive. Whenever a person is faced with a challenge (something one hasn’t seen or solved before), the curious devil in him wakes up and doesn’t sleep until he has become an expert on that topic. Curiosity makes one productive, and work becomes fun.

Managers & leaders, take note: curiosity can create the passion or more commonly used (/abused) “fire in the belly”.

Analyze the approach from various different angles to increase confidence in your results. The more you look into the problem, the more you will get out of it. Be critical of your own findings, and use multiple approaches and techniques to verify unintuitive results.

 

Math – Mathematics forms the basis of the analytics industry, and every data scientist is expected to have a good grip on the subject. Mathematical skills, especially an appreciation of numbers in general and of variable trends, are more important in the field of analytics than knowing machine learning techniques. Master the basics of all the techniques and you will rule the world. A basic understanding of the principles governing numbers changes the way one looks at a variable.

 

Technology – If math forms the foundation of analytics, then technology enables us to construct a proper structure on top of it. One can’t give meaning to the math behind an analysis without knowing the right tools. Great analysts, and even managers, cannot work without these tools. Knowledge of MS-Excel and R is a must in this field. Though technology is evolving rapidly, one can deliver high-quality analytics with proper knowledge of these two tools.

 

Writing – Your story, however great it may be, is worthless if you are not able to sell it. Presentation is the key to creating “impact” from an analysis. If people cannot make any meaning out of your presentation, then your analysis holds no value. If the client cannot understand it, forget about them implementing it. Create decks and reports that make proper sense. Simple decks and reports with systematic storyboards are what create the impact. If someone stumbles upon your deck in the future and is able to understand it without much background, then you will have carved a good analytics scientist out of yourself!


[1] Affine’s leadership contains a concoction of a few decades of analytics experience in analyzing diverse problems across multiple business functions & industries

Vineet Kumar & Krishna Agarwal

Process Orientation in Analytics.

Before you go on to read this blog in its entirety, let me ensure that we are on the same page about the very definition and boundaries of analytics. To me, analytics is a way of problem solving that relies predominantly on one thing: looking for repeatable, reliable and meaningful patterns in data to better understand the problem at hand, and hence developing a decision and action strategy whose outcome is more predictable. I’m not getting into defining or segmenting analytics itself into multiple types; that’s for another post.

If I may assume, in a rather non-analytical manner, that the hypothesis stated above that “Analytically driven decisions have more predictable outcomes” is true, then we have a solid case to ask the next question – “If Analytics is so important, should not all decisions be driven by analytics?” I’m convinced and I’m sure most of you are too, that corporations around the world already agree that analytics does deliver more predictable results and hence is critical for growth and sustainability.

It leads us to the next big question – “If Analytics is so important, or rather mandatory, then are organizations developing their analytical capability in the right manner? Are they performing various analyses in a manner that is consistent across time and space (read business functions)?” The answer, if not a resounding no, is at least a muted acceptance of a lack of maturity among a vast majority of consumers of analytics.

Answer these to convince yourself – “Does your marketing team know what aspects of a product did the R&D team find important in the Conjoint Analysis they did two years ago?” Sample another one – “Does your analytics manager know the hypothesis tested by his/her predecessor and more importantly the ones he/she rejected?” Another one – “Are you sure the vendor you outsourced your analytics project to or for that matter your internal analytics team has done it in the right manner?”

Unfortunately, while each individual project is done with great rigor, and in most cases the end result is also great, the rigor, the process and the methodologies change from analyst to analyst and from analysis to analysis. This “adhocism” engenders uncertainty and a colossal waste of synergies across analyses.

It goes without saying that there is immense value in having a consistent way of performing analyses across time and across functions. Some of the benefits that come to mind immediately are a higher degree of reliability in results, consistency in the interpretation of results by different functions, a greater amount of cross-learning owing to a common shared code of conduct, and the availability of insights well after the project is complete.

However, the one benefit that stands tall above the rest is the reduction in errors. Given the magnitude of the decisions taken based on analysis, the cost of an error could be very high.

To quote an experience at Affine Analytics with a campaign analysis project for a large internet company: given the number of campaigns they run, the number of response models to be built was rather large (one for each “type” of campaign). Taking cognizance of the scale and the need for repeatability, we first defined a framework of sorts (while keeping in mind the flexibility we needed for individual models). Using this framework, we were able to create huge efficiencies (a 40-50% reduction in model development time from the first set of models to the third set) and also ensure error-free delivery from the first model to the last.

That said, there is a definite need to be wary of bureaucratizing processes. Overdone, it will end up stifling creativity, which is equally important in the development of analytics solutions. Organizations  need to strike a fine balance between consistency and creativity by designing & following processes that are minimalistic in nature but comprehensive in assuring quality.

In my next post, I will start looking back at the analytical exercises I have been part of, primarily in the last two years at Affine, and see where process helped in making it more meaningful, reliable and robust. We will also briefly talk about aspects of Affine’s own analytical framework A4.

To end, I quote John Updike: “Creativity is merely a plus name for regular activity. Any activity becomes creative when the doer cares about doing it right, or better.”

Manas Agrawal

CEO

 

Affine Transformations 101: Overpowering the Predictive Power Greed called “Overfitting”!

The term overfitting originates from the way predictive models are built: they are “fitted” to match the historical data. The fit can be poor, which is called underfitting, in which case the predictions are far away from most of the actual data points. Or it can be too close, which is called overfitting, in which case we are also force-fitting the noise rather than capturing the true underlying structure. As obvious as it may sound, many analysts and forecasters completely ignore this problem and hence develop not predictive models but chaotic models.

Overfitting usually happens in cases when the data is limited and noisy, but the main (de)motivation behind building overfitted models is the urge to build super-predictive models. Combine these ingredients and you have a recipe for predictive chaos.

Here’s a simple enough example to explain overfitting:

[Image: overfitting example]

An overfitted model will score high on various statistical tests and measures, but it scores those extra points by cheating – by fitting the noise rather than the true underlying structure. This might make it easier to sell the model to the client, but has the potential to hurt their business.
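The idea can be tried out in a few lines of code. The sketch below is only an illustration under made-up assumptions, not one of Affine's models: the sine-shaped "true structure", the noise level, the sample sizes and the polynomial degrees are all invented for the example. It fits polynomials of increasing degree to a small noisy sample and compares the error on the training points with the error on fresh points drawn from the same process.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_signal(x):
    # Assumed "true underlying structure" for this toy example.
    return np.sin(2 * np.pi * x)

# Small, noisy training set and a larger fresh test set (both synthetic).
x_train = np.linspace(0.0, 1.0, 15)
y_train = true_signal(x_train) + rng.normal(0.0, 0.3, size=x_train.size)
x_test = np.linspace(0.02, 0.98, 200)
y_test = true_signal(x_test) + rng.normal(0.0, 0.3, size=x_test.size)

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def fit_and_score(degree):
    # Least-squares polynomial fit of the given degree on the training data.
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    return rmse(y_train, model(x_train)), rmse(y_test, model(x_test))

# Degree 1 is too rigid, degree 3 is moderate, degree 10 is very flexible.
results = {degree: fit_and_score(degree) for degree in (1, 3, 10)}
for degree, (train_err, test_err) in results.items():
    print(f"degree {degree:2d}: train RMSE {train_err:.3f}, test RMSE {test_err:.3f}")
```

The training error can only go down as the degree goes up, which is exactly the "extra points" an overfitted model scores. The number to watch is the gap between the training error and the error on data the model has not seen.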

At Affine, we perform multiple diagnostic checks, during both the model training and testing phases, to ensure that our models are overfit-free. Our in-house multi-tier validation framework leverages bagging (bootstrap aggregation), where the models are built as well as validated on multiple bootstrap samples (pulled with and without replacement). The number of bootstrap samples may vary from 20 to 50, depending on the underlying statistical model and the sample size, amongst other things. Multiple model parameters and performance metrics are validated for consistency across these bagged samples and summarized to create a final validation report that lets our analytic scientists take a call on overfitting as well as take measures to get rid of it.
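A check of this kind might be sketched as follows. This is not Affine's in-house framework, whose details are not described here. It is a minimal illustration that assumes a plain linear regression, sampling with replacement only, and out-of-bag rows as the validation set; the function name `bootstrap_validation` and the synthetic data are hypothetical.

```python
import numpy as np

def bootstrap_validation(X, y, n_samples=30, seed=0):
    """Refit a linear model on bootstrap samples and summarize its stability.

    Hypothetical helper for illustration: each model is fitted on rows drawn
    with replacement and scored on the rows left out of that draw.
    """
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    design = np.column_stack([np.ones(n_rows), X])  # add an intercept column
    coefs, in_sample, out_of_bag = [], [], []
    for _ in range(n_samples):
        idx = rng.integers(0, n_rows, size=n_rows)      # bootstrap draw
        oob = np.setdiff1d(np.arange(n_rows), idx)      # rows not drawn
        if oob.size == 0:
            continue
        beta, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
        coefs.append(beta)
        in_sample.append(np.sqrt(np.mean((design[idx] @ beta - y[idx]) ** 2)))
        out_of_bag.append(np.sqrt(np.mean((design[oob] @ beta - y[oob]) ** 2)))
    coefs = np.array(coefs)
    return {
        "coef_mean": coefs.mean(axis=0),
        "coef_std": coefs.std(axis=0),   # large spread suggests unstable parameters
        "in_sample_rmse": float(np.mean(in_sample)),
        "out_of_bag_rmse": float(np.mean(out_of_bag)),
    }

# Synthetic data: three predictors, the third has no real effect.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 3))
y = 2.0 + X @ np.array([1.5, -0.7, 0.0]) + data_rng.normal(0.0, 0.5, size=200)

report = bootstrap_validation(X, y, n_samples=30)
for key, value in report.items():
    print(key, np.round(value, 3))
```

Two things in such a report hint at overfitting: coefficients that swing widely from sample to sample, and an out-of-bag error that is much larger than the in-sample error.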

The ‘Overfitting Diagnostic Check’ is just one of the many checkpoints through which we build fair, robust and, more importantly, business-ready “longer shelf-life” models for our clients. What motivates our analytic scientists to perform these checks, which others may call overheads? Simple: skepticism, curiosity and a mindset that forces them to be critical of their own work to achieve continued excellence.

Watch this space for more Affine Transformations…

~VK

Data Analytics: The Next Big Thing

Gone are the days when ‘Management Consulting’ was considered to be the epitome of knowledge and skills, the days when your experience and gut feeling were enough to solve complex business problems and companies were ready to pay hefty bills for your service (the 10 slide presentation at the end of the project). Today we know consultants as people who save their clients almost enough to pay their fee. I always had this idea in my mind that consulting was about answering business questions through analysis. It was supposed to be Excel sheets and models, sifting through data to discover profit and loss, and helping clients make decisions that would add the most value for themselves, and by extension, society.

Today we can see that ‘Business Analytics’ is the emerging and fastest-growing technology that every organization is embracing. As per Gartner’s prediction, by 2014, 30 percent of analytic applications will use proactive, predictive and forecasting capabilities. The software market for business intelligence, analytics and corporate performance management grew by 13.4% in 2010 to $10.5 billion and is expected to continue to grow. It can be startling to hear how much data has been created and collected in the last few years. For instance, 15 of the top 17 industry sectors in the United States have more data stored per company than the Library of Congress (which has collected 235 terabytes of data). The growth of external, social and unstructured data has been even more rapid. In fact, 90 percent of the entire world’s data was created in the last 2 years. All the data is there, and it is useful if we can harness it in the right way to increase our collective knowledge. Big data has become the new paradigm of knowledge assets.

Now more and more organizations are embedding Analytics and just like Finance, HR, Sales, Marketing etc. organizations are creating departments for Analytics! This presents an exciting prospect for all those who are at the vanguard of this revolution, to help create functional units in companies and create systems and processes that would cut across departments and enable companies to utilize the full power of data.

The best part of Analytics as a profession is that, depending on your ability and interests, you can choose to pursue a more technical career path or a more consulting-based career path (converting data into insights and strategies), and each of them is equally rewarding. Analytics is a sublime intersection of Science and Art, and with each passing day, as data processing continues to get faster and faster, it offers exciting challenges and ensures that the learning curve remains steeper than ever. Just when you think that you know all about Analytics, there is always something new that comes up to take your complacency away. Just when you thought that SQL/Oracle was the way to go in data storage, newer, faster and mightier databases like Netezza and Teradata emerged. ‘Hadoop’ is not a character out of Scooby Doo, but a fast emerging technique in the world of Big Data and distributed computing. Knowing logistic regression and CHAID is no longer enough, as you have ‘Random Forests’ to deal with. “Nearest Neighbor” is not the person in the cubicle next to you, but a fast emerging technique in the world of Machine Learning.

All in all, the future of business analytics is bright, and there is a lot to explore and learn in this wide field full of new technologies and exciting roadblocks to keep your mind running at full throttle.

Good times at Affine!!

Well, your wait is over. I know you guys have been eagerly waiting for my next post, and here it is :P. For the last two blogs I have been bragging about my life and all the other stuff. Enough has been said about the transition from college to corporate life and what to expect from a startup; today’s post will tell you how amazing working at a startup is and how joyous the experience at Affine is.

What do you dream your workplace to be like? Let me tell you what I want. I need a workplace that I can become a part of, a workplace that lets me be myself and lets me share a connection with my work and my co-workers, a workplace where my co-workers can become my friends, a workplace that doesn’t confine me or limit me, a workplace where I can use the fun I have as energy to spend late nights at the office. In my opinion, this place is a startup. For me, knowing that I’m working hard with and for the people around me makes working long hours easy. All of us here at Affine are working hard to stir things up, and it feels great even when sitting behind that desk. During my short stay at Affine I have had a chance to work with a few people and to know them at a deeper level, and let me tell you that the kind of maturity shown by many of them in demanding situations showcases the talent pool of the firm. It also assures me that this startup is going to go a long way. So catch hold of the bus and enjoy the ride.

Coming back to the Boss’s surprise birthday bash from last week, Anjana (HR), Alex (Consultant with an awesome bike :P) and we freshers a.k.a fachhas played a crucial role in arranging all the stuff ranging from cake, balloons (Yes, we did use bandage to tie the balloons!! :P), momos (Boss’s favourite), a huge birthday card (with special messages from everyone) and to provide the icing on the cake, a birthday gift like none other, a McLaren F1 car model similar to the one driven by Lewis Hamilton (again Boss’s favourite driver). We even had the Chipmunks playing Happy Birthday for him. The joy on Vineet’s face after receiving the gift was similar to the one when you get your 1st cricket bat or your 1st cycle :). Our efforts to please our boss didn’t go in vain as we were rewarded with a birthday treat the next day.

\m/ Got my 1st car 🙂

With projects coming at a fair pace and everyone from the business analyst to the delivery manager putting full effort into producing the best possible results, it seems that the future is bright. For me personally, at 11 pm, sitting on the balcony of my house, gazing at the stars and thinking about the last two and a half months at Affine and in Bangalore, I can say from the bottom of my heart that it was worth giving a shot, it is worth living and experiencing, and I have learnt a lot in this period.

Kudos to my colleagues at Affine for making this a treasured experience :).

Startup Job: Are you cut out for 1?

What is a Startup?……

Ahaaa found the answer!! A startup is a human institution designed to deliver a new product or service under conditions of extreme uncertainty (so much for the literary geniuses :P). Many call it their “Baby”, and to be really honest it’s a “Baby’s Day Out” scenario in a startup, full of bumpy roads and anxiety (even at 2 in the night). As someone rightly said – “Things don’t happen automatically, you have to make them happen. You have to endure and enjoy the pain. Successful entrepreneurs know it instinctively & that’s what makes them tick.”

You need three things to create a successful startup: start with good people (like Raja Sekhar :)), make something customers actually want (analytics as a field is roaring), and spend as efficiently as possible (our founders know exactly how to do this). Most startups fail because they fail at one of these. A startup that does all three will probably succeed. Affine has all these ingredients and is on the right track to succeed. Limited hierarchy and a flat organization mean that the founders and decision makers are close to the troops at all times, and when you see your boss sitting beside you doing work, it gives you a sense of belonging and pride (it also means no Facebook in the office :P).

In the pre-bust era of Dot Com Dominance, it used to be “cool” to say that you worked in a startup. Though much of that romance has worn off, there are still a lot of advantages to working in a startup. Let me point out a few as per my experience:

• An employee has to do a lot of multitasking. He is expected to go out of his comfort zone and take on tasks in which he has no significant experience. The employee gets exposure to various business functions and enhances his core skills with a lot of transferrable skills.

• Think carefully before joining a startup; startups are not for the faint-hearted. Your offer letter will of course say 40 hrs/week, but you will be expected to be around day and night. This works to your advantage, as you get to learn at a greater pace.

• Startups are best for folks who are comfortable with a degree of flexibility and autonomy. If you are a person who thrives on challenges in life, a startup environment offers it in truckloads.

• It’s important to remember that you are an important asset to the organization. Your boss/HR manager cares about your well-being and overall happiness. If there’s something wrong, they want to help. Don’t be afraid to book a coffee meeting to share your concerns.

Working for a startup is more than just a job; it is a way of life. It will transform everything, from the way you think to the way you approach problems and the way you execute stuff.