
Explaining India's miserable Test cricket performance in 2011/2012

23 days ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

Today when I reminded a friend not to lose hope in Indian cricket (after the recent whitewashes in England and Australia), another friend commented,
Prasoon ji, this is Indian cricket... here every victory becomes old news the very next day... You have to perform at your best... After all, they are getting unexpected money. They should deliver the goods as per citizens' expectations...
My reply to my friend was, 
Agreed, they need to perform... And here's my explanation of our recent performance... After Aussie dominance ended a few years ago with Warne/Gilchrist/McGrath/others retiring, we've hit a period where most top Test teams are equal (Eng/Aus/SA/SL/Ind)... As a result, the #1 Test team in the last 2-3 years has been the team that played the most games at HOME. And I expect this trend to continue. Here's some proof: India became #1 by playing all the strong teams at home in the 2008-10 period... England did the same in 2010-11 and became #1... now look at what Pak is doing to Eng outside Eng (Eng down 2-0)... Aussies won at home 4-0 against us in 2011/12 but India beat Aus 2-0 not too long ago (at HOME)...



Now, don't misunderstand me... I'm not saying India's terrible performance is okay. It is NOT. Indian fans deserve to be pissed. Recent performance is terrible. Several innings defeats. No one firing. Giving up so easily. Really bad. But I'm highlighting the fact that the coming years will bring joy and heartbreak for fans depending on where their teams are playing (Home games will bring JOY and Away games will BREAK THEIR HEARTS). Cricket fans, let us prepare ourselves for this. Our team's Test cricket performance will more or less depend on where they're playing (check out ICC's Future Tours Program). Test cricket rankings won't mean much! Wasim Akram feels the same way.


Here are some charts that explain my point of view... 

1) See how India played more Tests at home between 2008 and 2010. In 2011/12, they've played mostly AWAY games. Similarly, in 2011 England played mostly at HOME





2) Notice how we see more RED (losses) than GREEN (wins) in the charts below of India's and England's performance in AWAY games





3) Notice how we see more GREEN (wins) than RED (losses) in the charts below of India's and England's performance in HOME games





Leadership explained from the greatest cricketer

27 days ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

Rahul Dravid said this at the Bradman Oration. It stuck with me...
One of the things, Bradman said has stayed in my mind. That the finest of athletes had, along with skill, a few more essential qualities: to conduct their life with dignity, with integrity, with courage and modesty. All this he believed, were totally compatible with pride, ambition, determination and competitiveness. 


Indian cricket - highs and lows

28 days ago | Prasoon Sharma: Enterprise Software Does not Have to Suck


Many of my friends and family members ask me how I continue to follow Indian cricket when we are doing so badly (a whitewash in Test cricket in England in late 2011 and now in Australia (well, almost there))...


My answer is that there are ups and downs in sports and my love for the team and the sport isn't based on convenience, it isn't based on my team doing well all the time. Sometimes we do well (hey, we just won the World Cup 9 months ago and were #1 Test team just a few months ago) and sometimes we don't (pathetic test cricket performance recently).






I am a fan of Indian cricket. I don't give up. I won't give up my faith, my love for the team, my love for the sport, my RELIGION.

I just hope that we get through this phase without much pain and get back up and running... Go India!












Behavioral research

5 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck


I'm at O'Reilly's Strata conference in NYC, where an NYU professor shared his research on consumer purchasing behavior. He asserts that 'What people "say" is different from what they actually "do"', so he recommends conducting behavioral studies by measuring what people "do", not what they "say".

He used camera buying from Amazon as an example: Zoom is talked about a lot as a feature but has a lower impact on purchasing behavior, whereas Battery Life and Megapixels are talked about less but have a higher impact.

Very interesting. 

We often conduct such analysis using surveys in enterprises. Doing what he suggests would be more accurate but would cost more... it is much more costly to follow what people "do".


Looking to hire data scientists

6 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

We are a global management consulting firm and are looking for data scientists for our team in New York/Washington DC and Gurgaon/Chennai (India). There are full-time and internship (New York) opportunities. There are multiple positions, including developing complex models in healthcare, pricing and optimization. If interested, drop me an email at prasoonsharma at gmail dot com.


Position specifications
- Public contributions (projects, plugins, blogs, open source etc.) on MATLAB, SAS, R, SPSS, Stata a plus
- Significant experience in economic and/or scientific programming. Ideally, experience in popular statistical software like MATLAB, SAS, R, SPSS, Stata
- Ability to use, analyze and visualize large data sets
- Demonstrated ability for conceptual analytics including translating design considerations into programmed code
- B.A. required, advanced degree preferred (M.A. or Ph.D. in Computer Science, Applied Science, Engineering, Economics, Statistics or similar)


Position responsibilities
Running and developing complex healthcare models; e.g., behavioral simulation. Specific responsibilities include:
- Maintaining and developing model and other analytic assets
- Driving individual and team problem solving regarding model architecture and continued development of model and related analytic assets (e.g., ability to independently develop hypotheses, approaches and solutions to development objectives)
- Conducting research and analyzing existing data sources to derive solutions and analyses for model development including complex, multi-variate analytical analyses
- Overseeing and writing communication materials supporting analytic materials, including communication decks for clients and for client teams
- Rigorously reviewing and testing of all programmed code to ensure accuracy/veracity of toolkit development
- Working with teams to understand, guide and refine client teams' requests
- Developing and maintaining work plans etc.


A cricketer is born

7 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck



Rahul Dravid - a legend in Test Cricket

7 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

Rahul Dravid is a fantastic cricketer, and a role model for the younger generation - focused, hardworking and humble.

Dravid recently became the 2nd highest scorer in Test Cricket (Sachin Tendulkar is the leading scorer). His contribution to Indian cricket is enormous and his nickname, "The Wall", is a testimony to the concentration, skill and will that have given India tremendous success in Test cricket.

Dravid isn't worshiped in India like Tendulkar. He has played his cricket under Tendulkar's shadow. This isn't because his skill, will or contribution to Indian success is second to anyone's. It's just that he plays in the same era as another great cricketer - Tendulkar.

Indian test cricket will never be the same when Dravid retires.





Learning SAS

8 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck


I want to learn the heavyweight of statistical software - SAS. It seems like the default choice for high-end statistics and I want to understand why.

I'm working in the healthcare practice in our firm and want to analyze claims and credit data (Terabytes, 50M+ records). Traditional ways (SQL) are limiting, and desktop statistical software like R and Stata isn't suitable for such large data analysis. Other contenders (Matlab) don't seem to be in the same league.

So, it's time to take a deep dive into SAS.



I'm looking for some advice to create a learning plan...

Good books


Good tutorials

  • I like video tutorials with examples e.g. Statistics202.
  • I also prefer tutorials written from a programmer's perspective
  • Anything for SAS out there?


Good blogs


Experts

  • Will start exploring this. If you know of someone, please let me know. 

Good training courses in New York area

  • Preferably not the ones run by the company themselves. I'm looking for SAS experts who can run hands-on classes

SAS interest groups in New York area

  • I learn well in a study group. Any meetups?


Sachin Tendulkar's longevity

8 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

There have been over 3300 cricketers who've played Test and One Day cricket. The youngest player was 14-year-old Hasan Raza from Pakistan, who played 5 ODIs and 2 Test matches at that age. The oldest player was 52-year-old W Rhodes from England, who played 8 Test matches at that age. But they are not the ones who've played the game for the longest time. Sachin Tendulkar has.

Sachin started international cricket early and is still playing the game as a 38 year old (not many 38 year olds find a spot in international cricket these days). He has played a lot of Test and One Day cricket matches all these years, and has performed well consistently. As a result, he has broken all batting records in cricket. It is unlikely that any other cricketer will break Tendulkar's batting records anytime soon. 

Sachin's international cricket career spans 21 years and counting. What an athlete!

1) Tendulkar entered the game at the youngest age (debut at 16). Very few cricketers start their international career at that age. He is the 3rd youngest to play the game.


2) Tendulkar is now 38 years old and still going strong. I hope he plays the game for at least another couple of years. The only other cricketers who come close to his tenure are Javed Miandad from Pakistan (tenure 21 years, debut at 17 and retired at 38) and Sanath Jayasuriya (tenure 20 years, debut at 20 and retired at 40).



3) Tendulkar is now 38 years old and like his peers, he has reduced the number of games he plays. But unlike his peers he's still going strong. His peers seem to be headed towards retirement (see how Ponting's performance is dropping in Test and One day cricket - Runs scored and Scoring rate). 





Tendulkar is exceptional. He beats the rules (the normal distribution, for statisticians) and sits at the edge of all distributions - debut age, tenure and performance. When he decides to retire, he will be on the edge of the retirement age distribution curve as well.


Premature optimization is the root of all evil

9 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck


Premature optimization is the root of all evil
Donald Knuth


Cricket fever again

9 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck


England take on Sri Lanka in a test series starting today. 

After watching Cricket world cup 2011 and witnessing India play in quarter final, semi final and the finale in stadiums all over India, I had little appetite for any of the IPL 2011 matches. 

But now, I'm excited again. This test series will be a thrilling contest considering the two teams are more or less equal:
  • England are at #3 and Sri Lanka are at #4 in the ICC Test rankings
  • England play well against Sri Lanka at home, and Sri Lanka have an edge in their home conditions. And the overall record is well balanced





Sri Lanka has won 2 of the last 3 test series against England with 1 draw. Clearly, they have a better track record. However, it is difficult to predict the winner as England has the home advantage and the momentum (Ashes tour and other tests in last 18 months). Sri Lanka has a new captain in Dilshan who will be under pressure. England will be under pressure too with the recent emergence of English cricket and the desire to prove their test cricket status. 



Let the games begin!


10 reasons why you should learn R

9 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck


10. Can't crack that hard Sudoku problem?? Use R!


9. Want to pick a skill that will give you an early adopter advantage?? Learn R! It is the leading open source statistical and data analysis programming language, and is heating up! 


8. Need to run statistical calculations in your software application?? Deploy R! It integrates with many programming languages like Java, Ruby, C++, Python


7. Looking for reusable libraries to solve a complex problem?? Get R! It has 2000+ free libraries to use in areas of finance, natural language processing, cluster analysis, optimization, prediction, high performance computing etc. 


6. No Windows, No Doors - R runs on all the platforms. Just name it and you got it!! Windows PC, Mac, Linux to name a few


5. Did you know how much fun stats can be?? Try R!!


4. Are you up to date with the current trends?? Leading firms like NY Times, Google, Facebook, Bank of America, Pfizer, Merck are all using R, where are you??


3. Need to run your own analysis?? Need to solve an optimization problem?? Struggling with Excel or SQL in your model??..... the answer is just a few statements away - Try R!!


2. Want to create a compelling chart?? Try R! 


1. Want the coolest job in 2014?? Learn Statistics. It is the future. Data Scientists will be the sexy job in 2018


Vehicle Routing Problem

9 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

This is a follow-up to a previous question on VRP. I investigated R libraries and several other options to solve VRP and decided to build a custom desktop application using open source libraries from COIN-OR. Screenshots attached below.







Leave a comment if you're interested. I will contact you directly.


Team: Prasoon, Khaled, James


Introducing R in the Enterprise

10 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck



We've introduced R in the organization!

It is running alongside the heavyweights of statistical analysis like SAS, SPSS and Matlab. Here's what we did and how we did it...




HOW DID IT START?
I started learning R last year and loved its simplicity and power. After using it primarily for personal projects, I came across a business problem for which R looked like a good fit.

BUSINESS PROBLEM
The business need was to build a web-based tool for marketing budget optimization  - Marketing RoI (Return on Investments) i.e. how should a company that has multiple advertisement channels allocate its marketing budget across multiple channels to maximize profit or customer loyalty or customer life time value (LTV).

1) Input: The input to the analysis is the company's historical marketing budget allocation, profit, customer loyalty and LTV. 

2) Analysis: This analysis is done in 2 steps.
- Step 1) Our experts create a formula that relates the given inputs to RoI, LTV etc. It involves econometric techniques.
- Step 2) Optimization of the formula when the user conducts what-if analysis by varying the total budget and/or the spend across individual channels to see its effect on RoI and LTV. The desktop optimization model is written in Excel using a commercial Excel plugin.

3) Output: Optimized spend across advertising channels and ability to evaluate multiple scenarios to determine optimum marketing mix

The initial version of the tool runs as an Excel model using a commercial Excel plugin. The business objective was to transform this Excel-based single-user application into multi-user web-based application.



TECHNICAL SOLUTION



A) Web application: The web forms needed to allow users to input data and run scenarios were simple. We develop web applications using Ruby on Rails on LAMP internally. Ruby on Rails gives us an agile environment to develop software by taking care of routine web application tasks like database connectivity. 

B) Optimization: Since the Excel model uses a commercial plugin for step 2, the stakeholders started with the hypothesis of using the same commercial plugin's server version for optimization in the web application too.

For this we had to prove a couple of things:
1) Optimization of formula from step 1
2) Integration with web application

Option 1: Commercial optimization engine
We did a quick spike to test optimization with the server version of the commercial plugin and its integration with our Ruby on Rails web application, and it was successful. We had to use JRuby to integrate Ruby with the plugin's server edition, as it provides only Java and .NET APIs.

Option 2: R (Open source)
In parallel, we checked if R can be used. R is a leading open source statistical environment.
- To solve the optimization problem in R we found a lot of R optimization packages and started testing packages like BB as the formula (from step 1) was non-linear, and had constraints and conditions. We tested BB's SPG function and also tried other generic algorithms. We got good optimization results from R (similar or better compared to commercial optimization engine).
- Now we had to check how to integrate R with our web application written in Ruby. We found a number of options like integrating R with Apache (rApache) or integrating R directly with Ruby (rsruby). We decided to use rsruby.

We ran a number of proof of concepts with R and shared results with stakeholders. The results were positive in terms of performance as well as the optimized results... So we got better results and that too for free! 
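
To give a feel for the optimization in step 2, here is a minimal sketch in Python (it is not the R/BB code we ran). It assumes a made-up response formula with diminishing returns, profit = sum of weight * log(1 + spend) over channels, and uses plain projected gradient ascent, a simpler relative of the spectral projected gradient method in BB. The channel weights, the budget and the function names are all hypothetical.

```python
def project_to_budget(values, budget):
    """Closest non-negative allocation to `values` that sums to `budget`."""
    ordered = sorted(values, reverse=True)
    running_total = 0.0
    shift = 0.0
    for count, value in enumerate(ordered, start=1):
        running_total += value
        candidate = (running_total - budget) / count
        if value - candidate > 0:
            shift = candidate  # keep the shift from the largest valid prefix
    return [max(v - shift, 0.0) for v in values]


def allocate_budget(weights, budget, steps=5000):
    """Maximize sum(w * log(1 + x)) subject to x >= 0 and sum(x) == budget."""
    spend = [budget / len(weights)] * len(weights)  # start from an even split
    step_size = 1.0 / max(weights)  # safe step: curvature never exceeds max weight
    for _ in range(steps):
        gradient = [w / (1.0 + x) for w, x in zip(weights, spend)]
        moved = [x + step_size * g for x, g in zip(spend, gradient)]
        spend = project_to_budget(moved, budget)  # pull back onto the budget
    return spend


# Hypothetical example: three channels, total budget of 6 (say, in millions)
allocation = allocate_budget([4.0, 3.0, 2.0], 6.0)
print([round(x, 3) for x in allocation])  # approximately [3.0, 2.0, 1.0]
```

For this toy formula the answer can be checked by hand: at the optimum every channel has the same marginal return, weight / (1 + spend) = 1, which gives spends of 3, 2 and 1. A real formula from step 1 would also have extra constraints and conditions that this sketch ignores.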



LESSONS LEARNED
Technical
  1. Be careful when running R in a shared environment, where it can use all your CPU and memory if a job runs for long
  2. Don't forget to write unit tests using RUnit for your R code
  3. Capture exceptions from R and deal with them properly (show an appropriate message to users)
  4. rsruby installation documentation is good but needs a few tries depending on your Linux distribution
  5. rsruby does not run on Windows (wasn't a problem for us as we run our web applications on LAMP)
Process
  1. User acceptance testing: If you are transforming an Excel-based model into web-version, it is critical to have a fully working example of the Excel model to replicate it in R/other statistical packages
  2. Overcoming the challenges of using new open source software in enterprise: Like most enterprise IT shops, we are used to commercial software as well and the idea of using open source software to do serious work is limited to the most popular open source frameworks like Drupal, Ruby on Rails, Linux. We positioned R as an add-on to our LAMP environment and got a separate virtual server dedicated to it as it is memory hungry.
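
Technical lessons 1 and 3 can be sketched in a few lines. This is a Python illustration, not our Ruby/rsruby code: it assumes the analysis runs as a separate process, stops it after a time limit, and turns failures into a message fit for users. The function name, the messages and the script name in the comment are hypothetical.

```python
import subprocess
import sys


def run_job(command, timeout_seconds):
    """Run an external analysis command; return (ok, message_for_user)."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this
        return False, "The analysis took too long and was stopped."
    except OSError:
        return False, "The analysis engine could not be started."
    if result.returncode != 0:
        # Keep the raw error for the logs; show users something readable
        print("job failed:", result.stderr.strip(), file=sys.stderr)
        return False, "The analysis failed. Please check your inputs."
    return True, result.stdout.strip()


# Stand-in for something like ["Rscript", "optimize.R"] (hypothetical script name)
print(run_job([sys.executable, "-c", "print(6 * 7)"], timeout_seconds=10))
```

A time limit only guards against runaway CPU use. Memory still needs its own safeguard, which is one reason we gave R a dedicated virtual server.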


R exercise

10 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

Goal: Plot interesting charts from this year's Fortune Global 2000 list.

Download data in CSV format



Big data problems

10 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

I have big data problems.

I need to analyze 100s of millions of rows of data and tried hard for 2 weeks to see if I can use R for this. My assessment so far from the experiments...

1) R is best for data that fits a computer's RAM (so get more RAM if you can).

2) R can be used for datasets that don't fit into RAM using the bigmemory and ff packages. However, this technique works well only for datasets smaller than 15 GB. This is in line with the excellent analysis done by Ryan. Another good tutorial for bigmemory.

3) If we need to analyze datasets larger than 15 GB, then SAS, MapReduce and RDBMS :( seem like the only options, as they store data on the file system and access it as needed.

Since MapReduce implementations are clumsy and not business friendly yet, I wonder if it's time to explore commercial analytics tools like SAS for big data analytics.
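
The "store data on the file system and access it as needed" idea can be shown with a small sketch. This one is in Python rather than R or SAS and assumes a CSV file with hypothetical columns `region` and `amount`. It reads one row at a time and keeps only running totals in memory, so the file can be far larger than RAM.

```python
import csv
import os
import tempfile
from collections import defaultdict


def streaming_group_mean(path, group_col, value_col):
    """Average value_col per group_col, reading one row at a time."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):  # never holds the whole file in memory
            key = row[group_col]
            totals[key] += float(row[value_col])
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def write_sample_file():
    """Write a tiny made-up CSV so the sketch can be run end to end."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
    with handle:
        writer = csv.writer(handle)
        writer.writerow(["region", "amount"])
        writer.writerows([["NY", 10], ["NY", 30], ["DC", 5]])
    return handle.name


sample_path = write_sample_file()
print(streaming_group_mean(sample_path, "region", "amount"))
os.remove(sample_path)
```

This only works for analyses that can be computed in one pass with small running state (sums, counts, means). Anything that needs the whole dataset at once, such as most model fitting, needs a different approach.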

Can Stata, Matlab or RevolutionR analyse datasets in the range of 50 - 100GB effectively?


References
http://www.bytemining.com/2010/07/taking-r-to-the-limit-part-i-parallelization-in-r/
http://www.austinacl.blogspot.com (image)


India-Pakistan World Cup Cricket 2011

11 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck





India and Pakistan have an intense rivalry. They are siblings who like each other deep down but fight often.

Their love isn't apparent. You see it in their appreciation of each other's culture, entertainers and sportsmen, and even politicians e.g. Indian movies and movie stars are popular in Pakistan and Benazir was popular in India. You also see their love when Indian and Pakistani people meet in a 3rd country, where their media and politicians don't brainwash them. There they get to know each other really well. I'm from India and one of my best friends is from Pakistan.

Their fights on the other hand are much more evident. Their fights have translated into wars (1965, 1971, 1999), political battles (UN, US/USSR, local politics), movies (Border) and sport rivalries (Cricket, Hockey). We compete in sporting events with fanaticism and millions follow it closely.

One such battle took place in Cricket World Cup 2011. 

It started with Pakistan's inspiring performance in the preliminary rounds, followed by an easy quarter-final win over West Indies (the weakest team in the quarter finals).

India was expected to make it to the final eight and did. India had to play Australia (4-time world champions) in the quarter finals. Australian cricket isn't as strong anymore, with many superstars retiring (Gilchrist, Hayden, Warne and McGrath). In a thrilling contest, India beat Australia in Motera, Ahmedabad. Indian bowlers restricted the Aussies to 260 and the batsmen scored 261.

This set up a thrilling contest between India and Pakistan for a place in the final.


I was in India during Cricket World Cup 2011 and got to witness this battle in Mohali, Chandigarh. This is my travelog for this particular contest.


60 HOURS OF INDIA-PAKISTAN WORLD CUP CRICKET THRILL

Mar 29
- Baroda to Mumbai by air (12:30pm - 2pm)
- Pick brother from office at 6pm and head to the airport
- Dinner at Mumbai airport at 7pm while watching Sri Lanka vs. New Zealand semi-final match
- Mumbai to Delhi by air (9:30pm - 11:30pm)

Note: There were no direct flights from Baroda to Chandigarh, so I had to go to Mumbai. And then to Delhi from Mumbai as all direct flights from Mumbai to Chandigarh were sold out



Mar 30
- Delhi to Panipat by car (12 midnite to 2:30am)
- Dinner at dhaba (street restaurant) in Murthal - yummy food
- Slept for 2 hrs. Woke up at 5:30am. Dressed in Indian team T-shirt and shorts. Grabbed my lucky Indian flag that I bought in Motera, Ahmedabad during India-Australia game
- Panipat to Chandigarh by car (6:30am to 10am)
- Collected tickets at 10:30am
- Drove to Mohali stadium by car
- Found entrance 1C (11am) after driving around the stadium twice (need better instructions on the road for stadium gates)
- Easy checkin into stadium (20 mins in general first-level security and 2 mins in 1C gate security)
- Sat in block A Pavilion Terrace on south side (11:30am). Good luck with seats: no direct sunlight, beautiful weather
- Watched preparations (wicket and field care, opening ceremony preparations, team preparations)
- Toss@2pm
- India started batting at 2:30pm
- We were snacking, hydrating throughout the innings
- Our section was filled with top-brass from Chandigarh, Delhi, Mumbai. Beautiful girls from Delhi, Mumbai, and Chandigarh
- Celebrity sighting: Arbaaz Khan was sitting in our section. Aamir Khan, Preity Zinta, Indian PM Manmohan Singh and Indian businessmen were sitting in stands above us. Aamir acknowledged the crowd, enjoyed the match throughout.


- Requested others with camera phones to click our pics and email them to us (eagerly awaiting those pics). I guess it's time for me to upgrade my Blackberry and get one with a good camera.
- Duel with Pakistani fans sitting above us (noisy, beautiful girl, Afridi's brother, 4 adults, 4 - 6 kids). "Jeetega bhai jeetega India jeetega" would always overshadow "Jeetega bhai jeetega Pakistan jeetega".
- Good gesture from crowd: no booing, no Pakistan hai-hai, friendly, sporting, knowledgeable crowd
- Indian PM, Manmohan Singh, and Pak PM, Gilani, met with the teams during the opening. Everyone clapped. No one shouted anything bad! Indians love Sachin a lot and everything he does, so the focus was on him instead of the 2 heads of state. Someone even shouted "Sachin for Prime Minister"
- India scored less than anticipated batting first (260 instead of 300+). Started well with Sehwag's blistering (though short) knock but others could not capitalize. Even Tendulkar's innings wasn't impeccable (4 dropped catches)

Innings break

- Pakistanis start well but wickets keep falling afterwards to push them towards defeat
- Pakistani tail crumbles but not before giving the crowd a few anxious moments (Misbah's 6s)
- Exciting match, tense crowd. Anyone could've won till the 90th over of the match!
- Sporting behavior on-field as well. No fights, no verbal duels. Nehra waved away a close catch that had touched the ground
- Great gestures off-field: India-Pakistan PMs; Players joking, laughing, shaking hands; Pakistan fan holding the Indian flag after Pakistan lost
- Match presentations: Sachin man of the match. Stadium acoustics suck. No one could hear the captains during the toss, interviews in the presentation ceremony (it was bad in Motera, Ahmedabad too). Punjab Cricket Association please fix it
- Peaceful exit at 11:30pm, decent, helpful crowd. Found car in 10 mins (quite amazing given the number of people and cars)



Mar 31
- Chandigarh to Panipat by car (12 midnite)
- Dinner@dhaba (line hotel) 2am - 2:40am. Yummy tandoori parathe (hail Punjab and Haryana for such great places to eat all along the road)
- Reached Panipat at 3:30am and crashed for 7 hours
- Panipat to Delhi by car (11:30am to 1:45pm)
- Remembered and thanked god for bringing us back safely. Car driving on Delhi, Punjab, Haryana highways is no joke. It's like playing a video game - you play chicken all the time, you expect others to brake for you, lanes have no meaning. If you're tempted to drive yourself, I would advise against it. These roads and laws (or lack thereof) are best handled by drivers seasoned in these conditions.
- Watched highlights of the game at the airport (thx Hyundai for sponsoring big screens)
- Delhi to Mumbai by air (3pm to 5pm)
- Mumbai airport to home by car (5:30pm - 6:30pm)



Next up is watching the finale between India and Sri Lanka at Wankhede stadium in Mumbai.


A quick look at Cricket World Cup 2011

11 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck


Preliminary rounds are over. The top 8 teams have qualified for the quarter finals. No surprises there. No one expected Australia, England, India, New Zealand, Pakistan, South Africa, Sri Lanka or West Indies to miss the cut.

So what's next?

A lot of action. Upcoming games are must-watch as the minnows are gone and now top teams battle in knock-out rounds and results could be surprising. So find a good excuse, a comfortable couch and your best buddies to watch the teams fight it out. There’s a lot of cricket played these days. This is the cup that matters.



Here are the players and teams to watch...

BATSMEN
  • AB de Villiers: At his peak performance since ICC Batsman of the Year award in 2009.
  • Sehwag is fired-up and is dangerous when he spends more time in the middle than in the dressing room.
  • Tendulkar has already scored two excellent centuries and should score his 100th ton (across ODIs and Tests) during this World Cup.
  • Sangakkara is slow but steady. Not flamboyant but successful. Watch out for his contributions.
Aussie batsmen haven't scored big yet and their form remains a worry. Top Sri Lankan, Indian and South African batsmen have all found some form and are likely to score big in upcoming matches.



Indian and West Indian tails have been frail and collapsed often. Get ready for a world cup that might be decided by how the tail-enders use their bats.



BOWLERS: Spinners lead the bowling chart, with Afridi leading the pack. He’s the new Kumble. Aussies are missing their lethal bowling attack but Brett Lee is playing his last world cup like a champion.




ALL ROUNDERS: All rounder performance has been key to many world cups.
  • Viv Richards in 1979 for West Indies' victory
  • Mohinder Amarnath, Kapil Dev and Madan Lal in 1983 for India's victory
  • Steve Waugh in 1987 for Australia's victory
  • Imran Khan in 1992 for Pakistan's victory
  • Jayasuriya and Aravinda de Silva in 1996 for Sri Lanka's victory
World Cups in 1999, 2003 and 2007 were decided by Australia's strong batting and bowling performance. Their batting (Ponting, Gilchrist, Waugh brothers, Bevan, Hayden) and bowling (McGrath, Warne, Lee) were superior to everyone else's.

In this World Cup, Yuvraj decided to prove those wrong who claimed that India is going into the World Cup without a genuine all rounder. His all round performance has been key in a couple of victories already. Kallis, the best all rounder in current ODI era, hasn’t wowed yet. Australia is known for great all rounders but notice the lack of all round performance from Aussies in this world cup (no yellow in chart below).





TEAMS

It is difficult to predict the winner as the top 4 teams (Australia, India, South Africa, Sri Lanka) are more or less equal, as seen in the preliminary rounds where no team was invincible and no team dominated completely. So get ready to cheer a surprise winner. It's not who you think it is!

This world cup will be decided by all rounders and tail-end batsmen (to bat 50 overs). Batting and bowling strengths of top teams are more or less equal. Some have stars like Dale Steyn or Brett Lee while others have good pairs like Zaheer Khan and Harbhajan Singh.


Why learn R?

12 months ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

I'm introducing R to a few colleagues this week and want to share why learning software like R is important... Here are a few articles that explain it well... Other reasons?

Importance of data science
- A couple of years ago, Google's Chief Economist Hal Varian said that the sexy job in the next ten years will be statistician. Read the full article (requires registration)
The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that's going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.
I think statisticians are part of it, but it's just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively. But I do think those skills - of being able to access, understand, and communicate the insights you get from data analysis - are going to be extremely important. Managers need to be able to access and understand the data themselves.
- Rise of data scientists

- Becoming a data scientist

- Essential skills for a data scientist

Where does R fit?
R provides an environment with all the tools needed for data science (see the data science process below from Benjamin Fry's thesis).




- R is ideal for small data analysis i.e. data that fits in a computer's RAM e.g. data < 10GB. SQL and search techniques seem good for larger data sets that can fit in one machine, while techniques like Hadoop are good for BIG data sets that cannot fit in one machine.

- NY Times article on R: "R you ready for R?"

- NY Times article on R

- R is becoming popular


Stata or R - How to create dynamic variables in R?

about 1 year ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

As we dig deeper into the Stata or R debate, a few questions have come up.


Question 1: One of the things Stata does well is the way it constructs new variables (see example below). How can we do this in R?



We can rewrite it as-is using for loops in R, which is slow and not elegant. What's the elegant way to write this in R? I haven't used plyr yet... Time to learn it?

Link to question on StackOverflow


Stata or R

about 1 year ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

Recently I came across a complex model written in Access with complex SQL queries all over the place. The engineer who was maintaining it and I did some analysis and agreed that the model was using SQL in an unnatural way (things SQL isn't good at) - complex logic, formatting etc. 

We agreed to use SQL and a more powerful programming language to rebuild the model. The engineer is familiar with Stata, so he quickly wrote Stata code. When I looked at the Stata code, it looked fairly easy to reproduce in R. I've posted some R commands for the Stata commands I found in that code.

What are the advantages of using Stata? Why shouldn't I use R for this?



EPL Fantasy Football: Best overall, home and away teams

about 1 year ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

I've refined the R code to pick the best fantasy soccer team by using more granular player performance data (available publicly). Here are the best overall, home and away teams. 





The constraints used are: 

1) Number of goalkeepers = 1
2) Number of defenders = 4
3) Number of midfielders = 3
4) Number of strikers = 3
5) Total team cost = 50 GBP
6) Maximum number of players from a team = 2 (most fantasy soccer sites use this)
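
The constraints above can be read as a small 0-1 integer program. What follows is only a sketch of that reading, not the formulation from the R code: it assumes the objective is total fantasy points and that the 50 GBP figure is a cap rather than an exact spend. The notation is introduced here for illustration: x_i is 1 if player i is picked, p_i and c_i are that player's points and cost, G, D, M and S are the sets of goalkeepers, defenders, midfielders and strikers, and T_k is the set of players at club k.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i} p_i x_i \\
\text{s.t.}\quad & \sum_{i \in G} x_i = 1,\quad \sum_{i \in D} x_i = 4,\quad
                   \sum_{i \in M} x_i = 3,\quad \sum_{i \in S} x_i = 3, \\
& \sum_{i} c_i x_i \le 50, \\
& \sum_{i \in T_k} x_i \le 2 \quad \text{for every club } k, \\
& x_i \in \{0, 1\} \quad \text{for every player } i.
\end{aligned}
```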



Batting and Bowling performance in Ashes 2010 - 2011

about 1 year ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

English cricket is strong once again. And it is great to see that (after all they invented the gentleman's game).

In sharp contrast to previous tours of Australia, England outplayed Australia on Australian soil in the recently concluded Ashes 2010-11 series (see performance comparison of 2010-11 and 2006-07 series below).

England's wins in 2010-11 were convincing and the Aussies must be heartbroken. This series loss might be the final nail in Ricky Ponting's coffin. His leadership and batting woes continue... Ponting's performance in this series was similar to that of Collingwood, who decided to retire from Test cricket. Will Ponting do the same?

Ponting was the best batsman in 2006-07 with an average of 64, and this year Cook is the best batsman with an average of twice that! (see batting charts below). Hats off to Cook's performance. His elegance and patience are a blessing for English cricket.

In Chris Tremlett, England has found a fantastic bowler. England didn't miss a beat despite Stuart Broad's injury. Tremlett, Anderson and Bresnan more than filled that gap by bowling exceptionally well. England and South Africa clearly have the best fast bowling units now.

I'm looking forward to India's tour of England this summer.



Compare 2010-11 Ashes series with England's previous Ashes tour to Australia in 2006-07. In 2010-11, English cricketers (pink) dominated the top-right corners of batting/bowling performance charts, whereas in 2006-07 Australia (yellow) dominated the top-right section of these charts.









World Bank data plots - Take 2

about 1 year ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

A few months ago, I created World Bank plots and compiled the images on Flickr to share them online. Recently I came across Mark's post on creating animated images in R, and it inspired me to re-create my World Bank plots as animated images. With this technique I was able to group the plots in a category together as one animated image instead of multiple images. I love this solution for creating animated images of R plots: simple and elegant.


Awesome day for Cricket fans

about 1 year ago | Prasoon Sharma: Enterprise Software Does not Have to Suck


What a treat for Cricket lovers tonight!

Australia and England play the second Test match of the Ashes 2010 series today at 9:30pm EST, and later tonight (3:30am EST) South Africa takes on India in a battle for Test cricket supremacy.

All this means no sleep for me!


Here's how these teams have performed against each other so far...






Fantasy football (oops, soccer)

about 1 year ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

Recently a colleague asked if I could use R/statistics to form a dream soccer team from a pool of soccer players, given basic player information like name, club, cost, points.

The idea is to form a team with your preferred configuration of defenders, midfielders and strikers that maximizes your total team points without exceeding your budget.

I wrote some R code (linear optimization) to get the answer. Check it out. The code allows configurable constraints to let you create your own dream team:
1) Number of goalkeepers
2) Number of defenders
3) Number of midfielders
4) Number of strikers
5) Total team cost ($$$ you'll spend on this team)
6) Maximum number of players from a team

Here's a team I put together based on this code with the following configuration:

1) Number of goalkeepers = 1
2) Number of defenders = 4
3) Number of midfielders = 3
4) Number of strikers = 3
5) Total team cost = 50 GBP
6) Maximum number of players from a team = 4
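
For readers who want to see the shape of the problem without R, here is a minimal Python sketch of the same selection task. It is not the R code from this post: instead of linear optimization it swaps in a plain exhaustive search, which only works for a tiny pool. The player pool, the function name best_team and its parameters are all made up for illustration, and the budget is assumed to be a cap (total cost must not exceed it).

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy pool, invented for illustration only:
# (name, club, position, cost, points)
PLAYERS = [
    ("Keeper A", "Club1", "GK", 4.0, 120),
    ("Keeper B", "Club2", "GK", 3.5, 110),
    ("Defender A", "Club1", "DEF", 5.0, 130),
    ("Defender B", "Club2", "DEF", 4.5, 125),
    ("Defender C", "Club3", "DEF", 4.0, 100),
    ("Defender D", "Club4", "DEF", 3.5, 95),
    ("Defender E", "Club1", "DEF", 3.0, 90),
    ("Midfielder A", "Club2", "MID", 7.0, 180),
    ("Midfielder B", "Club3", "MID", 5.5, 150),
    ("Midfielder C", "Club4", "MID", 4.5, 120),
    ("Midfielder D", "Club1", "MID", 4.0, 100),
    ("Striker A", "Club3", "FWD", 8.0, 200),
    ("Striker B", "Club4", "FWD", 6.5, 170),
    ("Striker C", "Club2", "FWD", 5.0, 130),
    ("Striker D", "Club1", "FWD", 4.5, 110),
]


def best_team(players, formation, budget, max_per_club):
    """Return the highest-scoring team meeting every constraint, or None."""
    # All ways to fill each position with the required number of players.
    per_position = [
        list(combinations([p for p in players if p[2] == pos], count))
        for pos, count in formation.items()
    ]
    best, best_points = None, -1
    # Try every combination of position groups (feasible for a toy pool only).
    for groups in product(*per_position):
        team = [p for group in groups for p in group]
        if sum(p[3] for p in team) > budget:
            continue  # over budget
        club_counts = Counter(p[1] for p in team)
        if any(n > max_per_club for n in club_counts.values()):
            continue  # too many players from one club
        points = sum(p[4] for p in team)
        if points > best_points:
            best, best_points = team, points
    return best


if __name__ == "__main__":
    formation = {"GK": 1, "DEF": 4, "MID": 3, "FWD": 3}
    team = best_team(PLAYERS, formation, budget=50, max_per_club=4)
    for name, club, pos, cost, points in team:
        print(pos, name, club, cost, points)
```

Exhaustive search is used here only because it is easy to verify by eye. With a real player pool the number of combinations explodes, which is why a linear optimization approach like the one described above is the practical choice.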




DISCLAIMER: Soccer isn't my favorite sport (it's cricket, if you're curious ;) and I don't play fantasy football.



Which factors impact software professional compensation?

about 1 year ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

My notes... Anything else?

  • Geography (offshore vs. onshore)
  • Skill requirement for job: commodity (Java) vs. specialized (ERP, Agile/XP)
  • Role type (PM/BA/Dev/Arch/Tester)
  • Experience/proficiency (years/certifications)
  • Relevance to primary business line (bottom line) e.g. enterprise IT vs. companies where software is primary business - Apple/Microsoft/Oracle/SAP/etc.
  • Industry (e.g. Financial services pay more)
  • Breadth of skills (# technologies, what else do you bring to the table)
  • Working conditions (allowed to share learnings outside or not, Flexibility: Work from home, etc.)
  • Quality/Attractiveness of work type/brand


Learning R: Day 7

about 1 year ago | Prasoon Sharma: Enterprise Software Does not Have to Suck

a) Plot your first chart in R
- dot and line chart
- bar chart
- bubble chart