Speaking at DDD South West

I’m delighted to say I’ve been accepted to speak at DDD South West, where I’ll be delivering my talk on “Hadoop and Big Data for Microsoft Developers”. I hope I’ll see you there; if so, don’t forget to stop and say “hi!”.

Posted in C#, Community, Data Science, Development, Hadoop, HDInsight

The Day HDInsight Died (Again)

So here I am this morning, all ready to get my Data Science Fu on, I pop onto my local HDInsight installation, and it tells me:

Hadoop Not Running

Hmm, time to see if the services are running:

Hadoop Services Not Running

Nope, all stopped. Sigh, right, start them. When I try to do that, I get an error that tells me that there’s been a “log on failure”. A what now? Same log on worked yesterday. Hmm, worked yesterday but not today, I wonder if HDInsight installs a default user with the password set to expire?

Hadoop User Must Change Password Next Logon

Yes it does! I’m not sure if HDInsight installs this way, or if some policy applied on my work machine forces this, but unchecking “User must change password at next logon” and checking “Password never expires” should solve this problem.

Services Started

Services are up and running again…

Hadoop Is Working

And HDInsight is back!

So, if your local instance of HDInsight stops working, then check that the hadoop user password hasn’t expired.
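
If you’d rather not click through the user dialog every time, the same two checkbox changes can be sketched as a small script. This is a rough sketch, not what HDInsight itself does: it assumes the local account really is named “hadoop”, that it is run from an elevated prompt, and that the standard Windows `net user` and `wmic` tools are available (`wmic` is deprecated and may be missing on newer Windows versions). The function names are my own, and by default it only prints the commands instead of running them.

```python
import subprocess
import sys

# Assumption: the local account is literally named "hadoop", as in the post.
HADOOP_USER = "hadoop"


def password_fix_commands(user: str = HADOOP_USER) -> list[list[str]]:
    """Build the Windows commands that mirror the two checkbox changes."""
    return [
        # Clears "User must change password at next logon".
        ["net", "user", user, "/logonpasswordchg:no"],
        # Ticks "Password never expires".
        ["wmic", "useraccount", "where", f"name='{user}'",
         "set", "PasswordExpires=FALSE"],
    ]


def apply_fix(user: str = HADOOP_USER, dry_run: bool = True) -> list[str]:
    """Return the command lines; only execute them when dry_run is False."""
    lines = []
    for cmd in password_fix_commands(user):
        lines.append(" ".join(cmd))
        if not dry_run:
            # Needs an elevated prompt on Windows; raises if a command fails.
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    # Pass --apply to actually run the commands; otherwise just show them.
    for line in apply_fix(dry_run="--apply" not in sys.argv):
        print(line)
```

After running it with `--apply`, the Hadoop services still need to be started again from the services manager, as above.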

Posted in HDInsight

How Alex Salmond Answers The Hard Questions on Scottish Independence

As you’ll know, at least hopefully you will, should you vote yes on September 18th, in the Independence Referendum, you won’t be voting for independence per se; what you’ll be voting for is to give Alex Salmond the right to negotiate the terms under which Scotland will eventually become independent.

Alex has set out his position on each of the major areas under negotiation. This is the right and proper thing to do, and it is to be welcomed. However, what he hasn’t done is set out a “Plan B”, if you will, or tell us what his “Red Lines” would be.

What do I mean by that? Well let’s look at the need for a “Plan B”. First, let’s take the issue of Europe. An independent Scotland will have to negotiate entry into the EU. Alex’s position is that we’ll be accepted and the process will be quick. Recently the EU President, a man in a position to have some idea about what it takes to expand the EU membership, said that, at best, the negotiations would be long and complicated.

Given that, when questioned on what would happen if Scotland were refused entry, or even what would happen to our trade if EU membership took, say, 5 years to complete, Alex should be able to say, “Well if my negotiating position isn’t entirely successful, or if I lose completely, an Independent Scotland would…”. Instead, he waves his hands and claims all will be well, much like the video.

Having dealt with that, let’s look at what I mean by “Red Lines”. To do that, let’s look at the situation with the currency. First Alex claimed that the pound was a “millstone” around the necks of the Scottish people, and he couldn’t wait to rid us of it. Once focus groups told him that ditching the pound was a barrier to people voting yes, that idea was immediately scrapped and we were told we’d be keeping the pound. Notice, as with all these things, Alex never says “this is my negotiating position”; he just asserts that “we will…”, like it’s a foregone conclusion.

However, the other political parties have stated that you can’t have a currency union without a political union, and since the SNP are not interested in the latter, an independent Scotland can’t have the former. By the way, despite what the SNP say, this is not the other parties bullying us Scots; it’s a principle followed by the EU too. If you want to be part of the Eurozone, then you have to be part of the political union. Since we know Alex wanted to join the Eurozone not that long ago, we know he’s not against a political union per se, just one with the rest of the UK, it seems.

So, Salmond’s position is we’ll have a currency union, the remainder of the UK’s position is we won’t, so what’s Salmond’s alternative, should his negotiating position fail? There isn’t one; it’s more hand waving à la the video and more assertion of “There will be a currency union”. That brings me back to the idea of a “Red Line”. If we can’t be in a currency union with the UK, and we can’t join the Euro immediately, even supposing we wanted to, would that constitute a “Red Line”, a situation whereby Salmond would say, “to carry on under these circumstances would be so damaging to Scotland that we won’t continue with Independence”? I fear not. In fact, my fear is that there are no “Red Lines”; my fear is that Salmond wants independence at any cost, and since he won’t tell us what, if anything, his “Red Lines” are, I fear I’m right.

So, before I go, just remember this. If you are voting yes in September, you are betting that Salmond wins every argument, and the result of every negotiation is that he gets his own way. When was the last time any politician achieved that? Hell, when was the last day you won every argument you took part in? The sad truth, that no one seems to be waking up to, is that if you vote yes in September, you have no idea what you’ll be getting, so be careful what you wish for.

I’m voting no, only because “hell no!” isn’t an option.


The Day HDInsight Died

So there I was, casually working away on my HDInsight sessions for a conference next week, happily submitting my jars and watching my map reduce jobs running, like a happy little data scientist. I had just finished writing my umpteenth job and I wanted to submit it, so I clicked to open the HDInsight dashboard; but it failed to open, reporting that there was nothing on the end of localhost:8085. Hmm, odd. So I opened the job tracker: dead too, and so was the namenode.

I opened the services manager under Windows and saw that, although all the Hadoop services were marked as automatic, they were all stopped. Weird, but okay. I clicked to start the first one and it failed to start, reporting a logon failure. “Well that’s nonsense” thinks I, but I’ve not got time to fix that, I’ll just reinstall.

So I uninstalled HDInsight dev preview and the HDP, then fired up Web Platform Installer and reinstalled it. Job done. Well, not quite, because although the namenode and jobtracker and the command line shortcuts installed okay, the dashboard was missing. Since I knew the URL I just added it and clicked to launch it:


But no, the dashboard hadn’t been installed. Weird, but not to worry, so I Googled it with Bing, or I might have Binged it with Google, I can’t really remember, but the upshot was that I found this post:


Which basically said I had to go here:


and hack the package. So I went there and lo!


The package isn’t there. So, that’ll explain why it didn’t install, but doesn’t explain why it hasn’t been downloaded, nor why HDInsight suddenly stopped working in the first place, after having been happily mapping and reducing its little heart out for weeks.

If I get to the bottom of it I’ll let you know.

Posted in HDInsight | 5 Comments

7 Important Data Science Papers

Originally posted on Data Science 101:

It is back-to-school time, and here are some papers to keep you busy this school year. All the papers are free. This list is far from exhaustive, but these are some important papers in data science and big data.

Google Search

  • PageRank – This is the paper that explains the algorithm behind Google search.


  • MapReduce – This paper explains a programming model for processing large datasets. In particular, it is the programming model used in Hadoop.
  • Google File System – Part of Hadoop is HDFS. HDFS is an open-source version of the distributed file system explained in this paper.


These are 2 of the papers that drove/started the NoSQL debate. Each paper describes a different type of storage system intended to be massively scalable.

Machine Learning


Posted in Uncategorized

Statistics, Intuition and Monty Hall

One of the things that makes data science hard is that it has a foundation in statistics, and one of the things that makes statistics hard is that it can run counter to intuition. A great illustration of that is the Monty Hall Problem. Monty Hall was a US game show host who presented a show called “Let’s Make a Deal” and the Monty Hall Problem is modelled on that show; it goes something like this
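
The excerpt stops before the puzzle itself, so here is a quick way to check the counter-intuitive answer. This is a minimal simulation sketch that assumes the standard form of the problem: three doors, one car, the contestant picks a door, the host (who knows where the car is) opens one of the other doors to reveal a goat, and the contestant may then stick or switch. The function names, trial count and seed are my own choices for illustration.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability of winning for a stick or switch strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("stick :", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Under these assumptions the sticking strategy should win about one time in three and the switching strategy about two times in three, which is exactly the result most people’s intuition rejects.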

Posted in Data Science

Hortonworks VM on Hyper-V Under Windows 8

Every data scientist is going to have to work with large data files at some point in their career, and right now, the de facto standard for doing so is Hadoop. There are lots of ways to gain access to Hadoop, from complete “roll your own” solutions, right up to pre-packaged and ready-to-go solutions from people like Hortonworks.
