Will Democratizing AI be The Cancer at The Heart of Future Enterprise?

Am I alone in thinking that the push by Microsoft (and others) to “Democratize AI” represents a threat to business equal to that of, say, the Y2K bug?

Here’s why I think it is. A while ago I attended a conference and saw an AzureML talk. The presenter did an amazing job: they were really engaging, and the talk was deep, technical, and… utterly flawed. During the talk the presenter took Likert scale data and clustered it using the K-Means algorithm. The audience was enthralled.

Now, for the non-data scientists in the audience, here’s a quick catch-up. You’ll see Likert scales used most often in surveys; you know the kind of question that asks: “Where 1 is strongly disagree, 2 is disagree, 3 is neither agree nor disagree, 4 is agree and 5 is strongly agree, how strongly do you agree with the following statement…”. This makes it ordinal data: data which is categorical in nature, but whose categories have a natural order. Nothing says the gap between “agree” and “strongly agree” is the same size as the gap between “disagree” and “neither”.

However, because the answers are recorded numerically – 1 through 5 – when the data comes to be analysed it looks like interval data (numbers that you can do arithmetic with). When that data is analysed by an experienced data scientist there’s no problem: they recognize the trap and apply techniques suited to ordinal data, such as Spearman’s rank correlation.

The presenter, however, although a very experienced database person, had little to no experience as a data scientist and immediately fell into the trap, either by seeing numbers and assuming the data was interval in nature, or by not knowing that, in the K-Means algorithm, the “Means” part relates to averages. Because the data was ordinal, the presenter was not entitled to do arithmetic on it.
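To make the trap concrete, here is a minimal sketch in Python using only the standard library. The two groups of answers and the alternative coding are made up for illustration; they are not from the talk. It shows that a comparison based on means can flip when the same ordered answers are given different (but equally valid) numeric labels, while a comparison based on medians cannot.

```python
from statistics import mean, median

# Hypothetical answers to one Likert question from two groups of respondents.
group_a = [2, 2, 2, 5, 5]
group_b = [3, 3, 3, 3, 3]

# An equally valid coding of the same five answers: the order is preserved,
# only the (arbitrary) spacing between the labels changes.
recode = {1: 1, 2: 2, 3: 4, 4: 5, 5: 6}
recoded_a = [recode[x] for x in group_a]
recoded_b = [recode[x] for x in group_b]

# The mean, which is what K-Means uses for its centroids, changes its verdict.
mean_says_a_higher = mean(group_a) > mean(group_b)                    # 3.2 vs 3
mean_says_a_higher_recoded = mean(recoded_a) > mean(recoded_b)        # 3.6 vs 4

# The median only relies on order, so it gives the same verdict both times.
median_says_a_higher = median(group_a) > median(group_b)              # 2 vs 3
median_says_a_higher_recoded = median(recoded_a) > median(recoded_b)  # 2 vs 4

print(mean_says_a_higher, mean_says_a_higher_recoded)      # True False
print(median_says_a_higher, median_says_a_higher_recoded)  # False False
```

If a conclusion changes when nothing but the labels change, the conclusion was about the labels, not about the respondents.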

Now, this wouldn’t be a real problem if it were not for two issues. Firstly, the very nature of “Democratizing AI” means that in future the majority of analysis is going to be carried out by people with no training in data science. Secondly, tools like AzureML can’t tell that what you are doing is nonsense, because they don’t have the context of how the data was captured. You feed them numbers and ask them to do arithmetic; they do that arithmetic and give you an answer.

Going forward, the danger will be that untrained people will take that answer and run with it, not having the training to realize that the answer is utterly meaningless in the context of the problem.

To solve this problem, enterprises need to understand that no matter how simple Microsoft make data science appear, to operate it safely you need trained staff. Microsoft and other companies also need to think about how they can make these tools “safer”: something like the compilers and linters of software development needs to be built for data science.
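As a rough idea of what such a “linter” might look like, here is a small Python sketch. Everything in it is hypothetical: the `Level` enum, the `MIN_LEVEL` table and the `lint` function are illustrative names, not part of AzureML or any existing tool. It assumes someone has declared the measurement level of each column when the data was captured, which is exactly the context today’s tools lack.

```python
from enum import Enum


class Level(Enum):
    """Measurement levels, ordered from least to most arithmetic allowed."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Hypothetical rule table: the lowest level at which each operation makes sense.
MIN_LEVEL = {
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "mean": Level.INTERVAL,
    "k_means": Level.INTERVAL,  # centroids are means, so arithmetic is required
}


def lint(operation, columns):
    """Return one warning per column whose declared level is too low."""
    required = MIN_LEVEL[operation]
    return [
        f"{name}: {level.name.lower()} data is not suitable for {operation} "
        f"(needs {required.name.lower()} or higher)"
        for name, level in columns.items()
        if level.value < required.value
    ]


# Hypothetical survey table: a Likert answer plus the respondent's age.
survey = {"satisfaction": Level.ORDINAL, "age": Level.RATIO}

for warning in lint("k_means", survey):
    print(warning)
```

A check like this cannot replace training, but it would at least stop the tool from silently doing arithmetic that the data does not support.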

The bottom line is, “Democratizing AI” should mean LESS training is required to use these tools, not that NO training is required.

Posted in Data Science

Yesterday in Data Science – March 12th 2017

Following my post about logistic regressions, Ryan got in touch about one bit of building logistic regressions models that I didn’t cover in much detail – interpreting regression coefficients. This post will hopefully help Ryan (and others) out. @SteffLocke This was
The post How to go about interpreting regression coefficients appeared first on Locke Data. Locke Data are a data science consultancy aimed at helping organisations get ready and get started with data science.
More details at… http://feedproxy.google.com/~r/RBloggers/~3/r6eQu42S844/

Books on R tend to be highly focused on either statisticians or programmers. There is a dearth of material to help those in typically less quantitative fields access the powerful tools in the R ecosystem. Enter Text Analysis with R for Students of Literature. I haven’t done a deep read of the book, […]
More details at… http://feedproxy.google.com/~r/RBloggers/~3/t7GZ9GZ46A4/

Recently, I read a post by Michael Toth regarding a sentiment analysis of Mr Warren Buffett’s annual shareholder letters over the past 40 years. In this post, only five of the annual shareholder letters showed negative net sentiment scores, whereas a majority of the letters (88%) displayed a positive net sentiment score. Toth noted […]
More details at… http://feedproxy.google.com/~r/RBloggers/~3/xa89N8oIGxk/

There’s a handy new function in R 3.4.0 for anyone interested in data about CRAN packages. It’s not documented, but it’s pretty simple: tools::CRAN_package_db() returns a data frame with one row for every package on CRAN and 65 columns of data on those packages, as shown below.

    > names(tools::CRAN_package_db())
     [1] "Package"                 "Version"               "Priority"
     [4] "Depends"                 "Imports"               "LinkingTo"
     [7] "Suggests"                "Enhances"              "License"
    [10] "License_is_FOSS"         "License_restricts_use" "OS_type"
    [13] "Archs"                   "MD5sum"                "NeedsCompilation"
    [16] "Additional_repositories" "Author"                "Authors@R"
    [19] "Biarch"                  "BugReports"            "BuildKeepEmpty"
    [22] "BuildManual"             "BuildResaveData"       "BuildVignettes"
    [25] "Built"                   "ByteCompile"           "Classification/ACM"
    [28] "Classification/ACM-2012" "Classification/JEL"    "Classification/MSC"
    [31] "Classification/MSC-2010" "Collate"               "Collate.unix"
    [34] "Collate.windows"         "Contact"               "Copyright"
    [37] "Date"                    "Description"…
More details at… http://feedproxy.google.com/~r/RBloggers/~3/Qknl1yY37PE/

This is part of a new series of articles: once or twice a month, we post previous articles that were very popular when first published. These articles are at least 6 months old but no more than 12 months old. The previous digest in this series was posted here a while back.
20 Great Blogs Posted in the last 12
More details at… http://www.datasciencecentral.com/xn/detail/6448529:BlogPost:561994

Posted in Data Science Digest

Yesterday in Azure Cloud – March 12th 2017

The built-in geo-replication feature has been generally available to SQL Database customers since 2014. During this time one of the most common customer requests has been about supporting transparent failover with automatic activation. Today we are happy to announce a public preview of auto-failover groups that extends geo-replication with the following additional capabilities:
More details at… https://azure.microsoft.com/blog/azure-sql-database-now-supports-transparent-geographic-failover-of-multiple-databases-featuring-automatic-activation/

Automatically scaling out or scaling in applications to handle the demands of your business is an essential element of the cloud strategy. Azure’s Autoscale service empowers you to automatically scale your compute and App Service workloads based on user-defined rules regarding metric conditions, time/date schedules, or both.
More details at… https://azure.microsoft.com/blog/manage-your-business-needs-with-new-enhancements-in-azure-autoscale/

Broad support for regulatory compliance and ongoing innovation are at the core of Microsoft’s commitment to enabling U.S. government missions with a complete, trusted, and secure cloud platform.
More details at… https://azure.microsoft.com/blog/azure-government-the-most-secure-compliant-cloud-for-defense-with-new-compliance-and-service-offerings/

App Service on Linux (Preview) enables developers to run their cloud apps natively on Linux Docker Containers. It makes it easier to migrate existing apps hosted on a Linux platform elsewhere
More details at… https://azure.microsoft.com/blog/see-whats-new-for-azure-app-service-on-linux-preview/

Application Insights has new tools to empower your development team to better understand how customers use your web apps. These tools are available as a preview today in Application Insights in the
More details at… https://azure.microsoft.com/blog/new-tools-for-understanding-user-behavior-with-application-insights/

Azure DevTest Labs is a commercial Azure service that enables IT admins to create a cost-controlled self-service for developers and testers to quickly create environments in Azure, while minimizing waste and optimizing cost. We announced the service GA last May, and never stop exploring more opportunities to build solutions that solve our customers’ real problems in various scenarios. Today, with Microsoft Build 2017 happening now in Seattle, I would like to take this moment with you to look back at all the key functionalities we’ve shipped since the Connect() conference last November, and explain how they can help you in various scenarios.
More details at… https://azure.microsoft.com/blog/azure-devtest-labs-updates-at-build-2017/

We are excited to announce the general availability of Application Insights Profiler for Azure App Service.
More details at… https://azure.microsoft.com/blog/application-insights-profiler/

Posted in Azure

Microsoft is Building Literate Machines

…now, the company’s leading AI experts are working on systems that can do something even more complex: Read passages of text and answer questions about them.


10 great GitHub repositories focusing on IPython, TensorFlow and Theano

This is a collection of 10 great GitHub repositories focusing on IPython, TensorFlow, Theano and related topics, for data scientists. The last one is not on GitHub.


Crimson Hexagon uses social media to predict stock movements

LONDON – You may not have heard of Crimson Hexagon, but the chances are it’s heard of you.


Global Fishing Watch catches illegal fishing vessel

The island nation of Kiribati suspected that Marshalls 203 had violated its recently created no-fishing zone, but it didn’t have sufficient proof. That’s where Global Fishing Watch came in.
