Data & AI Digest for 2018-09-05

Blue Bikes is a bicycle-sharing system in Boston, Massachusetts. The bike-sharing program started on 28 July 2011 and lets individuals borrow a bike for a fee on a short-term basis: you pick up a bike at one dock station and return it to another dock station after use. […]
More details at…

You might have read my blog post analyzing the social weather of rOpenSci onboarding, based on a text analysis of GitHub issues. I extracted text out of Markdown-formatted threads with regular expressions. I basically hammered away at the issues using the tools I was familiar with until it worked! Now I know there's a much better and cleaner way, which I'll present in this note. Read on if you want to extract insights about text, code, links, etc. from R Markdown reports, Hugo website sources, GitHub issues… without writing messy and smelly code!

Introduction to Markdown rendering and parsing

This note will appear to you, dear reader, as an html page, either here on ropensci.org or on R-Bloggers, but I'm writing it as an R Markdown document, using Markdown syntax. I'll knit it to Markdown and then Hugo's Markdown processor, Blackfriday, will transform it to html. Elements such as # blabla thus get transformed to <h1>blabla</h1>. Awesome!
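As a quick illustration of that transformation, here is a minimal sketch using the commonmark R package rather than Blackfriday itself (the resulting html is the same kind of thing):

commonmark::markdown_html("# blabla")
## [1] "<h1>blabla</h1>\n"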

The rendering of Markdown to html or XML can also be used as a way to parse it, which is what the spelling package does in order to identify the text segments of R Markdown files, so that it spell-checks only the text, not the code. I had an aha moment when seeing this spelling strategy: why did I ever use regex to parse Markdown for text analysis?! Transforming it to XML first, and then using XPath, would be much cleaner!
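To make that concrete, here is a small sketch of the idea (a toy example, assuming the commonmark and xml2 packages): render the Markdown to XML, then query it with XPath instead of regex.

md <- "# Title\n\nSome *text* to check.\n\n```r\n1 + 1\n```"
doc <- xml2::read_xml(commonmark::markdown_xml(md, extensions = TRUE))
xml2::xml_ns_strip(doc)  # drop the CommonMark namespace so the XPath stays short
# prose lands in <text> nodes, code blocks do not, so code is skipped
xml2::xml_text(xml2::xml_find_all(doc, ".//text"))
## [1] "Title"      "Some "      "text"       " to check."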

As a side-note, realizing how to simplify my old code made me think of Jenny Bryan's inspiring useR! keynote talk about code smells. I asked her whether code full of regular expressions instead of dedicated parsing tools was a code smell. Sadly it doesn't have a specific name, but she confirmed my feeling that not using dedicated, purpose-built tools might mean you'll end up "re-inventing all of that logic yourself, in hacky way." If you have code falling under the definition below, maybe try to re-factor and, if needed, get help.

It’s that feeling when you want to do something that sounds simple but instead your code is like 10 stack overflow snippets slapped together that you could never explain to another human what they do 😰 pic.twitter.com/IF53AX6QvC

— Dr. Alison Hill (@apreshill) August 31, 2018

From Markdown to XML

In this note I’ll use my local fork of rOpenSci’s website source, and
use all the Markdown sources of blog posts as example data. The chunk
below is therefore not portable, sorry about that.

library(magrittr)  # for the %>% pipe
roblog <- "../roweb2/content/blog"  # my local fork of the website source, not portable
all_posts <- list.files(roblog, pattern = "\\.md$", full.names = TRUE)
get_one_xml <- function(markdown_path) {
  paste(readLines(markdown_path, encoding = "UTF-8"), collapse = "\n") %>%
    commonmark::markdown_xml(extensions = TRUE) %>%
    xml2::read_xml()
}

See what it gives me for one post.

get_one_xml(all_posts[42])

## {xml_document}
## <document xmlns="http://commonmark.org/xml/1.0">
## [1] <paragraph>\n  <text>We just released a new version of </text>\n   …
## [2] <paragraph>\n  <text>First, install and load taxize</text>\ …
## [3] <code_block>install.packages("rgbif")\n</code_block>
## [4] <code_block>library(taxize)\n</code_block>
## [5] <heading>\n  <text>New things</text>\n</heading>
## [6] <heading>\n  <text>New functions: class2tree</text>\n</heading>
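From there, XPath does the heavy lifting. As a rough sketch of the kind of queries this enables (reusing the get_one_xml() helper above; not code from the original post):

doc <- get_one_xml(all_posts[42])
xml2::xml_ns_strip(doc)  # drop the CommonMark namespace
# every code block of the post, as plain strings
xml2::xml_text(xml2::xml_find_all(doc, ".//code_block"))
# every link target of the post
xml2::xml_attr(xml2::xml_find_all(doc, ".//link"), "destination")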
More details at…

If you want to do statistical analysis or machine learning with data in SQL Server, you can of course extract the data from SQL Server and then analyze it in R or Python. But a better way is to run R or Python within the database, using Microsoft ML Services in SQL Server 2017. Why? It's faster. Not only do you get to use the SQL Server instance (which is likely to be faster than your local machine), but it also means you no longer have to transport the data over a network, which is likely to be the biggest…
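For flavor, here is a minimal sketch of what the in-database route can look like from the R side, using the RevoScaleR package that ships with ML Services (the connection string, table, and column names below are hypothetical):

library(RevoScaleR)
conn <- "Driver=SQL Server;Server=myserver;Database=mydb;Trusted_Connection=yes"
# switch the compute context: models below are fit inside SQL Server, not locally
rxSetComputeContext(RxInSqlServer(connectionString = conn))
flights <- RxSqlServerData(table = "dbo.Flights", connectionString = conn)
model <- rxLinMod(ArrDelay ~ DepDelay, data = flights)  # the data never leaves the server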
More details at…

Week 1 Gold Mining and Fantasy Football Projection Roundup now available. Go get that free agent gold!
The post Gold-Mining W1 (2018) appeared first on Fantasy Football Analytics.
More details at…
