The Need for Machine Learning is Everywhere!

In my work, which is predominantly information technology, the need for Machine Learning is everywhere. And I don’t mean just in the somewhat obvious ways like security or log file analysis. Consider that my work experience goes back to the tail end of the mainframe era. In fact, one of my first jobs was networking together what were at the time considered powerful desktop computers as part of a pilot project to replace a mainframe. Since then

Continue reading →

Big Data Spain Presentation

I was invited to give a talk at Big Data Spain in Madrid, Spain on November 18. If you are keeping track, I was in Barcelona the previous day, and even worse, I hadn’t slept more than 4 hours in the previous 3 days between traveling, presenting, and preparing to present. Still, other than the power going out for a few minutes, the talk went very well.

Thanks to the Big Data Spain team, which put together a great conference and an excellent venue.

2014 Presentation

I had the opportunity to present a demo of BigML at the first Predictive APIs conference in Barcelona, Spain. I was the very first speaker on the very first day and other than a few problems with the WiFi dropping out on me and the difficulty reading the terminal during the API demo, everything went well. There were some other great talks during the conference as well, the videos for which are also up on

Data Science Melbourne Meetup

After we launched BigML in Australia, we spent two weeks in Melbourne touring, giving demos, and talking to potential customers. One of the great venues I was invited to was the Data Science Melbourne meetup. They recorded the talk and made it available on YouTube. The poor audio quality was my fault – I have to talk with my hands (as you will see) and couldn’t handle the microphone!

A big thanks to Phil Brierley for organizing this event and to GCS Agile, our partner in Australia, for coordinating everything.


BigML Late Summer 2014 Webinar – Anomaly Detection!

Things were busier than ever in 2014, and by mid-year we had already introduced a second machine learning algorithm with our new Anomaly Detector, based on Isolation Forests. Of course, there were a bunch of other new features as well:

– Model Clusters
– Missing Splits
– Anomaly Detector
… And a tease for a few upcoming things:

– Sample Server
– Dynamic Scatterplot
– Projects
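
For anyone curious about the idea behind Isolation Forests, here is a minimal sketch in plain Python. It is not BigML's implementation: it skips the subsampling and score normalization of the full algorithm and only measures how many random splits it takes to isolate a point. The function names and the sensor-style readings are hypothetical, made up for illustration.

```python
import random

def path_length(x, data, rng, depth=0, max_depth=8):
    """Count the random axis-aligned splits needed to isolate x within data."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(x))              # pick a random feature
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:                             # nothing left to split on here
        return depth
    split = rng.uniform(lo, hi)              # pick a random split value
    # Keep only the rows that land on the same side of the split as x.
    side = [row for row in data if (row[dim] < split) == (x[dim] < split)]
    return path_length(x, side, rng, depth + 1, max_depth)

def mean_path_length(x, data, trees=200, seed=0):
    """Average the isolation depth of x over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(x, data, rng) for _ in range(trees)) / trees

# Hypothetical readings: 20 similar points plus one obvious outlier.
normal = [(20.0 + 0.1 * i, 50.0 + 0.2 * (i % 5)) for i in range(20)]
outlier = (35.0, 90.0)
readings = normal + [outlier]

outlier_depth = mean_path_length(outlier, readings)
normal_depth = mean_path_length(normal[10], readings)
print(f"outlier: {outlier_depth:.2f} splits, normal point: {normal_depth:.2f} splits")
```

Points that are isolated after only a few splits are the likely anomalies, which is why the outlier's average depth comes out well below that of a point inside the main group.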

The Value of Things (VoT): MassTLC IoT Conference Panel

I was invited to speak on a panel at the MassTLC VoT conference in June. The topic of the panel was “Analyzing data to get actionable intelligence,” which was a perfect opportunity to discuss the applicability of Machine Learning to the analysis of the data deluge that the Internet of Things will likely bring.

It was a pleasure to meet the other members of the panel, and the discussion was lively and informative. A video has been promised but hasn’t surfaced yet. I will post it when it is available.

I did come across an article that touched on the panel and includes the following quote:

— excerpt —

Still, similar to the initial buzz around big data, IoT discussions evoke excitement about the wonderful possibilities: new business models! Competitive advantage! Deeper insights! And they often leave out what’s practical, as Poul Petersen, chief infrastructure officer for Corvallis, Oregon-based BigML Inc., noted during a panel discussion on data analytics. Attaching sensors to every grapevine, cargo ship, train car or transformer, “that’s not too hard,” he said. But “how on earth do you get to that last step?”

By last step, Petersen means how you mine sensor data to find what he called those “aha moments,” or correlations between two seemingly unrelated data points. Sensor data is “big in terms of complexity,” giving businesses millions of data points to dig through. “You don’t know at the outset if two things are related,” he said. “Or you may just get it wrong.” It should be noted that BigML is in the business of helping companies make the leap into advanced analytics to find those correlations, but still, Petersen’s perspective was more than an on-message advertisement. As CIOs know from forays into other types of big data, rich insights don’t just fall out of the data—not even for a data scientist.

— excerpt —

BigML Spring 2014 Webinar – Clustering!

This webinar introduced BigML’s K-means clustering algorithm, which was our first unsupervised learning algorithm. I worked hard on this webinar to come up with several easy-to-demonstrate applications of clustering. One of the coolest things, though, is our Model Clusters feature, which is a great way to “discover” the rules that describe each individual group in the cluster.

– Item Discovery
– Customer Segmentation
– Active Learning
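
To give a feel for the customer segmentation case, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is only an illustration, not BigML's implementation, and the `kmeans` helper and the customer data are hypothetical.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # start from k of the input points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                      # keep the old centroid if empty
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, assignments

# Hypothetical customers: (orders per year, average order value)
customers = [(2, 20.0), (3, 25.0), (2, 22.0),
             (40, 210.0), (42, 190.0), (38, 205.0)]
centroids, segments = kmeans(customers, k=2)
print(segments)
```

The occasional buyers and the frequent big spenders end up in separate segments; which segment gets label 0 or 1 depends on the random starting centroids.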

BigML API Webinar Mar 2014

This is the API webinar I promised at the end of the previous webinar when I was talking about “Programmatic ML”. I think it’s really important that even though BigML has such an excellent User Interface, BigML is an API-first company; we export the same API that we use internally for our own UI.

I went through a pretty detailed intro to the API and then showed some examples of using the API:

– Predictive Application
– Python Bindings

BigML Webinar, January 28, 2014: Winter 2014 Release

This is the second BigML webinar – I got a little more technical with this one and used real data from Prosper. It’s pretty amazing the pace at which we implement new features; it had only been a few months since the last webinar and we brought:

– Dataset filtering
– Training weights
– Adding dataset fields including Flatline
– Node Threshold
– Batch Predictions
– In-memory trees
… and of course the Development free tier