2014 Boston Fall Data Festival
November 3-9, 2014

Boston’s second annual 2014 Data Festival brings together the meetup community, entrepreneurs, VCs and others to highlight our data-centric scene. Metro Boston is wonderfully diverse, with some of the best minds, universities, and companies globally.

Program Schedule

Monday November 3

06:00 PM Boston Data Festival 2014 Kickoff @ Thomson Reuters

The Boston Data Festival will get off to a festive start with an evening of networking and talks. We will be providing food and drinks at our registration and networking session, where you can mingle with fellow festival speakers, data scientists, data enthusiasts, startups and many others with an interest in data. We are delighted to have two very distinguished speakers, Andy Palmer and Owen Zhang (bio and talk details below), join us for the evening. We will then wrap up with a final networking stretch. [RSVP]

07:15 PM Kickoff talk – Using Data and Analytics to Make Elephants Dance (Andy Palmer) @ Thomson Reuters

Big Data 2.0 and Analytics 3.0 create an unprecedented opportunity for large companies to become as nimble and innovative as smaller companies by democratizing analytics and decision-making. However, success requires building an information culture, with an emphasis on bottom-up information-seeking and sharing – plus giving people the power to act on the information to make faster and better decisions. Learn why transforming information culture is never easy – but necessary for those companies that want to facilitate the development of a data-driven enterprise. [RSVP]

07:45 PM Kickoff talk – Winning Data Science Competitions (Owen Zhang) @ Thomson Reuters

Owen Zhang is no stranger to data science competitions. He has competed in and won several high-profile challenges, and is currently ranked 1st out of a community of 200,000 data scientists on Kaggle. This is an opportunity to learn the tips, tricks and techniques Owen employs in building world-class predictive analytic solutions. [RSVP]

Tuesday November 4

05:30 PM Quantifying Uncertainty: Evaluating Trading Algorithms using Probabilistic Programming (Thomas Wiecki) @ hack/reduce

There exist a large number of metrics to evaluate the performance and risk of a trading strategy. Although those metrics have proven to be useful tools in practice, most of them require a large amount of data and yield unstable results on shorter timescales. Quantopian allows users to develop and launch trading algorithms that invest in the stock market. As we launched live trading less than a year ago, estimating performance with few data points becomes critical. Bayesian modeling is a flexible statistical framework well suited for this problem, as uncertainty can be directly quantified in terms of the posterior distribution.

In this talk I will briefly provide an overview of Bayesian statistics and how Probabilistic Programming frameworks like PyMC can be used to build and estimate complex statistical models. I will then show how several common financial risk metrics like Alpha and Beta can be expressed as a probabilistic program. Finally, I will apply this type of Bayesian data analysis to evaluate the performance of anonymized real-world trading algorithms running on Quantopian. [RSVP]

05:30 PM Big Data for All, All for Big Data: The Cross-Industry impact of Big Data @ DataXu

Big Data has come a long way over the last few years. No longer the domain of IT, big data is infiltrating all industries and departments, with different technologies, use cases and benefits for each.

Join industry leaders in Big Data, including experts from finance and marketing and DataXu co-founders Sandro Catanzaro and Bill Simmons, as they discuss the sea change in big data technologies and how it has changed those industries. [RSVP]

07:30 PM Vector Space Word Representations (Rani Nelken) @ hack/reduce

NLP has traditionally mapped words to discrete elements without underlying structure. Recent research replaces these models with vector-based representations, efficiently learned using neural networks. The resulting embeddings not only improve performance on a variety of tasks, but also show surprising algebraic structure. I will give a gentle introduction to these exciting developments. [RSVP]

Wednesday November 5

05:30 PM Data Science on a Budget: Maximizing Insight and Impact (Nicholas Arcolano) @ hack/reduce

Many companies have “big data”, but not every company has the resources (or need) for a big data team. In this talk we will discuss lessons I’ve learned from working as part of a small team within a fast-moving mobile start-up and techniques for getting the most out of your data on a budget. [RSVP]

05:30 PM In Defense of Imprecision: Why Traditional Approaches to Data Visualization are Changing (Mark Schindler) @ Cambridge Innovation Center

In the worlds of research, science, and academia, much attention is given to precision, objectivity, and de-biasing data. The ability to “lie with data” is a legitimate concern. In business analytics, and the burgeoning area of consumer-facing data visualization, though, complete objectivity is not the singular goal, because these situations are about business and personal decision making, processes that are filled with subjectivity and also with creativity and intuition. [RSVP]

05:30 PM Building Fast Applications for Streaming Data (Ryan Betts) @ Microsoft NERD Center

Data is moving at blinding speeds, generated by a wide range of sources – from mobile phones and sensors to a variety of connected “smart” devices. This data, most valuable the moment it arrives, will continue to increase in both volume and variety. Leveraging data instantly provides the opportunity to make real-time decisions, reduce risks, and sense patterns, delivering the competitive edge to react quickly and correctly.

Stream processing was not designed to serve the needs of modern Fast Data applications. Despite its ability to rapidly ingest data, streaming requires additional code – and a database – to maintain state. This adds application complexity, moving performance bottlenecks to another component in the system. The results are systems that don’t meet the requirements of modern applications.

Employing a solution that handles streaming data, provides state, ensures durability, and supports transactions and real-time decisions is key to benefiting from fast data. During this presentation participants will learn: (a) the difference between real-time analytics and real-time decisions; (b) how streaming applications deliver more value when built on an in-memory NewSQL database; and (c) that making fast data smart is a significant market opportunity that requires a new database platform designed for the volume, variety and scale of high-speed data. [RSVP]

07:30 PM How to Quantify Culture: Introduction to R Workshop (Ethan Fosse) @ hack/reduce

This workshop provides an overview of R for those who are unfamiliar with statistics or programming. The first part of the workshop will review the basics of R as an object-oriented statistical programming language, with an emphasis on creating and manipulating objects. The second part will focus on the fundamentals of data analysis, including how to load and manipulate data sets, summarize and visualize variables (using bar graphs, scatter plots, and histograms), and understand relationships among variables through the fitting of statistical models (such as linear regression, analysis of variance, and classification techniques). No knowledge of statistics or programming is required. [RSVP]

07:30 PM Visualizing Networks (Lynn Cherny) @ Cambridge Innovation Center

Network data is increasingly pervasive, but can be hard to work with. The naive first pass at a network diagram usually looks like a “hairball.” If you add some simple network measures like degree, betweenness, centrality, and community membership, you’ll be able to create more comprehensible network representations. Along with the ubiquitous force layout, I’ll cover alternative layout options and interaction techniques to improve end-user experience of your network visuals. Examples shown will be primarily from D3.js and Gephi. [RSVP]

Thursday November 6

05:00 PM Data Science to the Rescue of Healthcare Costs (Ramesh Kumar) @ Cambridge Innovation Center

More and more data is being collected and made available in our healthcare system. A McKinsey report projects that $300bn can be identified and saved in the US healthcare system through better data analytics. But how do we do it? Where is the money in our system? What kind of data is available? What type of analytics is going to disrupt our healthcare system? What are the opportunities? Come and listen to 4 leading companies and entrepreneurs that are using healthcare data to reduce our healthcare costs. [RSVP]

05:30 PM The Shape of Data: An Intuitive Introduction to the Geometry Behind Machine Learning/Data Learning (Jesse Johnson) @ Thomson Reuters

Most experts in data analysis think about data in terms of (relatively simple) geometry. However, in many introductory sources, the geometry is hidden under layers of technical details. The goal of this talk is to put the geometry front-and-center, giving the audience a perspective that will help them to continue exploring data. [RSVP]

05:30 PM Bringing Coherence to Chaos-Automated Analysis on Large-Scale Social Data for the 2014 World Cup (Catherine Havasi) @ hack/reduce

Twitter has changed. The growth and worldwide success of Twitter has surfaced natural limitations of hashtags and keyword searches. What was once a mechanism for organizing information for efficient consumption is now often rendered obsolete by the overwhelming volume and diversity of discussion. Listening to a large number of posts on a given hashtag becomes unproductive as the conversations spawned within and around these hashtags are indistinct and drowned out. For SONY, Luminoso restored the ability to understand, consume, and participate in large-scale social media discussions by automatically eliminating spam, removing duplicates, clustering thematically similar conversations, and surfacing meaningful discussion. SONY launched the world’s first dedicated football social network – One Stadium Live – a network that enabled fans and media to experience the 2014 FIFA World Cup like never before, harnessing Luminoso’s technology. The One Stadium Live platform provided users with automatically curated topics of discussion, allowing them to participate in and follow conversations they found interesting, while avoiding the ones they didn’t – across 6 different languages. [RSVP]

06:00 PM Data Privacy and Security within Healthcare (Colin J. Zick) @ Foley Hoag & Eliot


07:30 PM Multi-Armed Bandits and Reinforcement Learning in Computational Advertising (Michael Els) @ hack/reduce

This talk will cover the most common learning strategies to solve the multi-armed bandit problem. It will involve a Python simulation environment to illustrate how the system changes under different assumptions and how prior learning can seed the system. This will also be discussed from the perspective of the computational advertising framework at MaxPoint, where we employ these types of strategies to algorithmically learn optimal ad serving behavior. [RSVP]

Friday November 7

05:30 PM Thinking in Data Workshop (David Weisman) @ hack/reduce

This hour-long beginner-level workshop (no laptop needed) focuses on critical thinking about data. We’ll look at fascinating examples of sampling problems, biases, outliers, confounding, and spurious correlations, and show how these lead to wrong conclusions. We’ll also show how exploratory data analysis through visualization can bring clarity. You’ll take away a greater awareness of data itself, and be able to apply these ideas to your data science projects.


07:30 PM Mining Living Organisms: Inferring Biological Models from Wet-Lab Experiments (Daniel Lobo) @ hack/reduce

Many living organisms have an extraordinary capacity to self-generate and self-repair complex patterns and shapes. To elucidate the mechanisms driving these poorly understood processes, biologists are producing an extraordinarily complex dataset of surgical and genetic experiments. In this talk, I will present our approach based on formal ontologies, evolutionary computation, and in silico simulators to automate the discovery of biological models from wet-lab experiments, which will pave the way to revolutionary biomedical applications. [RSVP]