Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling (POPL 2019 - Research Papers)

Who

Feras Saad, Marco Cusumano-Towner, Ulrich Schaechtle, Martin C. Rinard, Vikash K. Mansinghka

Track

POPL 2019 Research Papers

When

Wed 16 Jan 2019, 14:29 - 14:51 (GMT), at Sala I. Session: Probabilistic Programming and Semantics. Chair(s): Justin Hsu

Abstract

We present new techniques for automatically constructing probabilistic programs for data analysis, interpretation, and prediction. These techniques work with probabilistic domain-specific data modeling languages that capture key properties of a broad class of data generating processes, using Bayesian inference to synthesize probabilistic programs in these modeling languages given observed data. We provide a precise formulation of Bayesian synthesis for automatic data modeling that identifies sufficient conditions for the resulting synthesis procedure to be sound. We also derive a general class of synthesis algorithms for domain-specific languages specified by probabilistic context-free grammars and establish the soundness of our approach for these languages. We apply the techniques to automatically synthesize probabilistic programs for time series data and multivariate tabular data. First, we show how to analyze the structure of the synthesized programs to compute, for key qualitative properties of interest, the probability that the underlying data generating process exhibits each of these properties. Second, we translate probabilistic programs in the domain-specific language into probabilistic programs in Venture, a general-purpose probabilistic programming system. The translated Venture programs are then executed to obtain predictions of new time series data and new multivariate data records. Experimental results show that our techniques can accurately infer qualitative structure in multiple real-world data sets and outperform standard data analysis methods in forecasting and predicting new data.
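
To make the idea of a language specified by a probabilistic context-free grammar more concrete, here is a minimal Python sketch. It is not the paper's DSL or its synthesis algorithm: the toy grammar, the symbols `LIN` and `PER`, and all function names are hypothetical. It samples expression trees from the grammar and uses Monte Carlo to estimate how often a qualitative property (here, "contains a periodic component") holds. The samples come from the prior only; the paper instead conditions on observed data with Bayesian inference, which this sketch does not attempt.

```python
import random

# Hypothetical toy grammar (an assumption, not the paper's DSL).
# Each nonterminal maps to a list of (probability, right-hand side) rules.
GRAMMAR = {
    "K": [
        (0.3, ("LIN",)),          # linear component (terminal)
        (0.3, ("PER",)),          # periodic component (terminal)
        (0.2, ("+", "K", "K")),   # sum of two sub-expressions
        (0.2, ("*", "K", "K")),   # product of two sub-expressions
    ],
}


def sample(symbol, rng):
    """Sample an expression tree by expanding `symbol` top-down."""
    if symbol not in GRAMMAR:
        return symbol  # terminals are returned unchanged
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    (rhs,) = rng.choices([r for _, r in rules], weights=weights, k=1)
    if len(rhs) == 1:
        return sample(rhs[0], rng)
    op, left, right = rhs
    return (op, sample(left, rng), sample(right, rng))


def has_periodic(tree):
    """Qualitative property: does the tree contain a PER leaf?"""
    if isinstance(tree, tuple):
        return any(has_periodic(child) for child in tree[1:])
    return tree == "PER"


def estimate_property_probability(num_samples, seed=0):
    """Monte Carlo estimate of P(property) under the grammar's prior."""
    rng = random.Random(seed)
    programs = [sample("K", rng) for _ in range(num_samples)]
    return sum(has_periodic(p) for p in programs) / num_samples


if __name__ == "__main__":
    print(estimate_property_probability(2000))
```

The same counting step would apply to an ensemble of programs drawn from a posterior: the fraction of sampled programs with the property serves as the estimate of its probability. The rule probabilities are chosen so that each expansion produces fewer than one new nonterminal on average (0.4 × 2 = 0.8), which means sampling terminates with probability one.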

Link to Publication

https://dl.acm.org/ft_gateway.cfm?id=3290350

DOI

https://doi.org/10.1145/3290350

File attachments

Slide Deck (popl19main-p184-slides.pdf)	3.25MiB

Feras Saad

Massachusetts Institute of Technology

United States

Marco Cusumano-Towner

MIT-CSAIL

Ulrich Schaechtle

Massachusetts Institute of Technology, USA

Martin C. Rinard

Massachusetts Institute of Technology

United States

Vikash K. Mansinghka

MIT
