This workshop is intended to encourage interdisciplinary research between NLP and Software Engineering. We invited a range of researchers with both NLP and SE backgrounds to come together, discuss their research, establish datasets, tasks, and baselines, and generally help the field build momentum.

Recent updates

Please add suggested resources and datasets to this spreadsheet. We'll review and prioritize them in the closing session.

Tuesday sessions were rearranged; please check the schedule.

Call for participation

Get the current call for participation as a PDF.


Please submit your 1–2 page paper here:

Important dates

Sept 1, 2015
1–2 page paper due
Oct 1, 2015
Program finalized and sent out to attendees
Oct 25–27, 2015
Workshop dates
Dec 15, 2015
Final report produced

Attendee list

Name | Institution
Abram Hindle | University of Alberta
Alvin Cheung | University of Washington
Andrian Marcus | University of Texas at Dallas
Ashish Vaswani | University of Southern California, Information Sciences Institute
Baishakhi Ray | University of Virginia
Ben Snyder | University of Wisconsin
Chang Liu | Ohio University
Charles Sutton | University of Edinburgh
Chris Quirk | Microsoft Research, Redmond, Washington
Collin McMillan | University of Notre Dame
Dana Movshovitz-Attias | Carnegie Mellon University
Daniel Tarlow | Microsoft Research, Cambridge, England
David Chiang | University of Notre Dame
Dawn Lawrie | Loyola University Maryland
Denys Poshyvanyk | College of William & Mary
Earl Barr | University College London
Gagan Bansal | University of Washington
Giriprasad Sridhara | IBM
Graham Neubig | Nara Institute of Science and Technology
Jane Cleland-Huang | DePaul University
Jennifer D'Souza | University of California, Davis
Jim Donlon | National Science Foundation
Luke Zettlemoyer | University of Washington
Mark Marron | Microsoft Research, Redmond, Washington
Martin Monperrus | University of Lille
Martin White | College of William & Mary
Mirella Lapata | University of Edinburgh
Nate Kushman | Massachusetts Institute of Technology
Patrick Wagstrom | IBM Watson
Percy Liang | Stanford University
Prem Devanbu | University of California, Davis
Ray Mooney | University of Texas at Austin
Razvan Bunescu | Ohio University
Sol Greenspan | National Science Foundation
Sonia Haiduc | Florida State University
Srini Iyer | University of Washington
Tao Xie | University of Illinois
Tatiana Korelsky | National Science Foundation
Tien N. Nguyen | Iowa State University
Venera Arnaoudova | Washington State University
Vincent Hellendoorn | University of California, Davis
Vladimir Filkov | University of California, Davis
William Cohen | Carnegie Mellon University
Yi Wei | Microsoft Research, Cambridge, England
Yoav Artzi | Cornell University
Zhendong Su | University of California, Davis
Zhilin Yang | Carnegie Mellon University

Workshop program

This three-day workshop will be held from Sunday, October 25 through Tuesday, October 27, 2015. All sessions will be in the large lecture hall in Microsoft Building 99, room 99/1919.

Program overview

Sunday, October 25

1:00pm – 2:30pm

Tutorial session: n-gram and Neural Network Language Modeling

Ashish Vaswani

PDF slides

2:30pm – 3:00pm
Coffee break
3:00pm – 4:30pm

Tutorial session: Software Mining and Software Datasets

Tao Xie

PDF slides

4:30pm – 5:30pm
Break for check-in, etc.
6:00pm – 8:30pm
Dinner, catered at Microsoft Building 99

Monday, October 26

8:00am – 9:00am
Light breakfast in 99/1919
9:00am – 10:15am

Software tools and processes

(Organizers: Premkumar Devanbu and Chris Quirk)

(Scribe: Jennifer D’Souza)

09h00 – 09h30 Opening session and introduction

09h30 – 10h15 Keynote by Charles Sutton

10:15am – 10:30am
Coffee break
10:30am – 11:15am

Software tools and processes (cont'd)

10h30 – 11h15 Open Discussion

  1. What kind of collaborations would help move this area forward?
  2. Are there other questions/problems that remain unexplored?
  3. What data resources are needed?
  4. Are there benchmarks or evaluation contests that are needed?
11:15am – 12:00pm

Data repositories

(Organizers: Premkumar Devanbu and Chris Quirk)

(Scribe: Zhilin Yang)

11h15 – 11h45 Talk by Tien Nguyen: The BOA Code Repository and Infrastructure

11h45 – 12h00 Questions/Discussion

12:00pm – 1:00pm
Lunch break
1:00pm – 2:15pm

Mutual introductions: Two-minute madness

2:15pm – 2:30pm
Coffee break
2:30pm – 4:00pm

Code and Program Modeling

(Organizers: Charles Sutton and Tien Nguyen)

(Scribe: Vincent Hellendoorn)

2:30pm – 3:00pm Talk by Earl Barr, University College London: Inference Problems in Software Engineering

3:00pm – 3:30pm Talk by Daniel Tarlow, Microsoft Research

3:30pm – 4:00pm Open Discussion

  1. Applications of code and program modeling?
  2. When are different levels of information appropriate, e.g., lexical, syntactic, semantic?
  3. Calling all hammers: What are NLP modeling techniques that are ripe for carrying over?
  4. What’s special about software? Can we just keep importing standard NLP methods willy-nilly or do we need methods that are SWE-specific?
4:00pm – 4:15pm
Coffee break
4:15pm – 5:15pm

Ontologies and Understanding of Software Semantics

(Organizers: Dana Movshovitz-Attias and Tao Xie)

(Scribe: Zhilin Yang)

4:15pm – 4:35pm Talk by Jane Cleland-Huang: Leveraging Software Project Knowledge to Build Ontology

4:35pm – 5:15pm Open Discussion

  1. What are software engineering tasks that can benefit from a software ontology or semantic understanding?
  2. What are the unique characteristics of software entities that make them more/less susceptible to semantic analysis?
  3. Emerging NLP techniques that can be leveraged for software semantic understanding. Which ones have been successfully used by the participants?
  4. Open problems or challenges in this area.
  5. Available software ontologies or related resources.
  6. Evolving software ontologies. What changes can cause a software ontology to evolve over time? What are methods for accommodating such changes?
  7. What kind of collaborations would help move this area forward?
5:15pm – 6:00pm
Free time
6:00pm – 9:00pm
Dinner at Luc in the Madison Park neighborhood of Seattle; transportation from Microsoft, and back to Microsoft and the hotel, is provided

Tuesday, October 27

8:00am – 9:00am
Light breakfast in 99/1919
9:00am – 10:30am

Information Retrieval in Software Engineering

(Organizers: Denys Poshyvanyk and Dana Movshovitz-Attias)

(Scribe: Martin White)

09h00 – 09h20 Talk by Andrian Marcus: Overview of Text Retrieval Applications in Software Engineering

09h20 – 10h30 Open Discussion (sample questions are below)

Examples of successful ideas (applications of IR in SE): participants talk for 1 minute to give examples of their prior research projects

  1. Open problems or grand challenges?
  2. Emerging Information Retrieval techniques or approaches?
  3. What kind of collaborations would help move this area forward?
  4. What datasets and resources are available?
  5. Challenges in reproducibility of the experiments
10:30am – 10:45am
Coffee break
10:45am – 12:15pm

Natural Language Programming and Semantic Parsing

(Organizers: Ray Mooney and Chris Quirk)

(Scribe: Gagan Bansal)

A set of 5-minute talks by:

  1. Ray Mooney
  2. Chris Quirk
  3. Gagan Bansal
  4. Yoav Artzi
  5. Percy Liang
  6. Srini Iyer
  7. Nate Kushman
  8. Yi Wei

Open forum discussion, seeded by particular topics:

  • Data
  • Compute cycles
  • Techniques: Is it parsing? Is it MT? Is it program synthesis?
  • Tools
  • Evaluation metrics
  • What are the relevant connections in SE?
  • Dialog -- interaction strategies
  • End-user debugging
  • Where do we focus first? End users? Programmers? Power users?
  • Formalism / representation, especially for end users
  • Adding I/O tuples or programming by demonstration
12:15pm – 1:15pm
Lunch break
1:15pm – 2:45pm

Language Generation from Code

(Organizers: Dawn Lawrie and Graham Neubig)

(Scribe: Vincent Hellendoorn)

Mood-setting talk: Survey of Methods to Generate Natural Language from Source Code, Graham Neubig (Slides)

Each of the other researchers in the topic will be asked to produce a controversial statement or question to fuel discussion. Discussion will also touch on data sets and broader impacts. These researchers will be asked about two weeks before the workshop to think about their controversial statement.

2:45pm – 3:00pm
Coffee break
3:00pm – 4:30pm

Closing session


The workshop will be held at Microsoft Research in Redmond, WA.

The address is:

Microsoft, Bldg 99
14820 NE 36th Street
Redmond, WA 98052-6399

Directions to visit the venue are available here.


We have reserved a block of rooms at the Courtyard in Bellevue Redmond.

To book, please call (800) 321-2211 and ask for the Microsoft NL + SE Room Block Oct2015 at the Courtyard in Bellevue Redmond. Alternatively, you can use this online reservation link.

Please book by October 5th to ensure that you receive the block rate.

The address is:

Courtyard Bellevue/Redmond
14615 NE 29th Place
Bellevue, WA 98007


Support for travel and accommodation is offered through a grant from the National Science Foundation. Covered costs include hotel accommodation and local transit. Our budget can cover travel expenses up to:

  • USD 1900 for international attendees
  • USD 1000 for US West Coast attendees
  • USD 1300 for attendees from other parts of the US

Attendees from the Puget Sound area should contact the organizers if they need travel support.


  1. Include paid receipts for airfare, hotel, airport shuttle, and meals.
  2. Note: no alcohol can be reimbursed.
  3. Use the UC Davis form here, and sign as "Non Employee".
  4. Maximum budgeted amount for academic invitees: $1000 for West Coast attendees, $1300 for other domestic attendees, and $1900 for international attendees.
  5. Send to Jane Ryan, Dept of Computer Science, UC Davis, Davis, CA 95616; mark (Attn: NSF Workshop)
  6. Be sure to include return postal address, full name, and telephone number.


The workshop is sponsored by the US National Science Foundation, through Sol Greenspan and Tatiana Korelsky, and by Microsoft Research.

Steering committee

  • Charles Sutton (Edinburgh)
  • Daniel Tarlow (Microsoft Research, Cambridge)
  • Dawn Lawrie (Loyola)
  • Denys Poshyvanyk (William & Mary)
  • Ray Mooney (University of Texas Austin)
  • Tao Xie (University of Illinois)
  • William Cohen (Carnegie Mellon University)


Prem Devanbu (UC Davis) and Chris Quirk (Microsoft Research), along with Dana Movshovitz-Attias (CMU), organized the workshop under the guidance and direction of Sol Greenspan and Tatiana Korelsky from the NSF.

Organizer information here


For questions or comments about the website, please contact Chris Quirk.