I like solving applied machine learning problems, especially those that deal with natural language processing or interactive intelligence. I've had fun doing so in academia, large companies and startups since 2004.

One page resume Two page resume
Automating compliance at RedMarker
Current work at RedMarker

I'm helping build tools to change how the financial industry stays compliant with regulation. We're focussing on applied machine learning and natural language processing that scales and lives in containers or serverless environments.

Sydney NLP Meetup
Natural language processing meetup in Sydney

Ben, Adam, Alex and I started this meetup in 2015 on a hunch that there was a community with interesting things to talk about.

Active Learning ... and Beyond! 🚀
Tutorial at ALTA17

Ben, Bo and I prepared a tutorial on using active learning to bootstrap simple NLP systems, including a live shared task.

Website Jupyter Slides
Playing around with fastText to classify reddit submissions
Tutorial at the Sydney NLP meetup

This is a quick and dirty tutorial trying out fastText on some toy reddit text classification. See also the bonus round on hacking continuous features into fastText.

Website GitHub
Collective intelligence for web person search
Work at Hugo.ai (a.k.a. Abbrevi8)

I helped build a cool person research app backed by human and machine collective intelligence with Ben, Bo, Sze Kai and Andy.

Generating short biographies from database records using neural networks
Work at Hugo.ai (a.k.a. Abbrevi8)

I co-supervised Andy building a system to generate biographical sentences from database records (i.e. Wikidata to Wikipedia) using neural sequence-to-sequence models.

EACL17 Slides
Analysing post-editing of biography text
Work at Hugo.ai (a.k.a. Abbrevi8)

I worked with the rest of the Hugo team to analyse how content experts corrected automatically generated biography text for our users.

Analysing translation into emoji with EmojiDick
Work at Hugo.ai (a.k.a. Abbrevi8)

I analysed the translations created in EmojiDick, a project to translate Moby Dick into emoji 🐳.

ALTA16 paper Jupyter Slides
Triaging mental health forum posts
Work at Hugo.ai (a.k.a. Abbrevi8)

Ben, Glen and I submitted a system to the CLPsych shared task to triage posts on the Reach Out mental health forum.

Discovering knowledge bases on the web
Work at Hugo.ai (a.k.a. Abbrevi8)

I co-supervised Andy on work trying to identify website URLs that act as endpoints for an ad-hoc knowledge base.

Evaluating entity timelines
Work at Hugo.ai (a.k.a. Abbrevi8)

I co-supervised Xavier H. on a project evaluating how entity timelines -- lists of stories about an entity -- are built.

High quality named entity recognition with knowledge bases
Work at Xerox Research Centre Europe, now Naver Labs Europe.

Jamie, Xavier C. and I worked on some interactive information extraction, investigating the question: if a user could correct the first few sentences of a document, how well could a system tag the rest?

EMNLP15 Patent
Natural language understanding for chatbots
Work at Xerox Research Centre Europe, now Naver Labs Europe.

I worked on a large team on chatbots for customer care, specialising in natural language understanding in the mobile telecoms domain -- what is the user's intent, what entities and problems are they talking about?

Tracking dialogue state
Work at Xerox Research Centre Europe, now Naver Labs Europe.

Julien and I worked on a submission to the 4th Dialogue State Tracking Challenge that attempted to keep up with a conversation between a travel guide and their customer.

Mining gender from IMDb cast lists
Work at Xerox Research Centre Europe, now Naver Labs Europe.

Matthias and I mined IMDb for insights into how film and television gender representation has changed over time. It got picked up by The Gray Lady 😱.

WWW15-WS New York Times JWS16 GitHub
Teasing apart similar languages from Tweets using social data
Work at Xerox Research Centre Europe, now Naver Labs Europe.

Matthias and I used social data to identify the language a tweet is written in.

arXiv Patent
Named Entity Linking for media
PhD project with Sydney Uni, the CMCRC and Fairfax Media

My thesis (supervised by James and Ben) was on how to link named entity mentions to knowledge bases: for example, which Wikipedia article should "John Smith" in a news story link to? We worked with Fairfax to commercialise this into zoom, a product that lets users browse news by entity rather than by story. We also participated in the TAC shared task from 2010 to 2013.

TAC10 TAC11 TAC12 TAC13 AIJ13-NEL AIJ13-NER ACL14 PhD NewsWWW15 Slides
Tracking the flow of financial news
PhD project with Sydney Uni and the CMCRC

This project tried to identify how ASX company announcements were reported in the Reuters news service, categorising whether the stories reported facts, general background or new analysis.

ALTA09 Honours NAACL10-workshop
Demographic and psychometric profiling of email authors using the messages they write
Work at Appen.

I worked on a team collecting data and building a system to predict an email author's age, education, gender and big-five psychometric attributes from the text they had written.

Work at Sydney Uni

I tutored INFO1903 Informatics (Advanced) (lots of Python in 2010, 2011, 2012) and COMP5338 Advanced Data Models (2010, 2011). I received a 2012 Dean's Award for Excellence in Tutoring (School). I have delivered a couple of guest lectures for software engineering and NLP courses.

Guest lecture slides