About

I like solving applied machine learning problems, especially those that deal with design, creativity, natural language processing or interactive intelligence. I've had fun doing so in academia, large companies and startups since 2004.

One page resume Longer resume
Applying machine learning to design at Canva
Current work at Canva

I'm using machine learning to help users create designs more quickly and easily. Here's my YOW! Data 2021 talk on how we used machine learning to support creative exploration of our template library by bringing users' text with them as they try out new templates.

Co-organising the 1st and 2nd workshops on Gender Bias for NLP
Supported by Canva

I'm proud to be co-organising an academic workshop and shared task that help focus on making NLP tools fairer.

Website Kaggle Bias statement blog post
Automating compliance at RedMarker
Work at RedMarker

I helped build tools to change how the financial industry stays compliant with regulation. We focussed on applied machine learning and natural language processing that scales and lives in containers or serverless environments.

NLP-in-the-wild-slides
Sydney NLP Meetup
Natural language processing meetup in Sydney

Ben, Adam, Alex and I started this meetup in 2015 on a hunch that there was a community with interesting things to talk about. During my time there, we grew to over 800 people and organised over 20 events, including three mini-conferences for academia and industry.

Website
Forecasting psychological distress from childhood essays
Submission to the CLPsych workshop 2018 shared task

Kylie, Louise, Ruth, Kim, Scott, Ben, and I submitted a system that tries to predict adult psychological distress from childhood essays. It's really hard!

CLPsych18
Active Learning ... and Beyond! 🚀
Tutorial at ALTA17

Ben, Bo and I prepared a tutorial on using active learning to bootstrap simple NLP systems, including a live shared task.

Website Jupyter Slides
Playing around with fastText to classify reddit submissions
Tutorial at the Sydney NLP meetup

This is a quick and dirty tutorial trying out fastText on some toy reddit text classification. See also the bonus round on hacking continuous features into fastText.

Website GitHub
Collective intelligence for web person search
Work at Hugo.ai (a.k.a. Abbrevi8)

I helped build a cool person research app backed by human and machine collective intelligence with Ben, Bo, Sze Kai and Andy.

Generating short biographies from database records using neural networks
Work at Hugo.ai (a.k.a. Abbrevi8)

I co-supervised Andy building a system to generate biographical sentences from database records (i.e. Wikidata to Wikipedia) using neural sequence-to-sequence models.

EACL17 Slides
Analysing post-editing of biography text
Work at Hugo.ai (a.k.a. Abbrevi8)

I worked with the rest of the Hugo team to analyse how content experts corrected automatically generated biography text for our users.

WWW17
Analysing translation into emoji with EmojiDick
Work at Hugo.ai (a.k.a. Abbrevi8)

I analysed the translations created in EmojiDick, a project to translate Moby Dick into emoji 🐳.

ALTA16 paper Jupyter Slides
Triaging mental health forum posts
Work at Hugo.ai (a.k.a. Abbrevi8)

Ben, Glen and I submitted a system to the CLPsych shared task to triage posts on the Reach Out mental health forum.

CLPsych16
Discovering knowledge bases on the web
Work at Hugo.ai (a.k.a. Abbrevi8)

I co-supervised Andy on work trying to identify website URLs that act as endpoints for an ad-hoc knowledge base.

AKBC16
Evaluating entity timelines
Work at Hugo.ai (a.k.a. Abbrevi8)

I co-supervised Xavier H. on a project evaluating how entity timelines -- lists of stories about an entity -- are built.

ALTA16
High quality named entity recognition with knowledge bases
Work at Xerox Research Centre Europe, now Naver Labs Europe.

Jamie, Xavier C. and I worked on some interactive information extraction, investigating the question: if a user could correct the first few sentences of a document, how well could a system tag the rest?

EMNLP15 Patent
Natural language understanding for chatbots
Work at Xerox Research Centre Europe, now Naver Labs Europe.

I worked in a large team building chatbots for customer care, specialising in natural language understanding in the mobile telecoms domain -- what is the user's intent, and what entities and problems are they talking about?

Video
Tracking dialogue state
Work at Xerox Research Centre Europe, now Naver Labs Europe.

Julien and I worked on a submission to the 4th Dialogue State Tracking Challenge that attempted to keep up with a conversation between a travel guide and their customer.

DSTC4
Mining gender from IMDb cast lists
Work at Xerox Research Centre Europe, now Naver Labs Europe.

Matthias and I mined IMDb for insights into how film and television gender representation has changed over time. It got picked up by The New York Times.

WWW15-WS New York Times JWS16 GitHub
Teasing apart similar languages from Tweets using social data
Work at Xerox Research Centre Europe, now Naver Labs Europe.

Matthias and I used social data to identify the language a tweet is written in.

arXiv Patent
Named Entity Linking for media
PhD project with Sydney Uni, the CMCRC and Fairfax Media

My thesis (supervised by James and Ben) was on how to link named entity mentions to knowledge bases -- for example, which Wikipedia article should "John Smith" in a news story link to? We worked with Fairfax to commercialise this into zoom, a product that lets users browse news by entity rather than story. We also participated in the TAC shared task from 2010 to 2013.

TAC10 TAC11 TAC12 TAC13 AIJ13-NEL AIJ13-NER ACL14 PhD NewsWWW15 Slides
Tracking the flow of financial news
PhD project with Sydney Uni and the CMCRC

This project tried to identify how ASX company announcements were reported in the Reuters news service, categorising whether the news stories reported facts, general background or new analysis.

ALTA09 Honours NAACL10-workshop
Demographic and psychometric profiling of email authors using the messages they write
Work at Appen.

I worked on a team collecting data and building a system to predict an email author's age, education, gender and big-five psychometric attributes from the text they had written.

ALTA07
Teaching
Work at Sydney Uni

I tutored INFO1903 Informatics (Advanced) (lots of Python in 2010, 2011, 2012) and COMP5338 Advanced Data Models (2010, 2011). I received a 2012 Dean's Award for Excellence in Tutoring (School). I have delivered a couple of guest lectures for software engineering and NLP courses.

Guest lecture slides