alltom.com

Thoughts On: Common-Sense Knowledge Bases

Contents

  1. Motivation
  2. Open Mind and ConceptNet
  3. Divisi and AnalogySpace

# Motivation

Common-sense knowledge like the following goes into the creation of all usable software:

  1. Most appointments on a calendar are during business hours.
  2. People are unlikely to browse past the first 10 search results.
  3. Buttons should look like buttons.

The problem is that we typically encode it by hand, baking it into the program logic instead of specifying it separately. That makes it hard to reason about and hard to adapt to new situations.

Below are some of the current techniques (that I know about) for encoding common-sense information. Not all of it can be captured yet, but a big chunk of it can.

# Open Mind and ConceptNet

The Open Mind Common Sense Database is a web site that has been collecting common-sense assertions about our world for many years now, in several languages. People can type assertions such as “Carrots are orange” and the database will store that as a triple:

(‘carrot’, HasProperty, ‘orange’)

The left and right terms are Concepts and the middle term is a Relation. Only a small, fixed set of relations is supported, but Open Mind knows about a huge number of concepts.

You can walk the graph created by this information using ConceptNet, which has a web API and a Python library (for offline use).
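As a sketch of what walking that graph looks like, here is a tiny in-memory stand-in for the triple store (hand-built triples, not the real ConceptNet API or its data):

```python
# A toy triple store in the shape ConceptNet uses:
# (left concept, relation, right concept).
from collections import defaultdict

triples = [
    ("carrot", "HasProperty", "orange"),
    ("carrot", "IsA", "vegetable"),
    ("vegetable", "IsA", "food"),
]

# Index edges by their left concept so we can walk outward from any node.
edges = defaultdict(list)
for left, relation, right in triples:
    edges[left].append((relation, right))

def neighbors(concept):
    """Return the (relation, concept) pairs one hop away."""
    return edges[concept]

print(neighbors("carrot"))
# → [('HasProperty', 'orange'), ('IsA', 'vegetable')]
```

The real graph is the same idea at scale: millions of edges, queried through the web API or the offline Python library.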

# Divisi and AnalogySpace

The Open Mind data is spotty, because there are a lot of things to say about the world. To take advantage of what we do know by drawing analogies, the Open Mind team created a singular value decomposition (SVD) library called Divisi to fill in the gaps.

The idea is to create a two-dimensional matrix of the Open Mind triples by combining the relation and second concept:

(‘carrot’, ‘HasProperty orange’)

Each such assertion becomes a cell in a huge matrix with a row for each concept, and a column for each relation/concept pair. The value of the cell is the confidence in that assertion, based on user votes.

     ‘IsA animal’  ‘HasProperty cute’
cat        1                1
dog        1
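The flattening step can be sketched in a few lines of plain Python (the fourth element of each tuple, a confidence score, is my addition for illustration; Divisi itself uses sparse matrices rather than lists of lists):

```python
# Flatten (concept, relation, concept, confidence) assertions into a
# 2-D matrix: one row per left concept, one column per
# "relation right-concept" pair. Absent assertions stay 0.
triples = [
    ("cat", "IsA", "animal", 1.0),
    ("cat", "HasProperty", "cute", 1.0),
    ("dog", "IsA", "animal", 1.0),
]

rows, cols = [], []
for left, relation, right, _ in triples:
    if left not in rows:
        rows.append(left)
    feature = f"{relation} {right}"
    if feature not in cols:
        cols.append(feature)

matrix = [[0.0] * len(cols) for _ in rows]
for left, relation, right, confidence in triples:
    matrix[rows.index(left)][cols.index(f"{relation} {right}")] = confidence

print(rows, cols)  # ['cat', 'dog'] ['IsA animal', 'HasProperty cute']
print(matrix)      # [[1.0, 1.0], [1.0, 0.0]] — the table above
```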

To generalize, SVD is applied, some of the least significant axes are thrown out, and the matrix is reconstructed. The effect is like squinting at the original matrix: when two rows are similar, values present in one row bleed into the other.

     ‘IsA animal’  ‘HasProperty cute’
cat       1.2              0.7
dog       0.7              0.4

It's like drawing an analogy between dogs and cats: they're both animals, so if cats are cute, we might as well assume dogs are cute too until we get more information. That's why the space created by SVD is called AnalogySpace.
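The squinting step is just a truncated SVD, and the numbers in the tables above fall out of it directly. A sketch with NumPy (not Divisi's own API), keeping only the single most significant axis:

```python
import numpy as np

# Rows: cat, dog. Columns: 'IsA animal', 'HasProperty cute'.
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

# Full SVD, then reconstruct using only the top k axes.
U, s, Vt = np.linalg.svd(A)
k = 1
A_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(A_approx, 1))
# [[1.2 0.7]
#  [0.7 0.4]]
```

The missing cell for dog gets a nonzero value (0.4) purely because dog's row resembles cat's, which is the analogy-drawing the text describes.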