
You can do better than O(cfkn). You can precompute the results of some features that are seen together (e.g. S0w/S0t/S0wt) and store a reference to them on the words, which saves lookups per feature. If you store them as weight vectors over the classes, you can get a nice performance boost from SIMD; you can knock off at least an order of magnitude, if not two. That gives you something like O(c'f'kn), where c' is the number of SIMD ops needed to aggregate the precomputed weight vectors and f' is the number of feature template groups. Yoav Goldberg also mentioned feature signatures in one of his papers, which can do a bit better.
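
To make the aggregation step concrete, here's a rough numpy sketch (names and shapes are mine, and numpy's vectorised ops stand in for explicit SIMD):

    import numpy as np

    # Assume each feature-template group's weight vector over the classes has
    # already been precomputed and cached on the tokens (e.g. the S0w/S0t/S0wt
    # group). Scoring a state is then just summing f' vectors of length c,
    # which compilers/BLAS implement with SIMD.
    n_classes = 100                     # c: number of transition classes
    n_groups = 12                       # f': number of feature template groups

    # One precomputed weight vector per group for the current state (dummy data).
    group_vectors = np.random.rand(n_groups, n_classes)

    # Aggregate with a single vectorised reduction instead of c*f scalar lookups.
    class_scores = group_vectors.sum(axis=0)
    best_action = int(np.argmax(class_scores))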

A significant problem with unlabeled dep. parsing is that you can't differentiate important things like subject vs object dependents. In the sentence "They ate the pizza with anchovies.", how would a program distinguish between 'they' as the subject and 'pizza' as the object? In other words, who ate what?
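
To make "who ate what" concrete, here's a tiny illustration (the labels and the attachment of "with anchovies" reflect one plausible reading, not any particular parser's output):

    # Token indices: 0=They 1=ate 2=the 3=pizza 4=with 5=anchovies

    # Unlabeled parse: just (head, dependent) pairs. 'They' and 'pizza' both
    # attach to 'ate', but nothing says which one is doing the eating.
    unlabeled_arcs = [(1, 0), (1, 3), (3, 2), (3, 4), (4, 5)]

    # Labeled parse: each arc carries a relation, so subject vs. object is explicit.
    labeled_arcs = [
        (1, 0, "nsubj"),   # They  -> subject of 'ate'
        (1, 3, "dobj"),    # pizza -> object of 'ate'
        (3, 2, "det"),
        (3, 4, "prep"),
        (4, 5, "pobj"),
    ]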




Yes, you're probably aware of this paper, right?

Goldberg et al., "Efficient Implementation of Beam-Search Incremental Parsers", ACL 2013. http://www.aclweb.org/anthology/P13-2111

I haven't been able to work out how to do the feature caching in a way that won't ruin my implementation when I need to add more features.

I also get substantial benefit at high k from hashing the "kernel tokens" and memoising the score for the state.

I did try the tree-structured stack that they recommend, but I didn't find any run-time benefit from it, and the implementation kept confusing me. I might have made a mistake, but I suspect it's because my state arrays are copied with low-level malloc/free/memcpy, whereas they pay Python overhead on their copies.


Yes.

I didn't see noticeable improvements from TSS either. I did some performance tuning, and much more of the time goes to feature extraction and scoring. Can you elaborate on what you mean by 'hashing the "kernel tokens" and memoising the score for the state'? Are the kernel tokens something like the heads of the stack/queue?

For feature caching, I went with a generic model: a feature template is a combination of feature elements (for features like S0t+Q0t+Q1t) that each have a closed set of values, so the feature template is limited to the Cartesian product of the elements' value sets. When you initialise parsing for a new sentence, you can select the subset of possibilities that can actually occur in that sentence and generate a "submodel" for just that sentence, which needs much less memory. If you pack it properly you can get a lot of it into the lower-level caches, which should allow for a significant speed-up.
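
A plain-Python sketch of what I mean (the names here are illustrative, not my actual code):

    import itertools

    # Rough sketch of the per-sentence "submodel": pre-score every S0t+Q0t+Q1t
    # combination that can occur given the tags actually present in the sentence.
    # ('weights' maps (template_name, value_tuple) to a per-class weight list.)
    def build_submodel(sentence_tags, weights, n_classes):
        tags = sorted(set(sentence_tags))
        submodel = {}
        for combo in itertools.product(tags, repeat=3):
            w = weights.get(("S0t+Q0t+Q1t", combo))
            submodel[combo] = list(w) if w is not None else [0.0] * n_classes
        return submodel

    # During parsing, scoring that template is then a single dict lookup into a
    # structure sized by the sentence's tag set, not the full tag set cubed.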


Thanks for the explanation of how your cache works. Will you be at ACL this year?

The memoisation I refer to is called here:

https://github.com/syllog1sm/redshift/blob/segmentation/reds...

What happens is, I extract the set of token indices for S0, N0, S0h, S0h2, etc., into a struct SlotTokens. SlotTokens is sufficient to extract the features, so I can use its hash to memoise an array of class scores. Cache utilisation is about 30-40% even at k=8.
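
In plain Python it's roughly this (the real code is a Cython struct with a C-level hash; the state attributes below just mirror the description):

    # Memoise class scores by the hash of the "kernel tokens": the token indices
    # the feature set actually depends on. Identical kernel tokens mean identical
    # features, so the scores can be reused across states in the beam.
    score_cache = {}

    def score_state(state, extract_features, score_features):
        slot_tokens = (state.S0, state.N0, state.S0h, state.S0h2,
                       state.S0l, state.S0r)          # etc.
        key = hash(slot_tokens)
        cached = score_cache.get(key)
        if cached is not None:
            return cached
        scores = score_features(extract_features(state))
        score_cache[key] = scores
        return scores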

While I'm here...

https://github.com/syllog1sm/redshift/blob/segmentation/reds...

The big enum names all of the atomic feature values that I extract; their values get placed into an array, context. So context[S0w] contains the word of the token on top of the stack.

I then list the actual features as tuples, referring to those values. So I can write a group of features with something like new_features = ((S0w, S0p), (S0w,), (S0p,)). That would add three feature templates: one with the word plus the POS tag, one with just the word, one with just the POS tag.

A bit of machinery in features.pyx then takes those Python feature definitions, and compiles them into a form that can be used more efficiently.
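
In spirit, the pattern is something like this pure-Python approximation (features.pyx compiles the templates down to C arrays; the helper names here are just for illustration):

    # Stand-ins for the big enum: indices into the flat context array.
    S0w, S0p, N0w, N0p = 0, 1, 2, 3
    CONTEXT_SIZE = 4

    def fill_context(context, state, words, tags):
        # Write the atomic values for the current state into the array.
        context[S0w] = words[state.S0]
        context[S0p] = tags[state.S0]
        context[N0w] = words[state.N0]
        context[N0p] = tags[state.N0]

    # Feature templates as tuples of context indices, as in the comment above:
    # word+tag of S0, word of S0, tag of S0.
    new_features = ((S0w, S0p), (S0w,), (S0p,))

    def extract(context, templates):
        # Each template becomes one hashable feature key; the template index is
        # included so identical values under different templates stay distinct.
        return [(i,) + tuple(context[idx] for idx in template)
                for i, template in enumerate(templates)]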


I won't be at ACL this year, maybe next. My advisor will be there though (Reut Tsarfaty).



