
Postvorta supports an advanced query syntax that lets you search for 'things' as well as words. For example, the following query searches for documents in which a single sentence contains both the name of an organization and a person being quoted.

{Sentence} OVER ({Organization} AND ({Person} [0..3] root:say))

This single example contains most of the available query syntax, so it will serve as a worked example throughout the rest of this documentation.

Searching For Words

Any single word is treated as a plain text query. To start building up the worked example, we could assume that when a person is quoted in a document you will find text along the lines of "he was quoted as saying...". In this case we could use the query:

saying

Unfortunately this would not match text such as "he said...", which is also someone being quoted. Fortunately Postvorta analyses each word in order to provide alternative views of the text. Currently this includes support for searching via the morphological root form of a word or its part-of-speech category. So we can update our query to search for all words whose root is say, i.e. say, says, said and saying:

root:say
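
To make the idea of a root view concrete, here is a minimal Python sketch. It is a toy model only: the hand-written ROOTS table and the function names are assumptions made for this illustration and say nothing about how Postvorta actually derives or indexes root forms.

```python
# Toy illustration only: this hand-written table stands in for real
# morphological analysis and is NOT Postvorta's implementation.
ROOTS = {"say": "say", "says": "say", "said": "say", "saying": "say"}

def matches_root(word, root):
    """Return True if the word's (assumed) root form equals `root`."""
    lowered = word.lower()
    # Words missing from the toy table are treated as their own root.
    return ROOTS.get(lowered, lowered) == root

def find_root(text, root):
    """Return the words in `text` whose root is `root`, in document order."""
    words = [w.strip('.,"') for w in text.split()]
    return [w for w in words if matches_root(w, root)]
```

Under these assumptions, find_root applied to a text containing both "said" and "saying" returns both words, whereas a plain search for saying would find only one of them.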

If we had wanted to be even more general and simply match any past-tense verb, we could have used the following query instead (the list of supported part-of-speech labels is the same as that listed in the GATE Manual).

category:VBD

Most search engines treat a query as a bag-of-words. That is, each word is matched separately against the index and the matching documents are ranked (amongst other things) by the number of matching words they contain. Postvorta is different in that, by default, it treats the entire query as a sequence. This means that to match text such as "he said..." we can use the query:

he root:say

As Postvorta treats this as a sequence it won't match text such as "she said he was..." which is good, but it also won't match "he was quoted as saying..." which isn't so good. Fortunately we can still match this using a gap query. Gap queries state the number of words that can appear between two other things in the query. To continue building our worked example, we can now expand the previous query to allow up to three words to appear between the word he and the verb to say.

he [0..3] root:say
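
The behaviour of sequence and gap queries described above can be sketched in Python. This is a toy model under stated assumptions: text is split on whitespace, the ROOTS table is hand-written, and gap_match is a hypothetical helper, not part of Postvorta.

```python
# Toy illustration only: hand-written roots, whitespace tokens.
ROOTS = {"say": "say", "says": "say", "said": "say", "saying": "say"}

def gap_match(tokens, first, root, min_gap=0, max_gap=3):
    """Return (start, end) token index pairs where the word `first` is
    followed, after min_gap..max_gap intervening tokens, by a word whose
    root is `root`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() != first:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + 1 + gap  # position of the candidate verb
            if j < len(tokens) and ROOTS.get(tokens[j].lower()) == root:
                hits.append((i, j))
    return hits

quoted = "he was quoted as saying this".split()
strict = gap_match(quoted, "he", "say", max_gap=0)   # like: he root:say
relaxed = gap_match(quoted, "he", "say", max_gap=3)  # like: he [0..3] root:say
```

With max_gap=0 the helper behaves like the plain sequence query and finds nothing in this text; allowing up to three intervening tokens lets it reach saying.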

Searching For 'Things'

One of the main reasons for using the advanced query syntax is that it enables you to search for 'things', usually referred to as annotations, as well as words. This is possible because each blog post is processed and semantically annotated during the indexing process. Currently Postvorta supports searching for people, organizations and locations. To search for an annotation you specify the type within a pair of braces. So to search for people you would use the following query:

{Person}

If you were interested in organizations or locations then their types are (unsurprisingly) Organization and Location respectively. To continue building up our worked example, we can now extend the query to look for any person being quoted rather than just the word he.

{Person} [0..3] root:say

As well as annotations over 'things', Postvorta also creates a number of annotations over different sections of each blog post. Most of these are used only internally; however, you can also search for sentences using the type Sentence.

Complex Queries

Now that you have seen how to search for words and annotations, and how to combine them into a single sequence, it's time to look at the remaining syntax, which allows these queries to be combined in other ways to form more complex queries. Firstly, we have the boolean operators AND and OR. For our worked example we want sentences that contain an organization and a person being quoted, which we can write as:

{Organization} AND ({Person} [0..3] root:say)

Note that we surrounded the part of the query about a person being quoted in round brackets to remove any ambiguity as to how the two parts could be combined. If we had wanted sentences mentioning either an organization or a person being quoted then we could simply have replaced the AND by OR. Note that case is important here; a lower-case and would be treated simply as a search for the word and.

The final operators allow us to overlap queries. The worked example currently looks for documents that contain an organization and a person being quoted, but we initially stated that we wanted both to appear within a single sentence. The overlap operators allow us to add this final requirement. There are two operators that we could use: OVER and IN. In this particular situation it doesn't actually matter which we use, but it is worth explaining the difference between them. When building complex queries from multiple small queries, at each stage only part of the query has to be retained for later matching. In the case of the overlap operators, only the query before the operator is retained once the whole section is matched. Let's take a simple example to make this clearer. Let us assume we are looking for people with the first name John. We know how to look for a single word and we know how to look for a person annotation, so we can combine these in two ways.

{Person} OVER John

John IN {Person}

In the first case the query actually matches a person annotation which overlaps the word John. In the second case the query matches just the word John and not the whole of the person's name. Whilst this may not matter in such simple queries, if we expand this to people called John being quoted then we get the queries:

({Person} OVER John) [0..3] root:say

(John IN {Person}) [0..3] root:say
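
The difference in what each operator retains can be sketched with a toy Python model. Everything here is an assumption made for illustration: annotations are represented as inclusive (start, end) token index pairs, the ROOTS table is hand-written, and the three helper functions are hypothetical rather than part of Postvorta.

```python
# Toy illustration only: spans are inclusive (start, end) token indices.
ROOTS = {"say": "say", "says": "say", "said": "say", "saying": "say"}

def person_over(tokens, people, word):
    """Model of {Person} OVER word: retain the whole annotation span."""
    return [(s, e) for (s, e) in people
            if any(tokens[k] == word for k in range(s, e + 1))]

def word_in_person(tokens, people, word):
    """Model of word IN {Person}: retain only the word's own span."""
    return [(k, k) for (s, e) in people
            for k in range(s, e + 1) if tokens[k] == word]

def followed_by_say(tokens, spans, max_gap=3):
    """Keep spans followed, after at most max_gap intervening tokens,
    by a word whose root is say."""
    kept = []
    for (s, e) in spans:
        # Candidate verb positions run from e + 1 to e + 1 + max_gap.
        window = tokens[e + 1 : e + 2 + max_gap]
        if any(ROOTS.get(t.lower()) == "say" for t in window):
            kept.append((s, e))
    return kept

tokens = "John Smith was quoted as saying".split()
people = [(0, 1)]  # assumed Person annotation covering "John Smith"
over_hits = followed_by_say(tokens, person_over(tokens, people, "John"))
in_hits = followed_by_say(tokens, word_in_person(tokens, people, "John"))
```

In this model the OVER variant measures the gap from the end of the whole annotation, while the IN variant measures it from the single word John, so the two can disagree as soon as the person's name is longer than one token.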

Whilst both queries will match the text "John was quoted as saying..." only the first will match "John Smith was quoted as saying..." because without matching the whole of the person annotation there are actually four words between John and the verb to