module ScopedSearch::QueryLanguage::Tokenizer

The Tokenizer module adds methods to the query language compiler that transform a query string into a stream of tokens, which is easier to parse than the raw string.

Constants

KEYWORDS

Every keyword the language supports.

OPERATORS

Every operator the language supports.

Public Instance Methods

current_char()

Returns the current character of the string.

   # File lib/scoped_search/query_language/tokenizer.rb
   def current_char
     @current_char
   end
each(&block)
Alias for: each_token
each_token() { |:lparen| ... }

Tokenizes the string by iterating over its characters and yielding each token to the given block.

   # File lib/scoped_search/query_language/tokenizer.rb
   def each_token(&block)
     while next_char
       case current_char
       when /^\s?$/; # ignore
       when '(';  yield(:lparen)
       when ')';  yield(:rparen)
       when ',';  yield(:comma)
       when /\&|\||=|<|>|\^|!|~|-/;  tokenize_operator(&block)
       when '"';                  tokenize_quoted_keyword(&block)
       else;                      tokenize_keyword(&block)
       end
     end
   end
Also aliased as: each
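
The dispatch in each_token can be tried in isolation. The following is a minimal sketch, not part of the library: classify is a hypothetical helper, and the symbols returned by its last three branches are labels invented for this illustration. It reuses the same patterns as the case statement above to show which branch a given character would take.

```ruby
# Hypothetical helper: reports which branch of each_token's case statement
# a single character would take. Not part of scoped_search.
def classify(char)
  case char
  when /^\s?$/ then :ignored                 # one whitespace character, or ""
  when '(' then :lparen
  when ')' then :rparen
  when ',' then :comma
  when /\&|\||=|<|>|\^|!|~|-/ then :operator_start   # label invented here
  when '"' then :quoted_keyword_start                # label invented here
  else :keyword_start                                # label invented here
  end
end

labels = ['(', ' ', '', '>', '"', 'x'].map { |c| classify(c) }
# labels == [:lparen, :ignored, :ignored, :operator_start,
#            :quoted_keyword_start, :keyword_start]
```

The empty string falls into the ignored branch because /^\s?$/ makes the whitespace character optional.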
next_char()

Returns the next character of the string and moves the position pointer one step forward.

   # File lib/scoped_search/query_language/tokenizer.rb
   def next_char
     @current_char_pos += 1
     @current_char = @str[@current_char_pos, 1]
   end
peek_char(amount = 1)

Returns a following character of the string (by default, the next character), without updating the position pointer.

   # File lib/scoped_search/query_language/tokenizer.rb
   def peek_char(amount = 1)
     @str[@current_char_pos + amount, 1]
   end
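
To see how next_char and peek_char interact, here is a small sketch of the same cursor logic outside the library. CharCursor is a hypothetical stand-in for the compiler object; only the instance variable names and method bodies are taken from the listings in this document.

```ruby
# Hypothetical stand-in for the compiler; only the cursor logic is reproduced.
class CharCursor
  attr_reader :current_char

  def initialize(str)
    @str = str
    @current_char_pos = -1 # the same starting position that tokenize sets
    @current_char = nil
  end

  # Advance the position pointer and return the character found there.
  def next_char
    @current_char_pos += 1
    @current_char = @str[@current_char_pos, 1]
  end

  # Look ahead without moving the position pointer.
  def peek_char(amount = 1)
    @str[@current_char_pos + amount, 1]
  end
end

cursor = CharCursor.new("a=b")
cursor.next_char      # "a"
cursor.peek_char      # "=", position unchanged
cursor.peek_char(2)   # "b"
cursor.current_char   # still "a"
```

Note that String#[] with a start and a length returns an empty string when the start equals the string length, and nil only beyond that. A cursor built this way therefore sees one "" after the last character before next_char returns nil.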
tokenize()

Tokenizes the string and returns the result as an array of tokens.

   # File lib/scoped_search/query_language/tokenizer.rb
   def tokenize
     @current_char_pos = -1
     to_a
   end
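
tokenize relies on to_a, which is not defined in this module. A plausible reading, which is an assumption since the including class is not shown here, is that the class also mixes in Enumerable, so that to_a collects whatever each yields. The sketch below illustrates that mechanism with a hypothetical TokenSource class.

```ruby
# Hypothetical class showing how Enumerable#to_a builds an array from each.
# It does not reproduce the scoped_search compiler.
class TokenSource
  include Enumerable

  def initialize(tokens)
    @tokens = tokens
  end

  # Enumerable only needs each; to_a, map, select and others come for free.
  def each(&block)
    @tokens.each(&block)
  end
end

result = TokenSource.new([:lparen, "foo", :rparen]).to_a
# result == [:lparen, "foo", :rparen]
```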
tokenize_keyword() { |KEYWORDS| ... }

Tokenizes a keyword and converts it to a Symbol if it is recognized as a reserved language keyword (a key of the KEYWORDS hash).

   # File lib/scoped_search/query_language/tokenizer.rb
   def tokenize_keyword(&block)
     keyword = current_char
     keyword << next_char while /[^=~<>\s\&\|\)\(,]/ =~ peek_char
     KEYWORDS.has_key?(keyword.downcase) ? yield(KEYWORDS[keyword.downcase]) : yield(keyword)
   end
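
The two steps of tokenize_keyword, scanning until a terminating character and then looking the word up case-insensitively, can be sketched separately. SAMPLE_KEYWORDS is an illustrative hash made up for this example; the contents of the real KEYWORDS constant are not listed in this document. scan_word and keyword_token are hypothetical helpers, and unlike the original, scan_word also tests the first character against the pattern.

```ruby
# Illustrative values only; not the library's KEYWORDS constant.
SAMPLE_KEYWORDS = { 'and' => :and, 'or' => :or, 'not' => :not }.freeze

# Same character class as in tokenize_keyword: anything except these
# terminators continues the current word.
KEYWORD_CHAR = /[^=~<>\s\&\|\)\(,]/

# Hypothetical helper: collect characters until a terminator is reached.
def scan_word(str)
  word = +""
  str.each_char do |char|
    break unless KEYWORD_CHAR =~ char
    word << char
  end
  word
end

# Hypothetical helper: reserved words become Symbols, other words stay Strings.
def keyword_token(word)
  SAMPLE_KEYWORDS.fetch(word.downcase, word)
end

scan_word("name=value")    # "name"
scan_word("foo-bar baz")   # "foo-bar" ('-' is not a terminator)
keyword_token("AND")       # :and
keyword_token("ruby")      # "ruby"
```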
tokenize_operator() { |OPERATORS| ... }

Tokenizes an operator that occurs in the OPERATORS hash. The .to_s on [peek|next]_char guards against a Ruby bug in which nil values are returned from strings that have a forced encoding. See github.com/wvanbergen/scoped_search/issues/33 for details.

   # File lib/scoped_search/query_language/tokenizer.rb
   def tokenize_operator(&block)
     operator = current_char
     operator << next_char.to_s if OPERATORS.has_key?(operator + peek_char.to_s)
     yield(OPERATORS[operator])
   end
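
The one-character lookahead in tokenize_operator prefers a two-character operator when one exists. A minimal sketch of that idea follows; SAMPLE_OPERATORS is an illustrative hash invented for this example rather than the library's OPERATORS constant, and operator_at is a hypothetical helper that takes an explicit position instead of using the cursor.

```ruby
# Illustrative values only; not the library's OPERATORS constant.
SAMPLE_OPERATORS = {
  '>' => :gt, '>=' => :gte, '=' => :eq, '!' => :not, '!=' => :ne
}.freeze

# Hypothetical helper: read the operator starting at pos, extending it to
# two characters when the pair is itself a known operator.
def operator_at(str, pos)
  operator  = str[pos, 1]
  following = str[pos + 1, 1].to_s # nil.to_s is "", so concatenation is safe
  operator += following if SAMPLE_OPERATORS.key?(operator + following)
  SAMPLE_OPERATORS[operator]
end

operator_at("a>=1", 1)  # :gte
operator_at("a>1", 1)   # :gt
operator_at("a!=b", 1)  # :ne
```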
tokenize_quoted_keyword() { |keyword| ... }

Tokenizes a keyword that is quoted using double quotes. Allows escaping of double quote characters by backslashes.

   # File lib/scoped_search/query_language/tokenizer.rb
   def tokenize_quoted_keyword(&block)
     keyword = ""
     until next_char.nil? || current_char == '"'
       keyword << (current_char == "\\" ? next_char : current_char)
     end
     yield(keyword)
   end
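
The escape handling can be illustrated with a standalone sketch. read_quoted is a hypothetical helper, not part of the library. It assumes the input starts with the opening double quote and walks an array of characters instead of using the cursor methods, but it follows the same rule: a backslash causes the character after it to be taken literally.

```ruby
# Hypothetical helper: returns the content of a double-quoted keyword,
# assuming str begins with the opening quote.
def read_quoted(str)
  chars = str.chars
  pos = 0 # index of the opening quote
  keyword = +""
  loop do
    pos += 1
    char = chars[pos]
    break if char.nil? || char == '"' # end of input or closing quote
    if char == "\\"
      pos += 1
      char = chars[pos].to_s # the escaped character, taken literally
    end
    keyword << char
  end
  keyword
end

read_quoted('"say \"hi\"" rest')  # => 'say "hi"'
read_quoted('"unterminated')      # => "unterminated"
```

As in the original method, an unterminated quote is not an error: the keyword simply runs to the end of the input.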