Class ScannerImpl

  • All Implemented Interfaces:
    Scanner

    public final class ScannerImpl
    extends java.lang.Object
    implements Scanner
     Scanner produces tokens of the following types:
     STREAM-START
     STREAM-END
     DIRECTIVE(name, value)
     DOCUMENT-START
     DOCUMENT-END
     BLOCK-SEQUENCE-START
     BLOCK-MAPPING-START
     BLOCK-END
     FLOW-SEQUENCE-START
     FLOW-MAPPING-START
     FLOW-SEQUENCE-END
     FLOW-MAPPING-END
     BLOCK-ENTRY
     FLOW-ENTRY
     KEY
     VALUE
     ALIAS(value)
     ANCHOR(value)
     TAG(value)
     SCALAR(value, plain, style)
     Read comments in the Scanner code for more details.
     
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      private static class  ScannerImpl.Chomping
      Chomping the tail may have 3 values - yes, no, not defined.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private boolean allowSimpleKey
      A simple key is a key that is not denoted by the '?' indicator.
      private boolean done  
      static java.util.Map<java.lang.Character,​java.lang.Integer> ESCAPE_CODES
      A mapping from a character to a number of bytes to read-ahead for that escape sequence.
      static java.util.Map<java.lang.Character,​java.lang.String> ESCAPE_REPLACEMENTS
      A mapping from an escaped character in the input stream to the character that it should be replaced with.
      private int flowLevel  
      private int indent  
      private ArrayStack<java.lang.Integer> indents  
      private static java.util.regex.Pattern NOT_HEXA
      A regular expression matching characters which are not in the hexadecimal set (0-9, A-F, a-f).
      private java.util.Map<java.lang.Integer,​SimpleKey> possibleSimpleKeys  
      private StreamReader reader  
      private java.util.List<Token> tokens  
      private int tokensTaken  
    • Field Detail

      • NOT_HEXA

        private static final java.util.regex.Pattern NOT_HEXA
        A regular expression matching characters which are not in the hexadecimal set (0-9, A-F, a-f).
      • ESCAPE_REPLACEMENTS

        public static final java.util.Map<java.lang.Character,​java.lang.String> ESCAPE_REPLACEMENTS
        A mapping from an escaped character in the input stream to the character that it should be replaced with. YAML defines several common and a few uncommon escape sequences.
        See Also:
        4.1.6. Escape Sequences
      • ESCAPE_CODES

        public static final java.util.Map<java.lang.Character,​java.lang.Integer> ESCAPE_CODES
        A mapping from a character to a number of bytes to read-ahead for that escape sequence. These escape sequences are used to handle unicode escaping in the following formats, where H is a hexadecimal character:
         \xHH         : escaped 8-bit Unicode character
         \uHHHH       : escaped 16-bit Unicode character
         \UHHHHHHHH   : escaped 32-bit Unicode character
         
        See Also:
        5.6. Escape Sequences
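
        The read-ahead lookup described for ESCAPE_CODES can be illustrated with a minimal sketch. The class EscapeSketch and its decode method are hypothetical names introduced here, not part of SnakeYAML; the sketch assumes the read-ahead count is the number of hexadecimal digits after the escape letter (2, 4, or 8).

```java
import java.util.Map;

// Hypothetical helper, not part of SnakeYAML: decodes one numeric escape.
final class EscapeSketch {
    // Escape letter -> number of hexadecimal digits that follow it (x, u, and U).
    static final Map<Character, Integer> ESCAPE_CODES = Map.of('x', 2, 'u', 4, 'U', 8);

    // Decodes the escape that starts at text[pos], where text[pos] is the backslash.
    static String decode(String text, int pos) {
        char kind = text.charAt(pos + 1);
        Integer length = ESCAPE_CODES.get(kind);
        if (length == null) {
            throw new IllegalArgumentException("not a numeric escape: \\" + kind);
        }
        String hex = text.substring(pos + 2, pos + 2 + length);
        int codePoint = Integer.parseUnsignedInt(hex, 16);
        // Character.toChars yields a surrogate pair for code points above 0xFFFF.
        return new String(Character.toChars(codePoint));
    }
}
```

        The map lookup tells the caller how far to read ahead before converting; an unknown escape letter is rejected rather than guessed.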
      • done

        private boolean done
      • flowLevel

        private int flowLevel
      • tokens

        private java.util.List<Token> tokens
      • tokensTaken

        private int tokensTaken
      • indent

        private int indent
      • indents

        private ArrayStack<java.lang.Integer> indents
      • allowSimpleKey

        private boolean allowSimpleKey
         A simple key is a key that is not denoted by the '?' indicator.
         Example of simple keys:
           ---
           block simple key: value
           ? not a simple key:
           : { flow simple key: value }
         We emit the KEY token before all keys, so when we find a potential
         simple key, we try to locate the corresponding ':' indicator.
         Simple keys should be limited to a single line and 1024 characters.
         
         Can a simple key start at the current position? A simple key may
         start:
         - at the beginning of the line, not counting indentation spaces
               (in block context),
         - after '{', '[', ',' (in the flow context),
         - after '?', ':', '-' (in the block context).
         In the block context, this flag also signifies if a block collection
         may start at the current position.
         
      • possibleSimpleKeys

        private java.util.Map<java.lang.Integer,​SimpleKey> possibleSimpleKeys
    • Constructor Detail

    • Method Detail

      • checkToken

        public boolean checkToken​(Token.ID... choices)
        Check whether the next token is one of the given types.
        Specified by:
        checkToken in interface Scanner
        Parameters:
        choices - token IDs.
        Returns:
        true if the next token can be assigned to a variable of at least one of the given types. Returns false if no more tokens are available.
      • peekToken

        public Token peekToken()
        Return the next token, but do not delete it from the queue.
        Specified by:
        peekToken in interface Scanner
        Returns:
        The token that will be returned on the next call to Scanner.getToken()
      • getToken

        public Token getToken()
        Return the next token, removing it from the queue.
        Specified by:
        getToken in interface Scanner
        Returns:
        the coming token
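
        The difference between checkToken, peekToken, and getToken can be sketched with a plain queue. TokenQueueSketch and its Id enum are hypothetical stand-ins, not the real Token types; treating an empty choices list as "any token" is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the scanner's token queue; names are illustrative.
final class TokenQueueSketch {
    enum Id { STREAM_START, SCALAR, KEY, VALUE, STREAM_END }

    private final Deque<Id> tokens = new ArrayDeque<>();

    TokenQueueSketch(Id... ids) {
        for (Id id : ids) {
            tokens.addLast(id);
        }
    }

    // True if the next token matches any choice; false when the queue is empty.
    // Assumption: with no choices given, any queued token counts as a match.
    boolean checkToken(Id... choices) {
        Id next = tokens.peekFirst();
        if (next == null) return false;
        if (choices.length == 0) return true;
        for (Id choice : choices) {
            if (next == choice) return true;
        }
        return false;
    }

    // Returns the next token without removing it.
    Id peekToken() { return tokens.peekFirst(); }

    // Returns the next token and removes it from the queue.
    Id getToken() { return tokens.pollFirst(); }
}
```

        A parser would typically call checkToken to decide which grammar rule applies and only then call getToken to consume the token.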
      • needMoreTokens

        private boolean needMoreTokens()
        Returns true if more tokens should be scanned.
      • fetchMoreTokens

        private void fetchMoreTokens()
        Fetch one or more tokens from the StreamReader.
      • nextPossibleSimpleKey

        private int nextPossibleSimpleKey()
        Return the number of the nearest possible simple key. Actually we don't need to loop through the whole dictionary.
      • stalePossibleSimpleKeys

        private void stalePossibleSimpleKeys()
         Remove entries that are no longer possible simple keys. According to
         the YAML specification, simple keys
         - should be limited to a single line,
         - should be no longer than 1024 characters.
         Disabling this procedure will allow simple keys of any length and
         height (may cause problems if indentation is broken though).
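
         The two limits above can be sketched as a filter over the saved candidates. StaleKeySketch and Candidate are hypothetical names; the sketch assumes each candidate records the line and character index where it started, and it leaves out the error the real scanner would raise for a required key.

```java
import java.util.Iterator;
import java.util.Map;

final class StaleKeySketch {
    // Hypothetical record of where a candidate simple key started.
    static final class Candidate {
        final int line;
        final int index;
        Candidate(int line, int index) { this.line = line; this.index = index; }
    }

    static final int MAX_KEY_LENGTH = 1024;

    // Removes candidates that started on another line or more than 1024 characters back.
    static void removeStale(Map<Integer, Candidate> keysByFlowLevel, int currentLine, int currentIndex) {
        Iterator<Candidate> it = keysByFlowLevel.values().iterator();
        while (it.hasNext()) {
            Candidate key = it.next();
            if (key.line != currentLine || currentIndex - key.index > MAX_KEY_LENGTH) {
                it.remove();
            }
        }
    }
}
```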
         
      • savePossibleSimpleKey

        private void savePossibleSimpleKey()
        The next token may start a simple key. We check if it's possible and save its position. This function is called for ALIAS, ANCHOR, TAG, SCALAR(flow), '[', and '{'.
      • removePossibleSimpleKey

        private void removePossibleSimpleKey()
        Remove the saved possible key position at the current flow level.
      • unwindIndent

        private void unwindIndent​(int col)
        Handle implicitly ending multiple levels of block nodes by decreased indentation. This function becomes important on lines 4 and 7 of this example:
         1) book one:
         2)   part one:
         3)     chapter one
         4)   part two:
         5)     chapter one
         6)     chapter two
         7) book two:
         
        In flow context, tokens should respect indentation. Strictly, the condition should be `self.indent >= column` according to the spec, but that condition would prohibit intuitively correct constructions such as key : { }
      • addIndent

        private boolean addIndent​(int column)
        Check if we need to increase indentation.
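
        The interplay of unwindIndent and addIndent can be sketched with a stack of columns. IndentSketch is a hypothetical class; it returns a count of implied BLOCK-END tokens instead of emitting them, and it ignores the flow-context case discussed above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IndentSketch {
    private final Deque<Integer> indents = new ArrayDeque<>();
    private int indent = -1; // -1 means no block collection is open yet

    // Opens a new level if the column is deeper than the current indent.
    boolean addIndent(int column) {
        if (indent < column) {
            indents.push(indent);
            indent = column;
            return true;
        }
        return false;
    }

    // Closes every level deeper than col and returns how many BLOCK-END tokens that implies.
    int unwindIndent(int col) {
        int blockEnds = 0;
        while (indent > col) {
            indent = indents.pop();
            blockEnds++;
        }
        return blockEnds;
    }
}
```

        Dropping from column 4 back to column 0 in one step closes two levels at once, which is the "multiple levels" case the description refers to.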
      • fetchStreamStart

        private void fetchStreamStart()
        We always add STREAM-START as the first token and STREAM-END as the last token.
      • fetchStreamEnd

        private void fetchStreamEnd()
      • fetchDirective

        private void fetchDirective()
        Fetch a YAML directive. Directives are presentation details that are interpreted as instructions to the processor. YAML defines two kinds of directives, YAML and TAG; all other types are reserved for future use.
        See Also:
        3.2.3.4. Directives
      • fetchDocumentStart

        private void fetchDocumentStart()
        Fetch a document-start token ("---").
      • fetchDocumentEnd

        private void fetchDocumentEnd()
        Fetch a document-end token ("...").
      • fetchDocumentIndicator

        private void fetchDocumentIndicator​(boolean isDocumentStart)
        Fetch a document indicator, either "---" for "document-start", or else "..." for "document-end". The type is chosen by the given boolean.
      • fetchFlowSequenceStart

        private void fetchFlowSequenceStart()
      • fetchFlowMappingStart

        private void fetchFlowMappingStart()
      • fetchFlowCollectionStart

        private void fetchFlowCollectionStart​(boolean isMappingStart)
        Fetch a flow-style collection start, which is either a sequence or a mapping. The type is determined by the given boolean. A flow-style collection is in a format similar to JSON. Sequences are started by '[' and ended by ']'; mappings are started by '{' and ended by '}'.
        Parameters:
        isMappingStart - true for a mapping start ('{'), false for a sequence start ('[')
        See Also:
        3.2.3.1. Node Styles
      • fetchFlowSequenceEnd

        private void fetchFlowSequenceEnd()
      • fetchFlowMappingEnd

        private void fetchFlowMappingEnd()
      • fetchFlowCollectionEnd

        private void fetchFlowCollectionEnd​(boolean isMappingEnd)
        Fetch a flow-style collection end, which is either a sequence or a mapping. The type is determined by the given boolean. A flow-style collection is in a format similar to JSON. Sequences are started by '[' and ended by ']'; mappings are started by '{' and ended by '}'.
        See Also:
        3.2.3.1. Node Styles
      • fetchFlowEntry

        private void fetchFlowEntry()
        Fetch an entry in the flow style. Flow-style entries occur either immediately after the start of a collection, or else after a comma.
        See Also:
        3.2.3.1. Node Styles
      • fetchBlockEntry

        private void fetchBlockEntry()
        Fetch an entry in the block style.
        See Also:
        3.2.3.1. Node Styles
      • fetchKey

        private void fetchKey()
        Fetch a key in a block-style mapping.
        See Also:
        3.2.3.1. Node Styles
      • fetchValue

        private void fetchValue()
        Fetch a value in a block-style mapping.
        See Also:
        3.2.3.1. Node Styles
      • fetchAlias

        private void fetchAlias()
        Fetch an alias, which is a reference to an anchor. Aliases take the format:
         *(anchor name)
         
        See Also:
        3.2.2.2. Anchors and Aliases
      • fetchTag

        private void fetchTag()
        Fetch a tag. Tags take a complex form.
        See Also:
        3.2.1.2. Tags
      • fetchLiteral

        private void fetchLiteral()
        Fetch a literal scalar, denoted with a vertical-bar. This is the type best used for source code and other content, such as binary data, which must be included verbatim.
        See Also:
        3.2.3.1. Node Styles
      • fetchFolded

        private void fetchFolded()
        Fetch a folded scalar, denoted with a greater-than sign. This is the type best used for long content, such as the text of a chapter or description.
        See Also:
        3.2.3.1. Node Styles
      • fetchBlockScalar

        private void fetchBlockScalar​(char style)
        Fetch a block scalar (literal or folded).
        Parameters:
        style - the block scalar indicator: '|' for literal, '>' for folded
        See Also:
        3.2.3.1. Node Styles
      • fetchSingle

        private void fetchSingle()
        Fetch a single-quoted (') scalar.
      • fetchDouble

        private void fetchDouble()
        Fetch a double-quoted (") scalar.
      • fetchFlowScalar

        private void fetchFlowScalar​(char style)
        Fetch a flow scalar (single- or double-quoted).
        Parameters:
        style - the quote character: ' for single-quoted, " for double-quoted
        See Also:
        3.2.3.1. Node Styles
      • fetchPlain

        private void fetchPlain()
        Fetch a plain scalar.
      • checkDirective

        private boolean checkDirective()
        Returns true if the next thing on the reader is a directive, given that the leading '%' has already been checked.
        See Also:
        3.2.3.4. Directives
      • checkDocumentStart

        private boolean checkDocumentStart()
        Returns true if the next thing on the reader is a document-start ("---"). A document-start is always followed immediately by a new line.
      • checkDocumentEnd

        private boolean checkDocumentEnd()
        Returns true if the next thing on the reader is a document-end ("..."). A document-end is always followed immediately by a new line.
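
        The two document-indicator checks can be sketched as one helper over a line of text. DocumentIndicatorSketch is a hypothetical class; it assumes the caller has already verified that the line starts at column 0, and accepting a space, tab, or end of input after the marker (in addition to a line break) is an assumption of this sketch.

```java
final class DocumentIndicatorSketch {
    // True if line starts with the marker and the marker is followed by
    // end of input, a space, a tab, or a line break.
    static boolean startsWithIndicator(String line, String marker) {
        if (!line.startsWith(marker)) return false;
        if (line.length() == marker.length()) return true;
        char next = line.charAt(marker.length());
        return next == ' ' || next == '\t' || next == '\n' || next == '\r';
    }

    static boolean checkDocumentStart(String line) {
        return startsWithIndicator(line, "---");
    }

    static boolean checkDocumentEnd(String line) {
        return startsWithIndicator(line, "...");
    }
}
```

        Checking the character after the marker keeps a plain scalar such as ---x from being mistaken for a document start.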
      • checkBlockEntry

        private boolean checkBlockEntry()
        Returns true if the next thing on the reader is a block token.
      • checkKey

        private boolean checkKey()
        Returns true if the next thing on the reader is a key token.
      • checkValue

        private boolean checkValue()
        Returns true if the next thing on the reader is a value token.
      • checkPlain

        private boolean checkPlain()
        Returns true if the next thing on the reader is a plain token.
      • scanToNextToken

        private void scanToNextToken()
         We ignore spaces, line breaks and comments.
         If we find a line break in the block context, we set the flag
         `allow_simple_key` on.
         The byte order mark is stripped if it's the first character in the
         stream. We do not yet support BOM inside the stream as the
         specification requires. Any such mark will be considered as a part
         of the document.
         TODO: We need to make tab handling rules more sane. A good rule is
           Tabs cannot precede tokens
           BLOCK-SEQUENCE-START, BLOCK-MAPPING-START, BLOCK-END,
           KEY(block), VALUE(block), BLOCK-ENTRY
         So the checking code is
           if <TAB>:
               self.allow_simple_keys = False
         We also need to add the check for `allow_simple_keys == True` to
         `unwind_indent` before issuing BLOCK-END.
         Scanners for block, flow, and plain scalars need to be modified.
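
         The skipping rule in the first sentence can be sketched over a plain string. SkipSketch is a hypothetical class; it handles only spaces, '#' comments, and CR/LF line breaks, and leaves out the byte order mark, the tab rules, and the allow_simple_key bookkeeping described above.

```java
final class SkipSketch {
    // Returns the index of the first character that is not part of
    // a run of spaces, a comment, or a line break.
    static int skipToNextToken(String text, int pos) {
        boolean found = false;
        while (!found) {
            // Skip spaces.
            while (pos < text.length() && text.charAt(pos) == ' ') {
                pos++;
            }
            // Skip a comment up to, but not including, the line break.
            if (pos < text.length() && text.charAt(pos) == '#') {
                while (pos < text.length() && text.charAt(pos) != '\n' && text.charAt(pos) != '\r') {
                    pos++;
                }
            }
            // Consume a line break and continue on the next line, or stop.
            if (pos < text.length() && (text.charAt(pos) == '\n' || text.charAt(pos) == '\r')) {
                pos++;
            } else {
                found = true;
            }
        }
        return pos;
    }
}
```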
         
      • scanDirective

        private Token scanDirective()
      • scanDirectiveName

        private java.lang.String scanDirectiveName​(Mark startMark)
        Scan a directive name. Directive names are a series of non-space characters.
        See Also:
        7.1. Directives
      • scanYamlDirectiveValue

        private java.util.List<java.lang.Integer> scanYamlDirectiveValue​(Mark startMark)
      • scanYamlDirectiveNumber

        private java.lang.Integer scanYamlDirectiveNumber​(Mark startMark)
        Read a %YAML directive number: this is either the major or the minor part. Stop reading at a non-digit character (usually either '.' or '\n').
        See Also:
        7.1.1. “YAML” Directive
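
        Reading the major and minor parts of a %YAML version can be sketched as two digit runs separated by a dot. DirectiveNumberSketch and parseVersion are hypothetical names; the sketch works on a string rather than a StreamReader and reports errors with IllegalArgumentException instead of a scanner exception.

```java
final class DirectiveNumberSketch {
    // Parses "major.minor" (for example the "1.1" in "%YAML 1.1") into two integers.
    // Reading stops at the first non-digit after the minor part.
    static int[] parseVersion(String text) {
        int pos = 0;
        int[] version = new int[2];
        for (int part = 0; part < 2; part++) {
            int start = pos;
            while (pos < text.length() && Character.isDigit(text.charAt(pos))) {
                pos++;
            }
            if (pos == start) {
                throw new IllegalArgumentException("expected a digit at index " + pos);
            }
            version[part] = Integer.parseInt(text.substring(start, pos));
            if (part == 0) {
                if (pos >= text.length() || text.charAt(pos) != '.') {
                    throw new IllegalArgumentException("expected '.' at index " + pos);
                }
                pos++; // skip the dot between major and minor
            }
        }
        return version;
    }
}
```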
      • scanTagDirectiveValue

        private java.util.List<java.lang.String> scanTagDirectiveValue​(Mark startMark)

        Read a %TAG directive value:

         s-ignored-space+ c-tag-handle s-ignored-space+ ns-tag-prefix s-l-comments
         

        See Also:
        7.1.2. “TAG” Directive
      • scanTagDirectiveHandle

        private java.lang.String scanTagDirectiveHandle​(Mark startMark)
        Scan a %TAG directive's handle. This is YAML's c-tag-handle.
        Parameters:
        startMark - beginning of the handle
        Returns:
        scanned handle
        See Also:
        7.1.2.2. Tag Handles
      • scanTagDirectivePrefix

        private java.lang.String scanTagDirectivePrefix​(Mark startMark)
        Scan a %TAG directive's prefix. This is YAML's ns-tag-prefix.
      • scanDirectiveIgnoredLine

        private void scanDirectiveIgnoredLine​(Mark startMark)
      • scanAnchor

        private Token scanAnchor​(boolean isAnchor)
      • scanTag

        private Token scanTag()

        Scan a Tag property. A Tag property may be specified in one of three ways: c-verbatim-tag, c-ns-shorthand-tag, or c-ns-non-specific-tag

        c-verbatim-tag takes the form !<ns-uri-char+> and must be delivered verbatim (as-is) to the application. In particular, verbatim tags are not subject to tag resolution.

        c-ns-shorthand-tag is a valid tag handle followed by a non-empty suffix. If the tag handle is a c-primary-tag-handle ('!') then the suffix must have all exclamation marks properly URI-escaped (%21); otherwise, the string will look like a named tag handle: !foo!bar would be interpreted as (handle="!foo!", suffix="bar").

        c-ns-non-specific-tag is always a lone '!'; this is only useful for plain scalars, where its specification means that the scalar MUST be resolved to have type tag:yaml.org,2002:str.

        TODO SnakeYaml incorrectly ignores c-ns-non-specific-tag right now.
        See Also:
        8.2. Node Tags
        TODO Note that this method does not enforce rules about local versus global tags!
      • scanBlockScalar

        private Token scanBlockScalar​(char style)
      • scanBlockScalarIndicators

        private ScannerImpl.Chomping scanBlockScalarIndicators​(Mark startMark)
        Scan a block scalar indicator. The block scalar indicator includes two optional components, which may appear in either order. A block indentation indicator is a non-zero digit describing the indentation level of the block scalar to follow. This indentation is an additional number of spaces relative to the current indentation level. A block chomping indicator is a + or -, changing the chomping mode from the default (clip) to either strip (-) or keep (+).
        See Also:
        5.3. Indicator Characters, 9.2.2. Block Indentation Indicator, 9.2.3. Block Chomping Indicator
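
        The "either order" rule for the two indicators can be sketched as a small parser over the characters that follow '|' or '>'. BlockIndicatorSketch, its Chomping enum, and Indicators are hypothetical types; the real ScannerImpl.Chomping class is not modelled here.

```java
final class BlockIndicatorSketch {
    enum Chomping { CLIP, STRIP, KEEP }

    static final class Indicators {
        final Chomping chomping;
        final int increment; // -1 when no indentation indicator is given
        Indicators(Chomping chomping, int increment) {
            this.chomping = chomping;
            this.increment = increment;
        }
    }

    // Parses the characters after '|' or '>', e.g. "", "-", "2", "+2", "2-".
    static Indicators parse(String header) {
        if (header.length() > 2) {
            throw new IllegalArgumentException("too many indicators: " + header);
        }
        Chomping chomping = Chomping.CLIP;
        int increment = -1;
        for (char c : header.toCharArray()) {
            if ((c == '+' || c == '-') && chomping == Chomping.CLIP) {
                chomping = (c == '+') ? Chomping.KEEP : Chomping.STRIP;
            } else if (c >= '1' && c <= '9' && increment == -1) {
                increment = c - '0';
            } else {
                // Covers a zero digit, a repeated indicator, and any other character.
                throw new IllegalArgumentException("unexpected indicator '" + c + "' in: " + header);
            }
        }
        return new Indicators(chomping, increment);
    }
}
```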
      • scanBlockScalarIgnoredLine

        private java.lang.String scanBlockScalarIgnoredLine​(Mark startMark)
        Scan to the end of the line after a block scalar has been scanned; the only things that are permitted at this time are comments and spaces.
      • scanBlockScalarIndentation

        private java.lang.Object[] scanBlockScalarIndentation()
        Scans for the indentation of a block scalar implicitly. This mechanism is used only if the block did not explicitly state an indentation to be used.
        See Also:
        9.2.2. Block Indentation Indicator
      • scanBlockScalarBreaks

        private java.lang.Object[] scanBlockScalarBreaks​(int indent)
      • scanFlowScalar

        private Token scanFlowScalar​(char style)
        Scan a flow-style scalar. Flow scalars are presented in one of two forms; first, a flow scalar may be a double-quoted string; second, a flow scalar may be a single-quoted string.
        See Also:
        9.1. Flow Scalar Styles
         See the specification for details.
         Note that we loosen indentation rules for quoted scalars. Quoted
         scalars don't need to adhere to indentation because " and ' clearly
         mark the beginning and the end of them. Therefore we are less
         restrictive than the specification requires. We only need to check
         that document separators are not included in scalars.
         
      • scanFlowScalarNonSpaces

        private java.lang.String scanFlowScalarNonSpaces​(boolean doubleQuoted,
                                                         Mark startMark)
        Scan some number of flow-scalar non-space characters.
      • scanFlowScalarSpaces

        private java.lang.String scanFlowScalarSpaces​(Mark startMark)
      • scanFlowScalarBreaks

        private java.lang.String scanFlowScalarBreaks​(Mark startMark)
      • scanPlain

        private Token scanPlain()
        Scan a plain scalar.
         See the specification for details.
         We add an additional restriction for the flow context:
           plain scalars in the flow context cannot contain ',', ':' and '?'.
         We also keep track of the `allow_simple_key` flag here.
         Indentation rules are loosened for the flow context.
         
      • scanPlainSpaces

        private java.lang.String scanPlainSpaces()
        See the specification for details. SnakeYAML and libyaml allow tabs inside plain scalars.
      • scanTagHandle

        private java.lang.String scanTagHandle​(java.lang.String name,
                                               Mark startMark)

        Scan a Tag handle. A Tag handle takes one of three forms:

         "!" (c-primary-tag-handle)
         "!!" (ns-secondary-tag-handle)
         "!(name)!" (c-named-tag-handle)
         
        Where (name) must be formatted as an ns-word-char.

         See the specification for details.
         For some strange reasons, the specification does not allow '_' in
         tag handles. I have allowed it anyway.
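
         The three handle forms can be sketched as a validity check. TagHandleSketch is a hypothetical class; it assumes word characters are ASCII letters, digits, and '-', and it also accepts '_' in line with the note above.

```java
final class TagHandleSketch {
    // Returns true for "!", "!!", or "!name!" where name consists of word characters.
    static boolean isTagHandle(String handle) {
        if (handle.equals("!") || handle.equals("!!")) return true;
        if (handle.length() < 3 || !handle.startsWith("!") || !handle.endsWith("!")) {
            return false;
        }
        for (int i = 1; i < handle.length() - 1; i++) {
            char c = handle.charAt(i);
            boolean wordChar = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z') || c == '-' || c == '_';
            if (!wordChar) return false;
        }
        return true;
    }
}
```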
         
      • scanTagUri

        private java.lang.String scanTagUri​(java.lang.String name,
                                            Mark startMark)

        Scan a Tag URI. This scanning is valid for both local and global tag directives, because both appear to be valid URIs as far as scanning is concerned. The difference may be distinguished later, in parsing. This method will scan for ns-uri-char*, which covers both cases.

        This method performs no verification that the scanned URI conforms to any particular kind of URI specification.

      • scanUriEscapes

        private java.lang.String scanUriEscapes​(java.lang.String name,
                                                Mark startMark)

        Scan a sequence of %-escaped URI escape codes and convert them into a String representing the unescaped values.

        FIXME This method fails for more than 256 bytes' worth of URI-encoded characters in a row. Is this possible? Is this a use-case?
        See Also:
        section 2.4, Escaped Encoding
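
        Converting a run of %-escapes into a string can be sketched as follows. UriEscapeSketch is a hypothetical class; it assumes the escaped bytes are UTF-8 and collects them in a growable buffer, which is a choice of this sketch and not a description of how the method above is written.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class UriEscapeSketch {
    // Decodes the run of %HH escapes starting at text[pos] and returns
    // the string those bytes encode in UTF-8. Returns "" if there is no escape.
    static String decodeEscapes(String text, int pos) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        while (pos < text.length() && text.charAt(pos) == '%') {
            if (pos + 3 > text.length()) {
                throw new IllegalArgumentException("truncated escape at index " + pos);
            }
            int hi = Character.digit(text.charAt(pos + 1), 16);
            int lo = Character.digit(text.charAt(pos + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("expected two hex digits at index " + (pos + 1));
            }
            bytes.write(hi * 16 + lo);
            pos += 3;
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

        Decoding the bytes only after the whole run is collected lets a multi-byte character such as %C3%A9 come out as a single character.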
      • scanLineBreak

        private java.lang.String scanLineBreak()
        Scan a line break, transforming:
         '\r\n' : '\n'
         '\r' : '\n'
         '\n' : '\n'
         '\x85' : '\n'
         default : ''
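
        The mapping above can be sketched as two small helpers over a string. LineBreakSketch is a hypothetical class; it covers only the break characters listed in the table and reports separately how many input characters a break occupies.

```java
final class LineBreakSketch {
    // Returns "\n" if a line break starts at text[pos], otherwise "".
    static String normalizedBreakAt(String text, int pos) {
        return breakLength(text, pos) > 0 ? "\n" : "";
    }

    // Number of input characters the break occupies:
    // 2 for CR LF, 1 for a lone CR, LF, or NEL (U+0085), 0 if there is no break.
    static int breakLength(String text, int pos) {
        if (pos >= text.length()) return 0;
        char c = text.charAt(pos);
        if (c == '\r' && pos + 1 < text.length() && text.charAt(pos + 1) == '\n') {
            return 2;
        }
        return (c == '\r' || c == '\n' || c == '\u0085') ? 1 : 0;
    }
}
```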