How to handle syntax coloring with similar start & end strings

Dave Roberts
To the syntax gurus,

I've come up with a syntax file that colors a modified subset of VST
text. I'm only showing the bold and italic stuff here because that's
where the question is but the full (much longer) version handles all
combinations of bold, italic, and underlined within each other.

I can only figure out how to do it, however, if the starts and ends are
separated, as in the following examples:

*This is italic only text*
**This is bold only text**
***This is bold/italic text***
*This is **bold text** within italic text*
**This is *italic text* within bold text**

This snippet handles the above:
------------------
syn region vstBold
      \ start='\%(^\|[^*]\)\zs\*\{2}[^* \t]'
      \ end='[^* \t]\@<=\*\{2}\ze\%($\|[^*]\)'
      \ contains=vstBold_Italic
syn region vstBold_Italic contained
      \ start='\%(^\|[^*]\)\zs\*\{1}[^* \t]'
      \ end='[^* \t]\@<=\*\{1}\ze\%($\|[^*]\)'

syn region vstItalic
      \ start='\%(^\|[^*]\)\zs\*\{1}[^* \t]'
      \ end='[^* \t]\@<=\*\{1}\ze\%($\|[^*]\)'
      \ contains=vstItalic_Bold
syn region vstItalic_Bold contained
      \ start='\%(^\|[^*]\)\zs\*\{2}[^* \t]'
      \ end='[^* \t]\@<=\*\{2}\ze\%($\|[^*]\)'

syn region vstBoldItalic
      \ start='\%(^\|[^*]\)\zs\*\{3}[^* \t]'
      \ end='[^* \t]\@<=\*\{3}\ze\%($\|[^*]\)'

hi link   vstBold_Italic vstBoldItalic
hi link   vstItalic_Bold vstBoldItalic

hi def    vstBold        term=bold        cterm=bold        gui=bold
hi def    vstBoldItalic  term=bold,italic cterm=bold,italic gui=bold,italic
hi def    vstItalic      term=italic      cterm=italic      gui=italic
------------------

My problem is how to handle the following:

***This starts as bold/italic** then italic only*
***This starts as bold/italic* then bold only**
*This starts italic **Then goes bold/italic***
**This starts bold *Then goes bold/italic***

If I don't explicitly disallow '*' before or after the '*' or '**' then
the following:

**This is supposed to be bold only**

becomes bold/italic because it matches both *something*, and **something**
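
For readers following along, here is a hedged, atom-by-atom reading of the guards in the bold patterns. The group name demoBold is hypothetical and used only for illustration; the patterns themselves are copied unchanged from vstBold in the snippet above.

```vim
" demoBold is a hypothetical name; the patterns are copied from vstBold.
"
" start pattern, atom by atom:
"   \%(^\|[^*]\)  start of line, or any character that is not '*'
"   \zs           the highlighted match begins here, so the guard is excluded
"   \*\{2}        two asterisks
"   [^* \t]       then a character that is not '*', space, or tab
"
" end pattern, atom by atom:
"   [^* \t]\@<=   preceded by a character that is not '*', space, or tab
"   \*\{2}        two asterisks
"   \ze           the highlighted match ends here
"   \%($\|[^*]\)  followed by end of line, or any character that is not '*'
syn region demoBold
      \ start='\%(^\|[^*]\)\zs\*\{2}[^* \t]'
      \ end='[^* \t]\@<=\*\{2}\ze\%($\|[^*]\)'
hi def demoBold term=bold cterm=bold gui=bold
```

With the same guards on the single-asterisk patterns, neither '*' of a '**' opener can start an italic region: the first is followed by '*', and the second is preceded by one.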

So... How do you do this?

Thanks,

- Dave
Re: How to handle syntax coloring with similar start & end strings

Dave Roberts
Yakov Lerner wrote:

> On 3/26/06, Dave Roberts <[hidden email]> wrote:
>  
>> *This is italic only text*
>> **This is bold only text**
>> ***This is bold/italic text***
>> *This is **bold text** within italic text*
>> **This is *italic text* within bold text**
>> ...
>> ***This starts as bold/italic** then italic only*
>> ***This starts as bold/italic* then bold only**
>> *This starts italic **Then goes bold/italic***
>> **This starts bold *Then goes bold/italic***
>>    
>
> I'm not a syntax guru, but I think 'syn regions' won't work here.
>
> How about using only syn matches, using 17 separate 'syn matches'
> for the 9 cases given above, + using nextgroup + using \_ .
>
> I mean defining separate 'syn match' for each part bounded by
> asterisks and not containing asterisks inside.
>
> For example, case #1:
> syn match italic /\(^\|[^*]\)\zs\*\($\|[^*]\)\_[^*]*\ze\*\($\|[^*]\)/
>
> I think using [^*]* inside syn matches is all you need for resolving this,
> + nextgroup.
>
> Yakov
>
>  

So I want to keep the regions I have (since they work correctly and
don't match any of the "problem" patterns) and add two matches for
patterns like:

***This starts as bold/italic** then italic only*

1) "***This is bold/italic" only if followed by "\S**\ssomething*"
2) "\S**\sThis is italic*" only if preceded by "***\S"

That's what I was originally thinking but it's going to be tricky
because these are multi-line patterns. I can't just match a preceding
"***" without checking that it's not already terminated by a "***". In
other words the two above need to be:

1) "***This is bold/italic" only if followed by "\S**\ssomething*" and
there is NOT an unmatched "\*\{1,3}\S" or "\S\*\{1,3}" within the match
2) (same restriction)
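
The two restricted matches above could be sketched roughly as follows, borrowing the [^*] idea from the quoted reply so that no asterisk can appear inside either segment. This is only a sketch under assumptions: the group names demoBIHead and demoITail are hypothetical, the "\S" and "\s" checks around the '**' are left out for brevity, and it has not been tried alongside the full syntax file.

```vim
" Hypothetical group names, for illustration only.
"
" Segment 1: '***' plus text that contains no asterisks, accepted only when
" it is followed by a '**' closer that is not part of a longer run of '*'.
" \ze ends the highlighted match just before that '**'.
syn match demoBIHead
      \ /\%(^\|[^*]\)\zs\*\{3}[^* \t]\_[^*]*\ze\*\{2}\%($\|[^*]\)/
      \ nextgroup=demoITail

" Segment 2: the '**' closer, asterisk-free text, and the final single '*'.
" 'contained' plus nextgroup means it is only tried right after segment 1.
syn match demoITail contained
      \ /\*\{2}\_[^*]\+\*\%($\|[^*]\)\@=/

hi def demoBIHead term=bold,italic cterm=bold,italic gui=bold,italic
hi def demoITail  term=italic      cterm=italic      gui=italic
```

On a line such as "***This starts as bold/italic** then italic only*" the first match covers everything up to the '**', and the second covers the rest. On "***This is bold/italic text***" the first pattern fails, because the only '**' it can reach is followed by another '*'.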

Hmm. Actually I have to change the current code to disallow embedded
unmatched "\*\{1,3}\S" or "\S\*\{1,3}" because, the way it is now, the
following two lines would be matched together as a single bold/italic
region:

***This starts as bold/italic* then bold only**
*This starts italic **Then goes bold/italic***

because the first line starts with "***\S" and the second ends with
"\S***".

Ugh! This would be bad enough as is but the real script also handles
embedded underscores with another character...

OK - It's a bigger job than I originally thought but I have a handle on
it now.

(Since sending this originally I've worked on it, and I'm now less
convinced there's a way to do it: multiple regular expressions can
match at the same starting point and, when that happens, only the one
defined last takes precedence, which will sometimes be right and
sometimes wrong...)
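
A minimal illustration of that precedence rule (see ":help syn-priority"), using hypothetical group names and deliberately simple patterns rather than anything from the real script:

```vim
" Hypothetical groups; both patterns can match starting at the same '*'.
syn match demoOpenOnly /\*\w\+/
syn match demoClosed   /\*\w\+\*/

" Linked to standard groups only so the difference is visible.
hi def link demoOpenOnly Comment
hi def link demoClosed   Statement
```

On the text "*word*" both items match at the first '*', and demoClosed is used because it was defined last. On "*word" with no closing asterisk only demoOpenOnly matches. Putting the more specific pattern last can therefore pick the right item in simple cases, but this says nothing about whether the full bold/italic/underline problem can be solved that way.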

Thanks,

- Dave