How do I search for badly encoded characters

James Barnett
Dear Forum,

I maintain an application that reads and writes text in UTF-8. Due to bad joss incurred during my (and others') UTF-8 learning curve, I now have some garbled characters in my input. These show up in vim -b as '<nn>', where nn is a lower-case hex string. Here's an example:
    3020    tuomas jorma juhani r<e4>s<e4>nen

My question is, how do I search for these characters in vim so I can fix or delete them? Treating them as literal strings doesn't work.
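
To make the symptom concrete outside Vim, here is a minimal Python sketch. It assumes the stray `<e4>` bytes in the example line are single Latin-1 (ISO-8859-1) bytes, which is a guess based on the name; the helper `is_valid_utf8` is hypothetical and not part of any tool mentioned in this thread.

```python
# The example line from the post, with <e4> written as the raw byte 0xE4.
raw = b"tuomas jorma juhani r\xe4s\xe4nen"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE4 announces a three-byte UTF-8 sequence, but plain ASCII follows it,
# so the line as a whole is not valid UTF-8.
print(is_valid_utf8(raw))

# ASSUMPTION: the bytes were written as Latin-1. Decoding that way shows
# what the text was probably meant to be.
print(raw.decode("latin-1"))
```

A lone byte such as 0xE4 is not a character in UTF-8 at all, which is why Vim displays it as `<e4>` instead of a glyph.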

Thanks!

   

--
--
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php
---
You received this message because you are subscribed to the Google Groups "vim_multibyte" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

RE: How do I search for badly encoded characters

JohnBeckett
James Barnett wrote:

> I maintain an application that reads and writes text in utf-8.
> Due to bad joss incurred during my (and others) utf-8 learning
> curve I now have some garbled characters in my input. These
> show up in vim -b as '<nn>', where nn is a lower-case hex
> string. Here's an example:
> 3020    tuomas jorma juhani r<e4>s<e4>nen
>
> My question is, how do I search for these characters in vim so
> I can fix or delete them? Treating them as literal strings
> doesn't work.

In principle, vim_multibyte is the right mailing list, but in
practice it is hardly ever used. I suggest using the main
vim_use mailing list in future, unless a very esoteric
multibyte issue needs to be discussed at length.

There are three very useful commands entered in normal mode:
    ga
    g8
    8g8

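ga and g8 display information about the character under the cursor:
ga shows its value in decimal, hex and octal, and g8 shows the hex
values of the bytes of its UTF-8 encoding.
8g8 finds the next illegal UTF-8 sequence (it does nothing if
none is found).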

Use ':help 8g8' for info.
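
If you want a list of every bad spot at once, for example to check a file from a script, the same idea can be sketched in Python. This is only an illustration of the concept behind 8g8, not how Vim implements it; the function name `find_invalid_utf8` is made up for this example.

```python
from typing import List


def find_invalid_utf8(data: bytes) -> List[int]:
    """Return the byte offset of each invalid UTF-8 sequence in data."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything from the current position onward.
            data[pos:].decode("utf-8")
            break  # the rest is clean
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            offsets.append(pos + err.start)
            pos += err.end  # resume just past the bad bytes
    return offsets


line = b"tuomas jorma juhani r\xe4s\xe4nen"
for offset in find_invalid_utf8(line):
    print(f"invalid byte 0x{line[offset]:02x} at offset {offset}")
```

Re-slicing the data on every error is quadratic in the worst case, which is fine for a quick check but not for very large files.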

John


Re: How do I search for badly encoded characters

krbeesley

On 27Oct2014, at 17:20, John Beckett <[hidden email]> wrote:

> James Barnett wrote:
>> I maintain an application that reads and writes text in utf-8.
>> Due to bad joss incurred during my (and others) utf-8 learning
>> curve I now have some garbled characters in my input. These
>> show up in vim -b as '<nn>', where nn is a lower-case hex
>> string.

>
> There are three very useful commands entered in normal mode:
>    ga
>    g8
>    8g8
>
> ga and g8 display information about the character at the cursor.
> 8g8 finds the next illegal UTF-8 sequences (it does nothing if
> none found).
>
> Use ':help 8g8' for info.
>

I assume that your ‘encoding’ (vim buffer internal encoding) is UTF-8.

Once you know the hex value that you want to find, e.g. 00E4,
I think you should be able to search for it by typing / (the slash),
then Ctrl-v, u, 00E4.

********************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT
84054  USA






Re: How do I search for badly encoded characters

Nikolay Aleksandrovich Pavlov
On October 28, 2014 7:08:27 PM EAT, Kenneth Reid Beesley <[hidden email]> wrote:

>
>On 27Oct2014, at 17:20, John Beckett <[hidden email]> wrote:
>
>> James Barnett wrote:
>>> I maintain an application that reads and writes text in utf-8.
>>> Due to bad joss incurred during my (and others) utf-8 learning
>>> curve I now have some garbled characters in my input. These
>>> show up in vim -b as '<nn>', where nn is a lower-case hex
>>> string.
>
>>
>> There are three very useful commands entered in normal mode:
>>    ga
>>    g8
>>    8g8
>>
>> ga and g8 display information about the character at the cursor.
>> 8g8 finds the next illegal UTF-8 sequences (it does nothing if
>> none found).
>>
>> Use ':help 8g8' for info.
>>
>
>I assume that your ‘encoding’ (vim buffer internal encoding) is UTF-8.
>
>Once you know the hex value that you want to find, e.g 00E4,
>I think that you should be able to search for it by entering /  (the
>slash),
>Ctrl-v, u, 00E4.

This only lets you search for valid Unicode characters, and those never show up as <xx>, AFAIK. To enter an invalid byte in the search pattern, one needs to use <C-r>="\xXX"<CR> at the search prompt, where XX is the hex value of the byte.
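
The distinction can be checked outside Vim. The Python sketch below shows that the character U+00E4 is stored as two bytes in UTF-8, so it is not the same thing as a lone 0xE4 byte. It then sketches one possible repair that reinterprets each invalid byte as Latin-1. That Latin-1 origin is an assumption about this particular data, and the names `latin1_fallback` and `repair` are invented for the example.

```python
import codecs

# U+00E4 in valid UTF-8 is the two-byte sequence C3 A4, not the byte E4.
print("\u00e4".encode("utf-8"))


def latin1_fallback(err):
    """Decode-error handler: map each invalid byte to its Latin-1 character.

    ASSUMPTION: the stray bytes were originally written as Latin-1.
    """
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Return the replacement text and the position where decoding resumes.
    return bad.decode("latin-1"), err.end


codecs.register_error("latin1_fallback", latin1_fallback)


def repair(data: bytes) -> str:
    """Decode as UTF-8, falling back to Latin-1 for invalid bytes only."""
    return data.decode("utf-8", errors="latin1_fallback")


print(repair(b"tuomas jorma juhani r\xe4s\xe4nen"))
```

Valid UTF-8 passes through `repair` unchanged, so running it over a file that mixes good and bad lines should only touch the invalid bytes. Whether Latin-1 is the right guess has to be checked against the data.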


