vim7: spell and non-ascii letters (word border problems?)


Mikołaj Machowski
Hello,

Using today's CVS.

(This post should be in iso-8859-2.)

This is similar to a problem from the past: non-ASCII letters being
treated as word borders.

Settings:

set spell
set spelllang=pl (the same behaviour with pl,en)
set encoding=iso-8859-2

The spell file was freshly recreated from the sources, with the attached patch.

Word:

dług  <- second letter is l with slash, 179 (hex b3) in iso-8859-2 enc.

When typing

dł

d is highlighted as a bad word

dłu

d is still highlighted as a bad word

dług

highlighting vanishes because the whole "dług" is a valid word (debt).

It looks like a space matters more as a word border than a non-ASCII letter does.
(OTOH, words are properly added to the 'spellfile'.)
This disturbs writing very much. I gave only the simplest
example I could find. The behaviour can also be observed when typing words
like grzęda (here "grz" is highlighted until d is typed, when "grzęd"
becomes a valid word (perches)), część, język, powyższe, etc. There are
many situations in which the user is alarmed by highlighting in
completely legitimate text.

Also, when a bad word contains a non-ASCII letter, only the part up to that
letter (or after that letter) is highlighted:

Śniadecki, siglów
 ^^^^^^^^  ^^^^
Parts underscored with ^ are highlighted as bad.

m.

--
LaTeX + Vim = http://vim-latex.sourceforge.net/
Vim-list(s) Users Map: (last change 15 May)
 http://skawina.eu.org/mikolaj/vimlist
CLEWN - http://clewn.sf.net
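
The symptom can be reproduced in miniature. The following is a minimal
sketch, not Vim's code: `is_word`, `init_ascii_only` and `word_len` are
hypothetical names, and the table is assumed to hold one entry per byte
value. If byte 0xB3 (ł in iso-8859-2) is not flagged as a word character,
a scanner of this kind ends the word after "d":

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a spell checker's "is word character" table. */
static unsigned char is_word[256];

/* Flag only the ASCII letters; every byte >= 128 stays unflagged. */
static void init_ascii_only(void)
{
    int c;

    memset(is_word, 0, sizeof(is_word));
    for (c = 'a'; c <= 'z'; ++c)
        is_word[c] = 1;
    for (c = 'A'; c <= 'Z'; ++c)
        is_word[c] = 1;
}

/* Return the number of leading bytes of "s" that are word characters. */
static size_t word_len(const unsigned char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0' && is_word[s[n]])
        ++n;
    return n;
}
```

With only ASCII letters flagged, `word_len` returns 1 for the bytes
'd', 0xB3, 'u', 'g'. After setting `is_word[0xB3] = 1` it returns 4, so the
whole word would be passed to the dictionary lookup instead of just "d".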



Re: vim7: spell and non-ascii letters (word border problems?)

Bram Moolenaar

Mikolaj Machowski wrote:

> Using today's CVS.
>
> (This post should be in iso-8859-2.)
>
> This is similar to a problem from the past: non-ASCII letters being
> treated as word borders.
>
> Settings:
>
> set spell
> set spelllang=pl (the same behaviour with pl,en)
> set encoding=iso-8859-2
>
> The spell file was freshly recreated from the sources, with the attached patch.
>
> Word:
>
> dług  <- second letter is l with slash, 179 (hex b3) in iso-8859-2 enc.
>
> When typing
>
> dł
>
> d is highlighted as a bad word
>
> dłu
>
> d is still highlighted as a bad word
>
> dług
>
> highlighting vanishes because the whole "dług" is a valid word (debt).
>
> It looks like a space matters more as a word border than a non-ASCII letter does.
> (OTOH, words are properly added to the 'spellfile'.)
> This disturbs writing very much. I gave only the simplest
> example I could find. The behaviour can also be observed when typing words
> like grzęda (here "grz" is highlighted until d is typed, when "grzęd"
> becomes a valid word (perches)), część, język, powyższe, etc. There are
> many situations in which the user is alarmed by highlighting in
> completely legitimate text.
>
> Also, when a bad word contains a non-ASCII letter, only the part up to that
> letter (or after that letter) is highlighted:
>
> Śniadecki, siglów
>  ^^^^^^^^  ^^^^
> Parts underscored with ^ are highlighted as bad.

There was a problem in reading the list with word characters from the
.spl file.  This patch should fix it:

Index: spell.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/spell.c,v
retrieving revision 1.34
diff -u -r1.34 spell.c
--- spell.c 3 Jul 2005 21:25:22 -0000 1.34
+++ spell.c 4 Jul 2005 09:01:49 -0000
@@ -591,7 +591,7 @@
 static void int_wordlist_spl __ARGS((char_u *fname));
 static void spell_load_cb __ARGS((char_u *fname, void *cookie));
 static slang_T *spell_load_file __ARGS((char_u *fname, char_u *lang, slang_T *old_lp, int silent));
-static char_u *read_cnt_string __ARGS((FILE *fd, int cnt_bytes, int *errp));
+static char_u *read_cnt_string __ARGS((FILE *fd, int cnt_bytes, int *lenp));
 static int set_sofo __ARGS((slang_T *lp, char_u *from, char_u *to));
 static void set_sal_first __ARGS((slang_T *lp));
 #ifdef FEAT_MBYTE
@@ -603,7 +603,7 @@
 static int find_region __ARGS((char_u *rp, char_u *region));
 static int captype __ARGS((char_u *word, char_u *end));
 static void spell_reload_one __ARGS((char_u *fname, int added_word));
-static int set_spell_charflags __ARGS((char_u *flags, char_u *upp));
+static int set_spell_charflags __ARGS((char_u *flags, int cnt, char_u *upp));
 static int set_spell_chartab __ARGS((char_u *fol, char_u *low, char_u *upp));
 static void write_spell_chartab __ARGS((FILE *fd));
 static int spell_casefold __ARGS((char_u *p, int len, char_u *buf, int buflen));
@@ -1837,12 +1837,12 @@
 
     /* <charflagslen> <charflags> */
     p = read_cnt_string(fd, 1, &cnt);
-    if (cnt == FAIL)
+    if (cnt < 0)
  goto endFAIL;
 
     /* <fcharslen> <fchars> */
-    fol = read_cnt_string(fd, 2, &cnt);
-    if (cnt == FAIL)
+    fol = read_cnt_string(fd, 2, &ccnt);
+    if (ccnt < 0)
     {
  vim_free(p);
  goto endFAIL;
@@ -1850,7 +1850,7 @@
 
     /* Set the word-char flags and fill SPELL_ISUPPER() table. */
     if (p != NULL && fol != NULL)
- i = set_spell_charflags(p, fol);
+ i = set_spell_charflags(p, cnt, fol);
 
     vim_free(p);
     vim_free(fol);
@@ -1861,7 +1861,7 @@
 
     /* <midwordlen> <midword> */
     lp->sl_midword = read_cnt_string(fd, 2, &cnt);
-    if (cnt == FAIL)
+    if (cnt < 0)
  goto endFAIL;
 
     /* <prefcondcnt> <prefcond> ... */
@@ -1912,10 +1912,10 @@
     {
  ftp = &((fromto_T *)gap->ga_data)[gap->ga_len];
  ftp->ft_from = read_cnt_string(fd, 1, &i);
- if (i == FAIL)
+ if (i <= 0)
     goto endFAIL;
  ftp->ft_to = read_cnt_string(fd, 1, &i);
- if (i == FAIL)
+ if (i <= 0)
  {
     vim_free(ftp->ft_from);
     goto endFAIL;
@@ -1957,19 +1957,24 @@
 
  /* <salfromlen> <salfrom> */
  bp = read_cnt_string(fd, 2, &cnt);
- if (cnt == FAIL)
+ if (cnt < 0)
     goto endFAIL;
 
  /* <saltolen> <salto> */
  fol = read_cnt_string(fd, 2, &cnt);
- if (cnt == FAIL)
+ if (cnt < 0)
  {
     vim_free(bp);
     goto endFAIL;
  }
 
  /* Store the info in lp->sl_sal and/or lp->sl_sal_first. */
- i = set_sofo(lp, bp, fol);
+ if (bp != NULL && fol != NULL)
+    i = set_sofo(lp, bp, fol);
+ else if (bp != NULL || fol != NULL)
+    i = FAIL;    /* only one of two strings is an error */
+ else
+    i = OK;
 
  vim_free(bp);
  vim_free(fol);
@@ -2036,7 +2041,7 @@
 
     /* <saltolen> <salto> */
     smp->sm_to = read_cnt_string(fd, 1, &ccnt);
-    if (ccnt == FAIL)
+    if (ccnt < 0)
     {
  vim_free(smp->sm_lead);
  goto formerr;
@@ -2052,10 +2057,13 @@
     smp->sm_oneof_w = NULL;
  else
     smp->sm_oneof_w = mb_str2wide(smp->sm_oneof);
- smp->sm_to_w = mb_str2wide(smp->sm_to);
+ if (smp->sm_to == NULL)
+    smp->sm_to_w = NULL;
+ else
+    smp->sm_to_w = mb_str2wide(smp->sm_to);
  if (smp->sm_lead_w == NULL
  || (smp->sm_oneof_w == NULL && smp->sm_oneof != NULL)
- || smp->sm_to_w == NULL)
+ || (smp->sm_to_w == NULL && smp->sm_to != NULL))
  {
     vim_free(smp->sm_lead);
     vim_free(smp->sm_to);
@@ -2074,11 +2082,13 @@
 
     /* <maplen> <mapstr> */
     p = read_cnt_string(fd, 2, &cnt);
-    if (cnt == FAIL)
+    if (cnt < 0)
  goto endFAIL;
-    set_map_str(lp, p);
-    vim_free(p);
-
+    if (p != NULL)
+    {
+ set_map_str(lp, p);
+ vim_free(p);
+    }
 
     /* round 1: <LWORDTREE>
      * round 2: <KWORDTREE>
@@ -2155,13 +2165,13 @@
  * Read a length field from "fd" in "cnt_bytes" bytes.
  * Allocate memory, read the string into it and add a NUL at the end.
  * Returns NULL when the count is zero.
- * Sets "*errp" to FAIL when there is an error, OK otherwise.
+ * Sets "*cntp" to -1 when there is an error, length of the result otherwise.
  */
     static char_u *
-read_cnt_string(fd, cnt_bytes, errp)
+read_cnt_string(fd, cnt_bytes, cntp)
     FILE *fd;
     int cnt_bytes;
-    int *errp;
+    int *cntp;
 {
     int cnt = 0;
     int i;
@@ -2173,18 +2183,20 @@
     if (cnt < 0)
     {
  EMSG(_(e_spell_trunc));
- *errp = FAIL;
+ *cntp = -1;
  return NULL;
     }
+    *cntp = cnt;
+    if (cnt == 0)
+ return NULL;    /* nothing to read, return NULL */
 
     /* allocate memory */
     str = alloc((unsigned)cnt + 1);
     if (str == NULL)
     {
- *errp = FAIL;
+ *cntp = -1;
  return NULL;
     }
-    *errp = OK;
 
     /* Read the string.  Doesn't check for truncated file. */
     for (i = 0; i < cnt; ++i)
@@ -2697,6 +2709,9 @@
 {
     char_u *p;
 
+    if (lp->sl_midword == NULL)    /* there aren't any */
+ return;
+
     for (p = lp->sl_midword; *p != NUL; )
 #ifdef FEAT_MBYTE
  if (has_mbyte)
@@ -5604,34 +5619,39 @@
  * Set the spell character tables from strings in the .spl file.
  */
     static int
-set_spell_charflags(flags, upp)
+set_spell_charflags(flags, cnt, fol)
     char_u *flags;
-    char_u *upp;
+    int cnt;    /* length of "flags" */
+    char_u *fol;
 {
     /* We build the new tables here first, so that we can compare with the
      * previous one. */
     spelltab_T new_st;
     int i;
-    char_u *p = upp;
+    char_u *p = fol;
     int c;
 
     clear_spell_chartab(&new_st);
 
-    for (i = 0; flags[i] != NUL; ++i)
+    for (i = 0; i < 128; ++i)
     {
- new_st.st_isw[i + 128] = (flags[i] & CF_WORD) != 0;
- new_st.st_isu[i + 128] = (flags[i] & CF_UPPER) != 0;
+ if (i < cnt)
+ {
+    new_st.st_isw[i + 128] = (flags[i] & CF_WORD) != 0;
+    new_st.st_isu[i + 128] = (flags[i] & CF_UPPER) != 0;
+ }
 
- if (*p == NUL)
-    return FAIL;
+ if (*p != NUL)
+ {
 #ifdef FEAT_MBYTE
- c = mb_ptr2char_adv(&p);
+    c = mb_ptr2char_adv(&p);
 #else
- c = *p++;
+    c = *p++;
 #endif
- new_st.st_fold[i + 128] = c;
- if (i + 128 != c && new_st.st_isu[i + 128] && c < 256)
-    new_st.st_upper[c] = i + 128;
+    new_st.st_fold[i + 128] = c;
+    if (i + 128 != c && new_st.st_isu[i + 128] && c < 256)
+ new_st.st_upper[c] = i + 128;
+ }
     }
 
     return set_spell_finish(&new_st);
@@ -8836,6 +8856,8 @@
 
     /* replace string */
     s = smp[n].sm_to;
+    if (s == NULL)
+ s = (char_u *)"";
     pf = smp[n].sm_rules;
     p0 = (vim_strchr(pf, '<') != NULL) ? 1 : 0;
     if (p0 == 1 && z == 0)
@@ -9138,18 +9160,20 @@
     if (p0 == 1 && z == 0)
     {
  /* rule with '<' is used */
- if (reslen > 0 && *ws != NUL && (wres[reslen - 1] == c
+ if (reslen > 0 && ws != NULL && *ws != NUL
+ && (wres[reslen - 1] == c
     || wres[reslen - 1] == *ws))
     reslen--;
  z0 = 1;
  z = 1;
  k0 = 0;
- while (*ws != NUL && word[i + k0] != NUL)
- {
-    word[i + k0] = *ws;
-    k0++;
-    ws++;
- }
+ if (ws != NULL)
+    while (*ws != NUL && word[i + k0] != NUL)
+    {
+ word[i + k0] = *ws;
+ k0++;
+ ws++;
+    }
  if (k > k0)
     mch_memmove(word + i + k0, word + i + k,
     sizeof(int) * (STRLEN(word + i + k) + 1));
@@ -9162,14 +9186,19 @@
  /* no '<' rule used */
  i += k - 1;
  z = 0;
- while (*ws != NUL && ws[1] != NUL && reslen < MAXWLEN)
- {
-    if (reslen == 0 || wres[reslen - 1] != *ws)
- wres[reslen++] = *ws;
-    ws++;
- }
+ if (ws != NULL)
+    while (*ws != NUL && ws[1] != NUL
+  && reslen < MAXWLEN)
+    {
+ if (reslen == 0 || wres[reslen - 1] != *ws)
+    wres[reslen++] = *ws;
+ ws++;
+    }
  /* new "actual letter" */
- c = *ws;
+ if (ws == NULL)
+    c = NUL;
+ else
+    c = *ws;
  if (strstr((char *)s, "^^") != NULL)
  {
     if (c != NUL)
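
One plausible reading of the `set_spell_charflags` hunk, offered as an
assumption rather than as the author's own explanation: the flags for bytes
128..255 are stored with an explicit length, and a flag byte can itself be
zero, so a loop that stops at the first NUL byte may never reach later
entries such as 0xB3. The sketch below contrasts the two loops.
`load_flags_nul_terminated` and `load_flags_counted` are hypothetical names,
the value of `CF_WORD` is assumed, and the `i < 128` bound in the first loop
is added here only to keep the sketch memory-safe:

```c
#include <assert.h>
#include <stddef.h>

#define CF_WORD 0x01    /* name taken from the patch; the value is assumed */

/* Sketch of the old loop: treat "flags" as a NUL-terminated string.
 * Returns the number of entries processed. */
static int load_flags_nul_terminated(const unsigned char *flags,
                                     unsigned char *isw)
{
    int i;

    assert(flags != NULL && isw != NULL);
    for (i = 0; i < 128 && flags[i] != 0; ++i)
        isw[i + 128] = (flags[i] & CF_WORD) != 0;
    return i;
}

/* Sketch of the new loop: use the stored length "cnt" instead.
 * Returns the number of entries processed. */
static int load_flags_counted(const unsigned char *flags, int cnt,
                              unsigned char *isw)
{
    int i;

    assert(flags != NULL && isw != NULL);
    for (i = 0; i < 128 && i < cnt; ++i)
        isw[i + 128] = (flags[i] & CF_WORD) != 0;
    return i;
}
```

For a 128-byte flag array whose first entry is zero but whose entry 0x33
(byte 0xB3) has `CF_WORD` set, the first function processes no entries at
all, while the second, called with `cnt` set to 128, marks `isw[0xB3]`.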


--
hundred-and-one symptoms of being an internet addict:
220. Your wife asks for sex and you tell her where to find you on IRC.

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\              Project leader for A-A-P -- http://www.A-A-P.org        ///
 \\\     Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html   ///

Re: vim7: spell and non-ascii letters (word border problems?)

Mikołaj Machowski
On Monday, 04 July 2005 12:27, Bram Moolenaar wrote:
>
> There was a problem in reading the list with word characters from the
> .spl file.  This patch should fix it:
>
Patch malformed at line 41.

Or, when the message is saved another way:

patch v. 2.5.9

mikolaj@localhost ~/vim7/src $ patch -p0 --dry-run < dlug2
patching file spell.c
Hunk #3 FAILED at 1837.
Hunk #4 FAILED at 1850.
Hunk #5 FAILED at 1861.
Hunk #6 FAILED at 1912.
Hunk #7 FAILED at 1957.
Hunk #8 FAILED at 2041.
Hunk #9 FAILED at 2057.
Hunk #10 FAILED at 2082.
Hunk #11 FAILED at 2165.
Hunk #12 FAILED at 2183.
Hunk #13 succeeded at 2709 with fuzz 2.
Hunk #14 FAILED at 5619.
Hunk #15 FAILED at 8856.
Hunk #16 FAILED at 9160.
Hunk #17 FAILED at 9186.
14 out of 17 hunks FAILED -- saving rejects to file spell.c.rej

When I merged the changes by hand I got a message about an unknown ws (note,
however, that I could have made a mistake; this is a big patch).

It looks like you sent me a patch against an unpublished version of spell.c.

m.





Re: vim7: spell and non-ascii letters (word border problems?)

Bram Moolenaar

Mikolaj Machowski wrote:

> On Monday, 04 July 2005 12:27, Bram Moolenaar wrote:
> >
> > There was a problem in reading the list with word characters from the
> > .spl file.  This patch should fix it:
> >
> Patch malformed at line 41.
>
> Or, when the message is saved another way:
>
> patch v. 2.5.9
>
> mikolaj@localhost ~/vim7/src $ patch -p0 --dry-run < dlug2
> patching file spell.c
> Hunk #3 FAILED at 1837.
> Hunk #4 FAILED at 1850.
> Hunk #5 FAILED at 1861.
> Hunk #6 FAILED at 1912.
> Hunk #7 FAILED at 1957.
> Hunk #8 FAILED at 2041.
> Hunk #9 FAILED at 2057.
> Hunk #10 FAILED at 2082.
> Hunk #11 FAILED at 2165.
> Hunk #12 FAILED at 2183.
> Hunk #13 succeeded at 2709 with fuzz 2.
> Hunk #14 FAILED at 5619.
> Hunk #15 FAILED at 8856.
> Hunk #16 FAILED at 9160.
> Hunk #17 FAILED at 9186.
> 14 out of 17 hunks FAILED -- saving rejects to file spell.c.rej
>
> When I merged the changes by hand I got a message about an unknown ws (note,
> however, that I could have made a mistake; this is a big patch).
>
> It looks like you sent me a patch against an unpublished version of spell.c.

The patch was against the spell.c in CVS.  It probably got mangled by
the mail system somewhere (e.g., changing tabs to spaces).  Anyway, it
will be in the next snapshot.

--
A computer without Windows is like a fish without a bicycle.


Re: vim7: spell and non-ascii letters (word border problems?)

Mikołaj Machowski
On Monday, 04 July 2005 21:43, Bram Moolenaar wrote:
>
> The patch was against the spell.c in CVS.  It probably got mangled by
> the mail system somewhere (e.g., changing tabs to spaces).  Anyway, it
> will be in the next snapshot.

Aah. With patch -l everything works (both applying the patch and the patch itself).

m.