How to check if I have lines with different length in a database extract

How to check if I have lines with different length in a database extract

sanjeev.g.sapre
List,

I have a data extraction program which creates comma-separated flat
files.

Each type of file has a fixed line length. Before we pass a file on
for further processing, I would like to check that all lines are
of equal length. Is there a simple way or pattern by which I can identify
lines of differing length?

Thanks in advance.

Regards
Sanjeev
Holset, Huddersfield
Direct:  +44 - 01484 440 365



Re: How to check if I have lines with different length in a database extract

Tim Chase-2
> Each type of file has some fixed length of a line. Before we can pass
> on this file for further processing I would like to check that all
> lines are of equal length. Is there a simple way /pattern by which I
> can identify lines with differing  length.

Well, a couple ideas stand out to me.  Depending on your file size (in
lines), it could be as simple as

        :set list

and then scrolling down, watching the right margin to see if any of the
"$" characters dance out of position.

If, however, you've got a large file (or more lines than you reasonably
care to scroll through), you can use something like

        :v/\%40c.$/#

which will return a list of each of the lines that *don't* have 40
characters in them, along with their line numbers.  Your desire would be
to get back the "error"

        Pattern found in every line

However, if there are lines that don't have 40 characters, it will
return them along with their line number.  If you want to make changes
to each line, just type the line number followed by "G" and you'll jump
to the line in question.

If you simply want to filter these errant lines out of your file by
deleting them completely, you can simply change the "#" to a "d" in the
above command, such as

        :v/\%40c.$/d

and it will delete any of the problematic lines.

If order doesn't matter, you can take a pre-processing pass and move
them all to the bottom of the file with

        :v/\%40c.$/m$

where you can edit them all or deal with them accordingly.
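
For anyone who wants to run the same check outside Vim, here is a minimal sketch in Python. It is only an illustration and not part of the commands above: the function name lines_with_wrong_length and the sample data are made up, and len() counts characters (code points), not bytes.

```python
def lines_with_wrong_length(lines, expected):
    """Return (line_number, line) pairs whose length is not `expected`.

    Line numbers are 1-based, like the numbers Vim prints with "#".
    """
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if len(line) != expected
    ]


# Hypothetical sample data: every line should be 5 characters long.
sample = ["a,b,c", "a,b", "d,e,f"]
for number, line in lines_with_wrong_length(sample, expected=5):
    print(number, line)
```

An empty result plays the same role as Vim's "Pattern found in every line" message: nothing needs fixing.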

Hope this gives you something to work with.

-tim






Re: How to check if I have lines with different length in a database extract

Jürgen Krämer
In reply to this post by sanjeev.g.sapre

Hi,

[hidden email] wrote:
>
> I have some data extraction program which creates comma separated flat
> files.
>
> Each type of file has some fixed length of a line. Before we can pass on
> this file for further processing I would like to check that all lines are
> of equal length. Is there a simple way /pattern by which I can identify
> lines with differing  length.

the following mapping will search for the next line that has a different
length than the current line:

  nnoremap \d /^\(.\{0,<c-r>=strlen(getline('.'))-1<cr>\}\\|.\{<c-r>=strlen(getline('.'))+1<cr>,\}\)$<cr>
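
The idea behind the mapping (take the current line's length as the reference and look for the next line that is shorter or longer) can be sketched in Python as follows. The function name is hypothetical, and unlike a Vim search with 'wrapscan' set, this sketch stops at the end of the list instead of wrapping around.

```python
def next_line_with_other_length(lines, current):
    """Return the 1-based number of the first line after `current`
    whose length differs from that of line `current`, or None."""
    reference = len(lines[current - 1])
    for number in range(current + 1, len(lines) + 1):
        if len(lines[number - 1]) != reference:
            return number
    return None


# Hypothetical sample: line 3 is shorter than line 1.
sample = ["aaa", "bbb", "cc", "ddd"]
print(next_line_with_other_length(sample, 1))
```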

Regards,
Jürgen

--
Jürgen Krämer                              Softwareentwicklung
HABEL GmbH & Co. KG                        mailto:[hidden email]
Hinteres Öschle 2                          Tel: +49 / 74 61 / 93 53 - 15
78604 Rietheim-Weilheim                    Fax: +49 / 74 61 / 93 53 - 99

Re: How to check if I have lines with different length in a database extract

John (Eljay) Love-Jensen
In reply to this post by sanjeev.g.sapre
Hi Sanjeev,

This is what I did...

:%s/././g
:%!sort | uniq -c

Maybe that would work for you.

Note:  somewhat destructive.  Save the file before doing this, and restore
it afterwards.
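
A non-destructive variant of the same idea, sketched in Python: count how many lines there are of each length without touching the file. The function name and sample data are assumptions made for illustration.

```python
from collections import Counter


def length_histogram(lines):
    """Map each line length to the number of lines having that length."""
    return Counter(len(line) for line in lines)


# Hypothetical sample: two lines of length 3, one of length 2.
sample = ["abc", "de", "fgh"]
for length, count in sorted(length_histogram(sample).items()):
    print(f"{count:7d} lines of length {length}")
```

If the extract is clean, the histogram has exactly one entry.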

HTH,
--Eljay


Re: How to check if I have lines with different length in a database extract

sanjeev.g.sapre
In reply to this post by sanjeev.g.sapre
Thanks Tim..

That was very useful.
That's exactly what I was looking for.


Regards
Sanjeev
Holset, Huddersfield
Direct:  +44 - 01484 440 365








Re: How to check if I have lines with different length in a database extract

Bertilo Wennergren
In reply to this post by Tim Chase-2
On 11/2/05, Tim Chase <[hidden email]> wrote:

> If, however, you've got a large file (or more lines than you
> reasonably care to scroll through), you can use something like

>         :v/\%40c.$/#
>
> which will return a list of each of the lines that *don't* have 40
> characters in them,

That actually seems to count bytes, not characters. I tried it using
UTF-8, and my two-byte characters counted as two, at least sometimes.
The results were not consistent!

E.g.:

  oooo
  oooö
  oooo

  :v/\%4c.$/#
  Pattern found in every line: \%4c.$

But:

  oooo
  ooöo
  oooo

  :v/\%4c.$/#
        2 ooöo

Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).
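
One possible reading of the two results, assuming \%c addresses byte columns as Vim's help describes: in the first sample the "ö" starts at byte column 4, so the pattern can match there; in the second it occupies byte columns 3 and 4, so no character starts at column 4. The Python sketch below only illustrates those byte positions under UTF-8. It does not reproduce Vim's matching, and the function name is made up.

```python
def char_byte_columns(line, encoding="utf-8"):
    """Return (character, 1-based byte column where it starts) pairs."""
    columns = []
    column = 1
    for char in line:
        columns.append((char, column))
        column += len(char.encode(encoding))
    return columns


# "\u00f6" is the precomposed character "ö" (two bytes in UTF-8).
for line in ["oooo", "ooo\u00f6", "oo\u00f6o"]:
    starts = [column for _, column in char_byte_columns(line)]
    print(line, "chars:", len(line),
          "bytes:", len(line.encode("utf-8")),
          "start columns:", starts)
```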

--
Bertilo Wennergren <http://bertilow.com>

Re: How to check if I have lines with different length in a database extract

Tim Chase-2
 > That actually seems to count bytes, not characters. I
 > tried it using UTF-8, and my two-byte characters counted
 > as two, at least sometimes.  The results were not
 > consistent!
 >
 > E.g.:
 >
 >   oooo
 >   oooö
 >   oooo
 >
 >   :v/\%4c.$/#
 >   Pattern found in every line: \%4c.$
 >
 > But:
 >
 >   oooo
 >   ooöo
 >   oooo
 >
 >   :v/\%4c.$/#
 >         2 ooöo
 >
 > Have I stumbled on a bug? This was in Vim 6.4 in Linux
 > (Kubuntu).

How strange.  Bug?  perhaps, or fixable with some
option-obscura.  I can't be of much help here as I don't use
UTF-8 or multi-byte character sets/encodings for
anything...other than occasionally trying out some of the
crazy-good ideas by folks like Tony on the list who are much
more well-versed in the ins and outs of this dark corner of
Vim.

-tim








Re: How to check if I have lines with different length in a database extract

James Vega-3
In reply to this post by Bertilo Wennergren
On Wed, Nov 02, 2005 at 10:36:33PM +0900, Bertilo Wennergren wrote:

> On 11/2/05, Tim Chase <[hidden email]> wrote:
>
> > If, however, you've got a large file (or more lines than you
> > reasonably care to scroll through), you can use something like
>
> >         :v/\%40c.$/#
> >
> > which will return a list of each of the lines that *don't* have 40
> > characters in them,
>
> That actually seems to count bytes, not characters. I tried it using
> UTF-8, and my two-byte characters counted as two, at least sometimes.
> The results were not consistent!
>
> E.g.:
>
>   oooo
>   oooö
>   oooo
>
>   :v/\%4c.$/#
>   Pattern found in every line: \%4c.$
>
> But:
>
>   oooo
>   ooöo
>   oooo
>
>   :v/\%4c.$/#
>         2 ooöo
>
> Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).

Use \%4v instead of \%4c: \%c matches a byte column, whereas \%v
matches a virtual (screen) column.

:he /\%c
:he /\%v

James
--
GPG Key: 1024D/61326D40 2003-09-02 James Vega <[hidden email]>


Re: How to check if I have lines with different length in a database extract

Charles E Campbell Jr
In reply to this post by sanjeev.g.sapre
[hidden email] wrote:

>...Each type of file has some fixed length of a line. Before we can pass on
>this file for further processing I would like to check that all lines are
>of equal length. Is there a simple way /pattern by which I can identify
>lines with differing  length.
>  
>
:v/^.*\%74c.$/p

will display all lines that don't have 74 characters in them.  I just
picked 74 out of the air, of course -- adjust it to whatever you need.

Regards,
Chip Campbell