vim: list all matching lines

One of the coolest features of Notepad++ is that it can list all matches (occurrences) of a pattern in a file. Here are some ways to do the same in vim, without any external plugin.

1. vimgrep (Error List)

This is a command to be used in the vim command mode. The syntax is:

:vimgrep pattern %

To open the list of matches in a buffer:

:copen

Use <Up> and <Down> keys to navigate the list, <Enter> to select a match. Traverse the matches in the open file using <n> and <N> the regular way.

Note that you can replace vimgrep with normal grep. This adds one step (the matches are listed on the console first) but otherwise works almost the same way.

2. lvim (Location List)

Run the following in the command mode:

:lvim pattern %

To open the list:

:lopen

Navigation is similar to that in vimgrep.

3. global search

To get the list of all matches in the file, run the following in command mode:

:g/regular-expression/p

Note that grep got its name from this command sequence!
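
The same g/re/p idea can be tried outside vim. Here is a small sketch, assuming a POSIX shell with sed and grep; the sample file and the pattern alpha are made up for illustration. sed's /re/p prints the same lines that grep does:

```shell
# Build a throwaway sample file (contents are illustrative only).
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\nalphabet\n' > "$sample"

# "global / regular expression / print", here via sed:
# -n suppresses the default output, /alpha/p prints only matching lines.
sed_out=$(sed -n '/alpha/p' "$sample")

# The grep equivalent of the same search.
grep_out=$(grep 'alpha' "$sample")

printf '%s\n' "$sed_out"
rm -f "$sample"
```

Both variables end up holding the same two lines, alpha and alphabet.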

grab: grep faster

Remember The Silver Searcher? grab is another faster grep alternative that tries to use multiple cores. The author uses the techniques below:

  • Parallel processing
  • Uses mmap(2) with MAP_POPULATE and matches the whole file blob without counting newlines
  • If available, grab also uses the PCRE JIT feature
  • grab skips files which are too small to contain the regular expression

However, speedup for a single file is negligible. The performance boost is measurable in case of faster hardware like SSDs.

grab is designed to find string matches in large directory trees. However it doesn’t support as many options as grep, is not pipe-able and doesn’t work on stdin (which cannot be mmapped).

grab uses mmapped chunks of 1GB. For larger files, the last 4096 bytes (1 page) of a chunk are overlapped with the next chunk, so that matches on a 1GB boundary can be found. For these boundary matches, the results will show two entries with the same offset.
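
To make the overlap concrete, here is a rough sketch of the arithmetic in shell. It is not taken from grab's source: the function name chunk_starts is hypothetical, 1GB is assumed to mean 2^30 bytes, and each chunk after the first is assumed to start one page before the end of the previous chunk.

```shell
# Print the start offset of each chunk for a file of the given size.
# Assumption (not from grab's source): every chunk after the first starts
# `overlap` bytes before the end of the previous chunk.
chunk_starts() {
    size=$1
    chunk=$2
    overlap=$3
    start=0
    while [ "$start" -lt "$size" ]; do
        echo "$start"
        # Stop once this chunk reaches the end of the file.
        if [ $((start + chunk)) -ge "$size" ]; then
            break
        fi
        start=$((start + chunk - overlap))
    done
}

# A 2.5 GiB file, 1 GiB chunks, one 4096-byte page of overlap.
GiB=$((1024 * 1024 * 1024))
chunk_starts $((5 * GiB / 2)) "$GiB" 4096
```

Under these assumptions a match that starts inside the shared page lies in two chunks, which would explain why it is reported twice with the same offset.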

Installation

Compile grab from source to use it:

$ git clone https://github.com/stealth/grab.git
$ cd grab
$ make

grab needs a recent pcre library. On some older systems the build can fail due to PCRE_INFO_MINLENGTH and pcre_study().

Usage

Options:

-O     -- print file offset of match
-l     -- do not print the matching line (Useful if you want
          to see _all_ offsets; if you also print the line, only
          the first match in the line counts)
-I     -- enable highlighting of matches
-c <n> -- Use n cores in parallel (useless and even slower in most 
          situations)
          n <= 1 uses single-core
-r     -- recurse on directory
-R     -- same as -r

On GitHub: grab

grep offset to a string in a binary file

People using grep should be familiar with the following output in a grep result:

Binary file www_browser matches

What if you are interested in the offset to the string in the binary file because, say, you are trying to reverse engineer something? Yes, there are hex editors available to handle that but good old grep is smart enough too. Here’s how.

$ grep -baron flashplayer.so *
www_browser:87101:85138113:flashplayer.so
www_browser:87101:85138165:flashplayer.so
www_browser:95935:87170022:flashplayer.so
www_browser:95937:87170981:flashplayer.so

where,

b: show the byte offset
a: treat the binary file as a text file (otherwise grep only reports "Binary file ... matches")
r: recursive search
o: show only matching (less cluttered output without full "text" lines)
n: show line number

in the output,

column 1: file name
column 2: line number in decimal (as grep treats the file as text)
column 3: file offset in decimal
column 4: matching string

Probably you won’t be interested in the file name and line number if you know the file. You can refine the command as:

$ grep -bao flashplayer.so www_browser 
85138113:flashplayer.so
85138165:flashplayer.so
87170022:flashplayer.so
87170981:flashplayer.so
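
Once you have an offset, you will often want it in hexadecimal (the way hex editors show it) and want to read the bytes at that position. The sketch below assumes GNU grep and a POSIX shell. It builds its own small test file, so the file name, the padding sizes and the variable names are made up for illustration:

```shell
# Create a small binary test file: NUL bytes around a known string.
bin=$(mktemp)
{ head -c 100 /dev/zero; printf 'flashplayer.so'; head -c 50 /dev/zero; } > "$bin"

# -b: byte offset, -a: treat binary as text, -o: print only the match.
# Keep just the offset (the part before the first colon) of the first hit.
offset=$(grep -bao 'flashplayer.so' "$bin" | head -n 1 | cut -d: -f1)

# Hex editors usually show offsets in hexadecimal.
hex_offset=$(printf '0x%x' "$offset")

# Read 14 bytes (the length of the string) starting at that offset.
found=$(dd if="$bin" bs=1 skip="$offset" count=14 2>/dev/null)

echo "offset=$offset ($hex_offset) bytes=$found"
rm -f "$bin"
```

On a real file, replace the generated test file with your binary and adjust count to the number of bytes you want to inspect.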

crgrep: grep any resource

Ever wanted to grep a pattern in a PDF document? How about a database or the web? crgrep is a powerful grep-like utility written in Java that can do much more than search for patterns in text files. crgrep stands for Common Resource grep.

Resources crgrep supports:

  • text documents, PDFs
  • database tables
  • ZIP, TAR, WAR, EAR and JAR archive formats
  • image metadata (jpeg, gif etc.)
  • text in scanned documents (jpeg/gif/tiff/bmp/png), extracted using OCR
  • Maven POM files, following dependency trees of resource artifacts
  • web resources
  • combinations of supported resources

DOWNLOAD

crgrep is distributed as binary from its SourceForge project page. After extracting the archive, crgrep binary can be found in the bin directory.

USAGE

  • Normal calling convention
    $ crgrep <pattern> <resource path(s)>

    Wildcards such as * and ? are supported in pattern or resource path(s). Output is displayed in the format:

    <resource>[[:pagenum]:linenum:matching_content]

    For example:

    Output                                Match
    ------------------------------------------------------------------
    src/foo.java                          File listing match
    src/bar.txt:25:some text              File content match (+lineno)
    lib/all.zip[image.gif]                Archive file listing match
    lib/app.war[WEB-INF/web.xml]:6:<d..>  Archive file content match
    pom.xml->stuff.zip[doc.txt]           File listing match
    mypic.jpg: @{Size=25,Com=Scene}       File meta-data match
    TAB: [COL1,COL2,COL3]                 Table column name match
    TAB: data1,data2,data3                Table data match
    Node[1]:{name:"John"}                 Graph database node match
    sample.pdf:1:1:Sample PDF Document    Text extracted from a PDF 
                                          (+pageno and +linenum)
    
  • Find files and data matching key under target directory. Include archives.
    $ crgrep -r key target
    target/simple_file.txt: a key moment
    target/misc.zip[misc/nested_monkey.txt]
    target/monkey-pics.txt:1:A file about happy monkeys.
    target/test-ear.ear[META-INF/MANIFEST.MF]:5:Created-By: Apache monkey
  • What column data in my database matches ‘handle’?
    (database username and password should be in ~/.crgrep)
    For relational DB:

    $ crgrep -d -U "jdbc:sqlite:/databases/db.sqlite3" handle '*'

    For Neo4J graph DB:

    $ crgrep -d -U "http://localhost:7474/" handle '*'

    -d stands for database and -U for URI.

  • Search pattern in an image using OCR:
    $ crgrep --ocr report report_scan.png
  • Search in image metadata
    $ crgrep --ocr report report_scan.png
  • Does the google home page contain a ‘favicon’ reference?
    $ crgrep google_favicon http://www.google.com
  • Find maven (POM) dependencies in my project with content matching ‘RunWith’
    $ crgrep -m RunWith pom.xml

Webpage: Common Resource grep

Similar software

  • ag or The Silver Searcher is a faster grep alternative for developers.

Ag: fast grep & ack alternative

grep is one of the most commonly used utilities on Linux. ack is a faster replacement of grep written purely in portable Perl 5 and takes advantage of the power of Perl’s regular expressions. ack is optimized for searching version controlled source code.

Ag or The Silver Searcher (Ag is the chemical symbol for silver) is an optimized replacement for ack. It is 3 to 5 times faster than ack and targets source code search. It skips the directories created by version control systems. If there are files in your source repo you don’t want to search, just add their patterns to a .agignore file.
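
As a small illustration, a .agignore file lists one pattern per line, much like a .gitignore. In the sketch below the patterns are only examples, and a temporary directory stands in for the project root:

```shell
# Stand-in for a project root (a temporary directory, for illustration).
proj=$(mktemp -d)

# One pattern per line, similar in spirit to a .gitignore file.
cat > "$proj/.agignore" <<'EOF'
*.min.js
*.log
build/
EOF

# Show what was written, then clean up the scratch directory.
patterns=$(cat "$proj/.agignore")
printf '%s\n' "$patterns"
rm -rf "$proj"
```

In a real project the file would sit at the top of the source tree.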

The author explains the tweaks that make Ag so fast:

  • Searching for literals (no regex) uses Boyer-Moore-Horspool strstr.
  • Files are mmap()ed instead of read into a buffer.
  • If built with PCRE 8.21 or greater, regex searches use the JIT compiler.
  • Ag calls pcre_study() before executing the regex on a jillion files.
  • Instead of calling fnmatch() on every pattern in ignore files, non-regex patterns are loaded into an array and binary searched.
  • Ag uses Pthreads to take advantage of multiple CPU cores and search files in parallel.

Tempted to try it out? To install Ag on Ubuntu:

$ sudo apt-get install silversearcher-ag

You can integrate Ag in vim using the ack.vim plugin. Add the following line to your .vimrc:

let g:ackprg = 'ag --nogroup --nocolor --column'

The cmdline options are similar to grep. A common search using Ag is:

$ ag -anr "search_string" *
where,
a: include all files
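n: meant to show line numbers (ag already prints them by default when searching files, and its man page lists -n as --norecurse, so check ag --help for your version)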
r: search recursively

Here are my benchmarks with grep and Ag:

$ time grep -nr "fprintf" *
real    0m0.043s
user    0m0.031s
sys    0m0.011s
$ time ag -anr "fprintf" *
real    0m0.033s
user    0m0.045s
sys    0m0.034s

Real is wall clock time – time from start to finish of the call.
User is the amount of CPU time spent in user-mode code (outside the kernel) within the process.
Sys is the amount of CPU time spent in the kernel within the process.

The results are consistent. Ag uses more CPU time overall (User plus Sys) but finishes sooner in wall clock (Real) time than grep, which fits a tool that spreads the search over multiple cores.

Webpage: Ag

Similar software

  • jrep is a grep-like utility powered by regular expression compiler rejit.

find, grep and vim for Windows

I posted before on the powerful grep and find utilities. If you are a regular Linux user you’ll find them extremely useful. If you need to work on Windows and are missing these and the omnipotent vi editor, there are some free and light utilities available for you:

  • WinGrep – Has a GUI. Feature packed but often hangs when you use it on a huge set of files like the Linux kernel source code. Good for a few hundred files.
  • GNU grep & GNU find – Good old Linux grep and find ported to Windows. Fast and dependable.
  • GVim – Vim ported for Windows.

The smart find and grep utilities

find and grep are your most useful friends when you are connected to an unfamiliar remote Linux box via a terminal, or when you are trying to find some specific files or a particular string in any file.
For example, to find all movies with “evening” in the name, of type MKV and with a size between 650MB and 750MB, searching the current directory recursively, you can run:

$ find . -iname '*evening*.mkv' -size +650M -size -750M

To find the C files accessed or modified in the last 10 minutes (one command for each case):

$ find /home/david -amin -10 -name '*.c'
$ find /home/david -mmin -10 -name '*.c'
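
The two tests can also be combined into one call with -o (or). The sketch below assumes GNU find and GNU touch (for the -d option) and works in a scratch directory, so the file names are only illustrative:

```shell
# Scratch directory with one fresh and one old C file.
dir=$(mktemp -d)
echo 'int main(void) { return 0; }' > "$dir/fresh.c"
echo 'int old;' > "$dir/stale.c"
# Push both access and modification times of stale.c two hours back (GNU touch).
touch -d '2 hours ago' "$dir/stale.c"

# Accessed OR modified in the last 10 minutes. The escaped parentheses
# group the -o so that -name applies to both time tests.
recent=$(find "$dir" \( -amin -10 -o -mmin -10 \) -name '*.c')

echo "$recent"
rm -rf "$dir"
```

Without the parentheses, -name would bind only to the -mmin test, because the implicit "and" between tests has higher precedence than -o.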

Look for string in .c and .cpp files:

$ grep -nr string --include='*.c' --include='*.cpp' .
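
Here is a quick way to see what --include does, as a sketch assuming GNU grep. It uses a scratch directory, and the file names and the search word needle are made up for illustration:

```shell
# Scratch tree with three files containing the same word.
src=$(mktemp -d)
echo 'needle in c'   > "$src/a.c"
echo 'needle in cpp' > "$src/b.cpp"
echo 'needle in txt' > "$src/notes.txt"

# Quote the globs so the shell passes them to grep unexpanded.
# -l lists matching file names only; sort keeps the order stable.
hits=$(grep -rl needle --include='*.c' --include='*.cpp' "$src" | sort)

echo "$hits"
rm -rf "$src"
```

Only a.c and b.cpp are listed. notes.txt contains the word too, but it is filtered out by the --include patterns.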

Check out the man pages for many more useful options.