Pentesters Prerequisites
Regular Expressions
Regular expressions can be a difficult concept to understand and use. A regular expression is a sequence of characters that describes a search pattern. You can use this pattern in many different ways; for example, you can search for its presence in a string or in a text (pattern matching).
Usually a pentester uses regular expressions to filter and extract information from documents, client-server communications, tool output and much more.
For instance, we could use them to extract all the email addresses from a web page or to filter nmap results. From a "defensive" point of view, regular expressions are also commonly used to validate and sanitize input, for example to reject input that contains bad characters or invalid text.
You can create a regexp object with:
literal notation, such as /pattern/
%r notation
OO notation
The %r notation works like the % notation of strings. The r tells the interpreter to treat the string inside the delimiters as a regular expression. As with the string notation, you can choose the delimiters.
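As a minimal sketch, the same pattern can be written with different delimiters. The constant names and the /etc/passwd sample text are only illustrative.

```ruby
# Three spellings of the same pattern; only the delimiters differ.
SLASHES = /\/etc\/passwd/    # literal notation: every / must be escaped
BRACES  = %r{/etc/passwd}    # %r with braces: no escaping needed
BANGS   = %r!/etc/passwd!    # another delimiter works as well

sample = "cat /etc/passwd"
puts [SLASHES, BRACES, BANGS].all? { |re| re.match?(sample) }  # true
```

The %r form is mostly a readability choice when the pattern itself contains many slashes, as paths and URLs do.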
OO notation is simple. Just call new on the Regexp class to create the corresponding Regexp object. You can also use Regexp.compile as a synonym for Regexp.new.
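A short sketch of the OO notation follows. The constant names and sample strings are made up for illustration.

```ruby
# Build patterns from plain strings at run time.
ADMIN_RE = Regexp.new("admin")
ROOT_RE  = Regexp.compile("root")   # behaves like Regexp.new

puts ADMIN_RE.class                 # Regexp
puts ADMIN_RE.match?("user=admin")  # true
puts ROOT_RE.match?("user=admin")   # false
```

This form is handy when the pattern is not known in advance, for example when it comes from a wordlist or a command-line argument.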
If you use literal notation, you can add a modifier character after the last / of the Regexp. The most commonly used modifier is i, which enables case-insensitive matching. If you use OO notation, you should pass the corresponding option (for example Regexp::IGNORECASE) when you create the Regexp.
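The two ways of asking for case-insensitive matching can be sketched like this. The constant names and sample strings are illustrative only.

```ruby
LITERAL_CI = /password/i                                  # literal notation with the i modifier
OBJECT_CI  = Regexp.new("password", Regexp::IGNORECASE)   # OO notation with the equivalent option

puts LITERAL_CI.match?("PASSWORD=secret")     # true
puts OBJECT_CI.match?("Password: hunter2")    # true
puts(/password/.match?("PASSWORD=secret"))    # false: matching is case sensitive by default
puts OBJECT_CI.casefold?                      # true: the pattern ignores case
```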
Match Method
The Regexp class provides some very useful methods. One of these is match, which returns a MatchData object when the pattern matches and nil otherwise. With a MatchData object you can get information about the match, such as the position of the matched substring, the captured groups and much more. You can treat MatchData as an array: index 0 holds the whole matched substring and the following indexes hold the captured groups.
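The following sketch shows match and a few MatchData methods. The log line, LOGIN_RE and the parse_login helper are hypothetical names chosen for the example.

```ruby
LOGIN_RE = /user=(\w+) pass=(\w+)/

# Returns a MatchData object on success, nil otherwise.
def parse_login(line)
  LOGIN_RE.match(line)
end

m = parse_login("POST /login user=admin pass=s3cret")
puts m[0]         # whole match: user=admin pass=s3cret
puts m[1]         # first group: admin
puts m[2]         # second group: s3cret
puts m.begin(0)   # offset where the match starts: 12
p m.pre_match     # text before the match: "POST /login "
p parse_login("nothing to see here")   # nil
```

Because match returns nil when nothing matches, it is worth checking the result before indexing into it.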
Special characters
Some characters have a special meaning inside a pattern, for example . * + ? ( ) [ ] { } | ^ $ and \ itself.
If you want to match them literally, you have to escape them with a backslash \.
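As a small sketch, compare an unescaped dot with an escaped one. The constant names and test strings are illustrative; Regexp.escape is a standard helper that adds the backslashes for you.

```ruby
UNESCAPED = /1.1/    # the dot matches any character
ESCAPED   = /1\.1/   # the dot matches only a literal "."

puts UNESCAPED.match?("1x1")   # true
puts ESCAPED.match?("1x1")     # false
puts ESCAPED.match?("1.1")     # true

# Regexp.escape escapes every special character in a string.
puts Regexp.escape("a.b*c")    # a\.b\*c
```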
Regular Expression Syntax
Rule | Matching |
. | Any single character (it does not match a newline by default) |
[] | Exactly one of the characters listed in the square brackets |
[^] | Exactly one character that is not listed in the square brackets |
\d | A digit. Same as [0-9] |
\D | A non-digit character. Same as [^0-9] |
\s | A whitespace character |
\S | A non-whitespace character |
\w | A word character. Same as [A-Za-z0-9_] |
\W | A non-word character. Same as [^A-Za-z0-9_] |
The following are some examples that will explain these special characters:
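A minimal sketch, assuming a made-up scanner output line (SCAN_LINE is a hypothetical name). String#[] with a regexp returns the first matching substring. The + used here means "one or more".

```ruby
SCAN_LINE = "Host 10.0.0.5 port 22/tcp open"

p SCAN_LINE[/\d+/]       # "10": first run of digits
p SCAN_LINE[/[a-z]+/]    # "ost": first run of lowercase letters
p SCAN_LINE[/\S+/]       # "Host": first run of non-whitespace characters
p SCAN_LINE[/[^\d.]+/]   # "Host ": first run that is neither a digit nor a dot
p SCAN_LINE[/\W/]        # " ": first non-word character
p SCAN_LINE[/\d.\d/]     # "0.0": a digit, any character, a digit
```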
Repetitions
The most commonly used repetition rules are:
Rule | Matching |
exp* | Zero or more occurrences of exp |
exp+ | One or more occurrences of exp |
exp? | Zero or one occurrence of exp |
exp{n,} | n or more occurrences of exp |
exp{n,m} | At least n and at most m occurrences of exp |
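The repetition rules can be compared side by side on one sample string. This is only a sketch; SAMPLE is an illustrative name, and scan returns every match of the pattern.

```ruby
SAMPLE = "ac abc abbc abbbc"

p SAMPLE.scan(/ab*c/)       # ["ac", "abc", "abbc", "abbbc"]  zero or more b
p SAMPLE.scan(/ab+c/)       # ["abc", "abbc", "abbbc"]        one or more b
p SAMPLE.scan(/ab?c/)       # ["ac", "abc"]                   zero or one b
p SAMPLE.scan(/ab{2,}c/)    # ["abbc", "abbbc"]               two or more b
p SAMPLE.scan(/ab{1,2}c/)   # ["abc", "abbc"]                 one or two b
```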
Anchors
Anchors specify the position at which the pattern must match. The most commonly used are:
Rule | Matching |
^exp | exp must be at the beginning of a line |
exp$ | exp must be at the end of a line |
\Aexp | exp must be at the beginning of the whole string |
exp\Z | exp must be at the end of the whole string, or just before its final newline |
exp\z | like \Z, but the final newline is not ignored: exp must be at the very end of the whole string |
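The difference between line anchors and string anchors matters when validating input. The sketch below uses illustrative names (TEXT, only_digits?) and a made-up payload.

```ruby
TEXT = "first line\nsecond line\n"

p TEXT.scan(/^\w+/)    # ["first", "second"]: ^ matches at every line start
p TEXT.scan(/\A\w+/)   # ["first"]: \A matches only at the start of the string
p TEXT.scan(/\w+$/)    # ["line", "line"]: $ matches at every line end
p TEXT[/line\Z/]       # "line": \Z tolerates the final newline
p TEXT[/line\z/]       # nil: \z does not

# Hypothetical validator: anchors the pattern to the whole string.
def only_digits?(input)
  /\A\d+\z/.match?(input)
end

puts only_digits?("1234")                 # true
puts only_digits?("1234\n<script>")       # false
puts(/^\d+$/.match?("1234\n<script>"))    # true: ^ and $ only check a single line
```

For input validation, \A and \z are usually the safer choice, because ^ and $ accept any input that contains at least one valid line.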
Global variables
Variable | Description |
$~ | The MatchData object of the last match. The rest are derived from this one. |
$& | The whole substring matched by the pattern. |
$1 | The substring that matches the first group. |
$2, $3, etc. | The substrings that match the second group, the third group, and so on. |
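A minimal sketch of these variables after a successful =~ match. The demo_globals helper and the sample address are hypothetical.

```ruby
# Collects the match-related globals right after a successful match.
def demo_globals(line)
  return nil unless line =~ /(\w+)@(\w+\.\w+)/
  { match_data: $~, whole: $&, first: $1, second: $2 }
end

result = demo_globals("contact: admin@example.com")
puts result[:match_data].class   # MatchData
puts result[:whole]              # admin@example.com
puts result[:first]              # admin
puts result[:second]             # example.com
```

These variables are overwritten by the next match, so copy the values you need right away.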
Working with strings
If you take a look at the String class methods, you will notice that many of them accept a regular expression as an argument. You can use a regexp with gsub, sub, split and more.
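A short sketch of sub, gsub and split with a regexp argument. LOG is a made-up string used only for illustration.

```ruby
LOG = "user=admin;pass=s3cret;token=abc123"

puts LOG.sub(/=\w+/, "=***")    # replaces the first match only: user=***;pass=s3cret;token=abc123
puts LOG.gsub(/=\w+/, "=***")   # replaces every match: user=***;pass=***;token=***
p LOG.split(/[;=]/)             # ["user", "admin", "pass", "s3cret", "token", "abc123"]
```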
scan lets you iterate over all the occurrences of the text that match the pattern.
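As a sketch, scan can pull every email address out of a page. EMAIL_RE is a deliberately simple pattern (not a complete address grammar), and extract_emails and PAGE are hypothetical names.

```ruby
# Simplified email pattern, good enough for a quick extraction.
EMAIL_RE = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

# Returns every distinct address found in the text.
def extract_emails(text)
  text.scan(EMAIL_RE).uniq
end

PAGE = <<~HTML
  <p>Contact admin@example.com or sales@example.org.</p>
  <a href="mailto:admin@example.com">mail</a>
HTML

p extract_emails(PAGE)   # ["admin@example.com", "sales@example.org"]

# scan also accepts a block that is called once per match.
PAGE.scan(EMAIL_RE) { |address| puts "found #{address}" }
```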
Dates and Time
Ruby provides different classes to handle dates and times:
Time
Date
DateTime
The Time class provides methods to work with your operating system's date and time functionality.
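A minimal sketch of creating and inspecting Time objects. LAUNCH is an arbitrary fixed timestamp chosen so the outputs are predictable.

```ruby
LAUNCH = Time.utc(2024, 1, 15, 10, 30, 0)

puts Time.now                # current local time, as reported by the OS
puts LAUNCH.year             # 2024
puts LAUNCH.month            # 1
puts LAUNCH.hour             # 10
puts LAUNCH.wday             # 1 (0 is Sunday, so 1 is Monday)
puts((LAUNCH + 60).min)      # 31: adding a number adds seconds
puts Time.at(0).utc.year     # 1970: Time.at converts Unix epoch seconds
```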
Predicates and Conversions
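A few predicates and conversions available on Time objects, as a sketch. STAMP is an arbitrary fixed timestamp.

```ruby
STAMP = Time.utc(2024, 1, 15, 10, 30, 0)

puts STAMP.monday?   # true
puts STAMP.sunday?   # false
puts STAMP.utc?      # true
puts STAMP.to_i      # seconds since the Unix epoch
puts STAMP.to_f      # the same value as a float
p STAMP.to_a         # [sec, min, hour, day, month, year, wday, yday, isdst, zone]
```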
Comparisons
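Time objects can be compared and subtracted, as this sketch shows. EARLIER and LATER are arbitrary timestamps.

```ruby
EARLIER = Time.utc(2024, 1, 15, 10, 0, 0)
LATER   = Time.utc(2024, 1, 15, 10, 30, 0)

puts EARLIER < LATER                           # true
puts LATER - EARLIER                           # 1800.0: the difference in seconds
puts((EARLIER + 1800) == LATER)                # true
puts LATER.between?(EARLIER, EARLIER + 3600)   # true
```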
From time to string
There are many other methods that can be used on Time objects. For example, you can obtain a string with the to_s or ctime method, depending on the format you want.
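A sketch of the string conversions. REPORT_TIME is an arbitrary timestamp; strftime is the standard method for custom formats.

```ruby
REPORT_TIME = Time.utc(2024, 1, 15, 10, 30, 0)

puts REPORT_TIME.to_s                            # 2024-01-15 10:30:00 UTC
puts REPORT_TIME.ctime                           # Mon Jan 15 10:30:00 2024
puts REPORT_TIME.strftime("%Y-%m-%d %H:%M:%S")   # 2024-01-15 10:30:00
puts REPORT_TIME.strftime("%d/%m/%Y")            # 15/01/2024
```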
Other classes
Ruby provides other classes to manage date and time data:
Date: it is used to manage dates
DateTime: it is a subclass of Date and it allows you to manage time too
Both Date and DateTime can be used like Time. The main difference between Time and the other two is the internal implementation.
Usually Date and DateTime are slower than Time, but they provide different methods that may be useful for your script.
A very useful method is parse, which allows you to create a date or time object from a string.
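A minimal sketch of parsing, assuming ISO-like input strings. Date and DateTime need require "date", and Time.parse needs require "time". The constant names are illustrative.

```ruby
require "date"
require "time"   # adds Time.parse

PARSED_DATE     = Date.parse("2024-01-15")
PARSED_DATETIME = DateTime.parse("2024-01-15T10:30:00+00:00")
PARSED_TIME     = Time.parse("2024-01-15 10:30:00 UTC")

puts PARSED_DATE.year        # 2024
puts PARSED_DATE.monday?     # true
puts PARSED_DATETIME.hour    # 10
puts PARSED_TIME.min         # 30

# Subtracting two dates gives the number of days between them.
puts (Date.parse("2024-01-31") - PARSED_DATE).to_i   # 16
```

parse is lenient about formats; when the format is known in advance, strptime is the stricter alternative.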
Files and Directories
Ruby provides two classes:
Dir for directories. It defines class methods that allow you to work with directories. It provides a variety of ways to list directories as well as their content. It can also be used to find out where the Ruby script is executed or to navigate between file system directories.
File for files. It lets you open a file, get information about it, change its name, change its permissions and much more.
Directory
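The sketch below works inside a temporary directory so that it does not touch real data. list_entries and the file names are hypothetical.

```ruby
require "tmpdir"

# Returns the sorted names found directly inside a directory.
def list_entries(path)
  Dir.children(path).sort
end

puts Dir.pwd   # directory from which the script is executed

Dir.mktmpdir("demo") do |tmp|
  Dir.mkdir(File.join(tmp, "loot"))
  File.write(File.join(tmp, "notes.txt"), "hello")

  p list_entries(tmp)                                               # ["loot", "notes.txt"]
  p Dir.glob(File.join(tmp, "*.txt")).map { |f| File.basename(f) }  # ["notes.txt"]
  puts Dir.exist?(File.join(tmp, "loot"))                           # true

  # With a block, chdir restores the previous directory afterwards.
  Dir.chdir(tmp) { puts Dir.pwd }
end
```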
File
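A sketch of common File operations, again inside a temporary directory. file_summary and the file names are made up for the example, and the permission bits may behave differently on non-Unix systems.

```ruby
require "tmpdir"

# Collects a few facts about a file.
def file_summary(path)
  {
    name: File.basename(path),
    ext:  File.extname(path),
    size: File.size(path),
    mode: format("%o", File.stat(path).mode & 0o777)
  }
end

Dir.mktmpdir do |tmp|
  path = File.join(tmp, "creds.txt")

  File.write(path, "admin:admin\nroot:toor\n")         # create and write
  File.open(path, "a") { |f| f.puts "guest:guest" }    # append a line
  File.foreach(path) { |line| puts line.chomp }        # read line by line

  File.chmod(0o600, path)                              # change permissions
  p file_summary(path)

  renamed = File.join(tmp, "creds.bak")
  File.rename(path, renamed)                           # change the name
  puts File.exist?(path)                               # false
  puts File.exist?(renamed)                            # true
end
```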