Get unique values in AWK

AWK has no built-in unique() function, but there is a simple way to get unique values without sorting the input first.

To see how it works, open bash, create a sequence of numbers that contains some duplicates, and save it to the file “numbers”.

    $ (seq 1 5 ;  echo -e '5\n5' ; seq 6 10) > numbers

Display file content:

    $ cat numbers
    1
    2
    3
    4
    5
    5
    5
    6
    7
    8
    9
    10

Then use the construct !a[$x]++ in AWK. For each line it evaluates to 1 the first time a value is seen and to 0 for every repeat of that value. Replace x with the number of the field of interest. The result can be used as a logical condition to select the unique values.

    $ awk '{if(!a[$1]++) print $0}' numbers

Output:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
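
To illustrate replacing x with another field number, here is a minimal sketch that deduplicates on the second field. The file name "people" and its contents are made up for this example and are not part of the original post.

```shell
# Hypothetical sample data: a name in field 1, a city in field 2
printf '%s\n' 'anna berlin' 'ben paris' 'carl berlin' 'dora rome' 'emil paris' > people

# Print a line only the first time its second field (the city) is seen
awk '{if(!a[$2]++) print $0}' people
```

Only the first line for each city should be printed; later lines with a city that has already occurred are skipped.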

Explanation

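a[$x]++ uses the value of field x as an index into the array a and counts how often that value has occurred. An array element that has not been set yet counts as 0. Because ++ is a post-increment here, the expression yields the count before the increment: 0 for the first occurrence, 1 for the second, and so on.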

    $ awk '{print(a[$1]++)}' numbers
    0
    0
    0
    0
    0
    1
    2
    0
    0
    0
    0
    0
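
To see what the array holds once all lines have been read, the counts can be printed in an END block. This is only a sketch for inspection. The sort at the end is an addition of this example, needed because AWK does not guarantee any order for "for (k in a)".

```shell
# Recreate the sample input used in this post
(seq 1 5; printf '5\n5\n'; seq 6 10) > numbers

# Count every value, then print "value count" pairs after the last line.
# The iteration order of "for (k in a)" is unspecified, hence the sort.
awk '{a[$1]++} END {for (k in a) print k, a[k]}' numbers | sort -n
```

Each value that occurs once should show a count of 1, while the value 5 should show a count of 3.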

The negation ! turns every zero into a one and every non-zero number into a zero:

    $ awk '{print(!a[$1]++)}' numbers
    1
    1
    1
    1
    1
    0
    0
    1
    1
    1
    1
    1

So you get logical values that can be used to extract the unique values.
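
Since an AWK pattern without an action prints the current line whenever the pattern is true, the same logical value can also serve directly as the pattern. The following is a shorter sketch of the same idea, not a different technique:

```shell
# Recreate the sample input used in this post
(seq 1 5; printf '5\n5\n'; seq 6 10) > numbers

# Pattern only, no action: the default action prints the line
# whenever !a[$1]++ is true, i.e. on the first occurrence of a value
awk '!a[$1]++' numbers
```

This should print the same lines as the if/print version, with each value shown once in its original order.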

Yours faithfully,

Dennis Vier

 
