Summary

Sometimes you need a list of the IP addresses that have accessed your website from a particular source, such as a bot. On Linux, Apache access log entries typically look like this:

180.76.15.5 - - [25/Jul/2016:15:30:48 +0100] "GET / HTTP/1.1" 301 612 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.144 - - [25/Jul/2016:15:31:30 +0100] "GET / HTTP/1.1" 301 612 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
23.79.233.23 - - [25/Jul/2016:15:37:14 +0100] "GET / HTTP/1.1" 301 598 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
23.79.233.24 - - [25/Jul/2016:15:37:19 +0100] "GET / HTTP/1.1" 302 582 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
180.76.15.153 - - [25/Jul/2016:15:40:32 +0100] "GET / HTTP/1.1" 301 561 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.28 - - [25/Jul/2016:15:40:33 +0100] "GET / HTTP/1.1" 302 545 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
141.212.122.128 - - [25/Jul/2016:15:47:26 +0100] "GET / HTTP/1.1" 301 542 "-" "Mozilla/5.0 zgrab/0.x"
141.212.122.128 - - [25/Jul/2016:15:47:27 +0100] "GET / HTTP/1.1" 302 526 "http://91.192.193.170:80/" "Mozilla/5.0 zgrab/0.x"
64.138.2.85 - - [25/Jul/2016:15:48:12 +0100] "GET / HTTP/1.1" 301 598 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0"
64.138.2.85 - - [25/Jul/2016:15:48:13 +0100] "GET / HTTP/1.1" 302 582 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0"

From this log you might decide to block "Baiduspider" if it is placing an increased load on your website.

Step-by-Step Guide

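You can obtain the list of IP addresses with the following command. It keeps only the log lines that contain "baidu.com", extracts the first space-separated field (the client IP address), then sorts the result and removes duplicates: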

grep "baidu.com" access_log | cut -d' ' -f1 | sort | uniq

Alternatively, you can search for the spider's name. These commands are case sensitive; add the '-i' switch to the grep command to ignore case:

grep "Baiduspider" access_log | cut -d' ' -f1 | sort | uniq
grep "YandexBot" access_log | cut -d' ' -f1 | sort | uniq
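
As a minimal sketch of the '-i' switch mentioned above, the following combines it with '-E' so that several spider names can be matched in one pass. The temporary sample log and the variable names LOG and spider_ips are assumptions made for illustration; on a real server you would point LOG at your own access_log.

```shell
# Build a tiny sample log from two lines of the excerpt above so the
# example is self-contained. LOG is a hypothetical variable name.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
180.76.15.5 - - [25/Jul/2016:15:30:48 +0100] "GET / HTTP/1.1" 301 612 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
64.138.2.85 - - [25/Jul/2016:15:48:12 +0100] "GET / HTTP/1.1" 301 598 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0"
EOF

# -i ignores case, so "baiduspider" still matches "Baiduspider".
# -E enables alternation (|), so both names are searched in a single pass.
spider_ips=$(grep -i -E "baiduspider|yandexbot" "$LOG" | cut -d' ' -f1 | sort | uniq)
echo "$spider_ips"

# Remove the temporary sample log.
rm -f "$LOG"
```

With the sample log, only the Baiduspider line matches, so a single address is printed.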

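Each command returns a de-duplicated list of the IP addresses that have accessed your website from Baidu, or from whichever spider you searched for. The list is sorted as text rather than numerically, which is why 180.76.15.5 appears last in the example below: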

grep "baidu.com" access_log | cut -d' ' -f1 | sort | uniq
123.125.71.76
123.125.71.86
123.125.71.90
180.76.15.135
180.76.15.139
180.76.15.140
180.76.15.141
180.76.15.144
180.76.15.145
180.76.15.147
180.76.15.153
180.76.15.157
180.76.15.158
180.76.15.159
180.76.15.16
180.76.15.22
180.76.15.26
180.76.15.28
180.76.15.29
180.76.15.34
180.76.15.5
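
If load is the concern, it can help to see how many requests each address made as well as which addresses appeared. The following is a rough sketch, assuming the same log layout as the excerpt above; it swaps the plain uniq for uniq -c and sorts by the resulting count. The sample log, the pattern, and the names LOG and hits are illustrative assumptions.

```shell
# Sample log built from three lines of the excerpt above.
# LOG is a hypothetical variable name; use your real access_log instead.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
180.76.15.5 - - [25/Jul/2016:15:30:48 +0100] "GET / HTTP/1.1" 301 612 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
141.212.122.128 - - [25/Jul/2016:15:47:26 +0100] "GET / HTTP/1.1" 301 542 "-" "Mozilla/5.0 zgrab/0.x"
141.212.122.128 - - [25/Jul/2016:15:47:27 +0100] "GET / HTTP/1.1" 302 526 "http://91.192.193.170:80/" "Mozilla/5.0 zgrab/0.x"
EOF

# uniq -c prefixes each unique IP with its number of occurrences,
# sort -rn puts the busiest address first,
# awk strips the padding that uniq -c adds in front of the count.
hits=$(grep -E "Baiduspider|zgrab" "$LOG" | cut -d' ' -f1 | sort | uniq -c | sort -rn | awk '{print $1, $2}')
echo "$hits"

# Remove the temporary sample log.
rm -f "$LOG"
```

Each output line holds a request count followed by the IP address, with the most active address first.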
WARNING: Be careful that you do not block any legitimate IP addresses using this method. The user agent string is supplied by the client and can be forged, so confirm the owner of each IP address by performing a WHOIS lookup against it before blocking it.
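
The WHOIS check can be sketched as follows, assuming the whois command-line client is installed. Because WHOIS responses are long, the hypothetical helper owner_fields below keeps only the lines that usually name the network owner. The field names it looks for vary between registries, so treat the pattern as a starting point rather than a complete list.

```shell
# owner_fields is a hypothetical helper: it reads WHOIS output on stdin
# and keeps only the lines that commonly identify the network owner.
# Field names differ between registries, so this pattern is an assumption.
owner_fields() {
    grep -i -E '^[[:space:]]*(netname|descr|orgname|org-name|country):'
}

# Usage (requires network access and the whois client):
#   whois 180.76.15.5 | owner_fields
```

If the owner shown does not match the spider named in the user agent, the request may not have come from that spider at all.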