8.12.2011

Setting up automatic startup script for Solaris 10 Update 9

There are two ways to have your applications start automatically at boot time on Solaris: SMF and legacy init scripts.

Legacy Init Scripts are old fashioned, used forever on System V based UNIX. They typically look something like this:

Code:

$ cat /etc/init.d/acct
#!/sbin/sh
state="$1"

case "$state" in
'start')
        echo 'Starting process accounting'
        /usr/lib/acct/startup
        ;;

'stop')
        echo 'Stopping process accounting'
        /usr/lib/acct/shutacct
        ;;

*)
        echo "Usage: $0 { start | stop }"
        exit 1
        ;;
esac

exit 0

The legacy init script above is typically just a case statement that handles at least two arguments: start and stop. Very often these scripts have other options such as "status", "restart", "refresh" and others.
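As a rough sketch of how those extra options fit in, the dispatcher below adds "restart" and "status" to the same kind of case statement. The name myapp and the echo lines are placeholders I made up for illustration; a real script would call the daemon's own start and stop commands instead.

```shell
# Hypothetical init-script body, wrapped in a function so it can be
# tried without root. "myapp" is a placeholder, not a real service.
myapp_rc() {
    case "$1" in
    'start')
        echo 'Starting myapp'
        # a real script would launch the daemon here
        ;;
    'stop')
        echo 'Stopping myapp'
        # a real script would stop the daemon here
        ;;
    'restart')
        # restart is simply stop followed by start
        myapp_rc stop
        myapp_rc start
        ;;
    'status')
        echo 'myapp status: not implemented in this sketch'
        ;;
    *)
        echo "Usage: myapp { start | stop | restart | status }"
        return 1
        ;;
    esac
    return 0
}

myapp_rc restart
```

Calling it with "restart" prints the stop message followed by the start message, and any unknown argument prints the usage line and returns a non-zero status.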

These scripts are stored in /etc/init.d. They are then symlinked into the various rc directories, one directory per run-level, the most commonly used being /etc/rc2.d and /etc/rc3.d. Init scripts symlinked into these directories are prefixed with either a capital S for "Start" or K for "Kill", followed by two digits. When a system boots, the scripts in these run-level directories that start with an S are run sequentially based on the two digits, so S00whatever is run first, then S01something, and so on. This is why user-added scripts tend to be named S99something, to ensure that they run last. If you have two scripts with the same digits (S99apache and S99bind), that's fine.
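The ordering is easy to try out in a scratch directory. The sketch below is only an illustration: the file names are made up, and it assumes the rc machinery walks the directory with a plain S* glob, which the shell expands in sorted order.

```shell
# Assumes a POSIX-style shell (ksh, bash); the legacy /sbin/sh may not
# understand $(...). All names live in a scratch dir, not /etc/rc3.d.
LC_ALL=C; export LC_ALL     # plain byte-order sorting for the S* glob
rcdir=$(mktemp -d)
for name in S99bind S01something K20example S99apache S00whatever; do
    : > "$rcdir/$name"
done

# Print the start scripts in glob order, which is sorted by name.
start_order() {
    for f in "$1"/S*; do
        [ -f "$f" ] || continue
        basename "$f"
    done
}

order=$(start_order "$rcdir")
echo "$order"
rm -rf "$rcdir"
```

The K script is skipped, and the two S99 entries simply fall back to alphabetical order among themselves, which is why sharing a number does no harm.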

So let's say you want /etc/init.d/cswapache2 to start at boot. You'd do this:

Code:

# ln -s /etc/init.d/cswapache2 /etc/rc3.d/S99cswapache2


In a like manner, if something is already set to run at boot but you don't want it to, you can either rename the script so it doesn't begin with a capital S or just delete the symlink.
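Here is a small scratch-directory sketch of the rename trick, under the assumption that only names matching S* are picked up at boot. The "disabled." prefix is just a convention I chose for the example; any name that does not start with a capital S would do.

```shell
# Assumes a POSIX-style shell (ksh, bash). Everything happens in a
# scratch directory; the file names mimic the cswapache2 example.
rcdir=$(mktemp -d)
target="$rcdir/cswapache2"      # stand-in for /etc/init.d/cswapache2
: > "$target"
ln -s "$target" "$rcdir/S99cswapache2"

# Count the symlinks whose names match S* in the given directory.
count_start_links() {
    n=0
    for f in "$1"/S*; do
        if [ -h "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

before=$(count_start_links "$rcdir")
mv "$rcdir/S99cswapache2" "$rcdir/disabled.S99cswapache2"
after=$(count_start_links "$rcdir")
echo "start links before: $before, after: $after"
rm -rf "$rcdir"
```

Renaming keeps the link around so it is easy to put back later, whereas deleting it means recreating it with ln -s.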


With SMF we start to look at our applications and daemons as services. Using the svcs command we can view running services; if you add the "-a" option it'll show you all services, running or not.

Code:

$ svcs
STATE          STIME    FMRI
legacy_run     Nov_19   lrc:/etc/rc2_d/S20sysetup
legacy_run     Nov_19   lrc:/etc/rc2_d/S72autoinstall
legacy_run     Nov_19   lrc:/etc/rc2_d/S73cachefs_daemon
legacy_run     Nov_19   lrc:/etc/rc2_d/S85cswsaslauthd
legacy_run     Nov_19   lrc:/etc/rc2_d/S89PRESERVE
legacy_run     Nov_19   lrc:/etc/rc2_d/S98deallocate
legacy_run     Nov_19   lrc:/etc/rc3_d/S50cswapache2
online         Nov_19   svc:/system/svc/restarter:default
online         Nov_19   svc:/system/filesystem/root:default
online         Nov_19   svc:/network/loopback:default
...
online         10:20:09 svc:/network/nfs/cbd:default
online         10:20:09 svc:/network/nfs/nlockmgr:default


Here we can see the legacy init scripts that have been run and a few of the SMF services that are online, as well as when they last changed state (i.e. started). You'll notice that each service has an identifying "FMRI" (Fault Management Resource Identifier), which is used by other Solaris frameworks such as the Fault Management Architecture (FMA).

Dealing with services is easy. We can use the svcadm command to "enable", "disable", "refresh", "restart", or otherwise change the state of a given service.

Code:

$ svcs -a | grep -i mysql
disabled       Nov_17   svc:/network/cswmysql5:default
$ svcadm enable svc:/network/cswmysql5:default
$ svcs -a | grep -i mysql
online         16:54:36 svc:/network/cswmysql5:default
$ svcadm restart svc:/network/cswmysql5:default
$ date
Tue Nov 21 16:55:26 PST 2006
$ svcs -a | grep -i mysql
online         16:55:27 svc:/network/cswmysql5:default


In the example above I looked for any MySQL services and found network/cswmysql5, so I enabled it, verified that it was online, then restarted it and checked again. Notice that the time at which it last started is displayed.
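If you want to make the same check from a script rather than by eye, one possible approach is to pull the STATE column out with awk. The sketch below runs against sample lines copied from the listings in this article, so it works anywhere; service_state is a helper name I invented, not an SMF command.

```shell
# Sample lines copied from the svcs listings in this article; on a real
# Solaris system you would pipe `svcs -a` into the function instead.
sample='STATE STIME FMRI
legacy_run Nov_19 lrc:/etc/rc3_d/S50cswapache2
online Nov_19 svc:/network/loopback:default
disabled Nov_17 svc:/network/cswmysql5:default'

# Print the STATE column for the FMRI given as $1, reading svcs-style
# text on stdin. The fmri=... operand sets an awk variable.
service_state() {
    awk '$3 == fmri { print $1 }' fmri="$1" -
}

echo "$sample" | service_state svc:/network/cswmysql5:default
```

svcs also has -H and -o options for trimming its own output (for example, printing only the state column without the header), which may make the awk step unnecessary.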

Now let's see one way in which SMF is superior to legacy init scripts. When SMF starts something it holds a "contract" for that service. That contract keeps track of what's running for any given service. Using the "-p" option we can see which processes are part of a service's contract and take advantage of that intelligence.

Code:

$ svcs -p network/cswmysql5
STATE          STIME    FMRI
online         16:55:27 svc:/network/cswmysql5:default
               16:55:27    28938 mysqld_safe
               16:55:27    29004 mysqld
$ kill -9 29004
$ svcs -p network/cswmysql5
STATE          STIME    FMRI
online*        17:00:01 svc:/network/cswmysql5:default
               16:55:27    28938 mysqld_safe
               17:00:01    29228 mysqld
$ mysql -u mysql
...
mysql> \q
Bye


Notice here that I used svcs -p to list the processes associated with my MySQL5 service. Then I brutally killed mysqld and, faster than I can blink, the process was restarted! You can see that in the "STIME" for mysqld. The asterisk ("online*") indicates that the service is currently in a transition state, in this case transitioning to online, but as you can see MySQL is already back in action.

But SMF isn't restarting things in brain-dead mode like an inittab entry; we can define thresholds regarding restarts. For instance, if SMF restarts a service more than 3 times in 60 seconds, something is probably very wrong and it should stop trying. At that point it'll put the service in the "maintenance" state, and it will stay that way until you clear the state with svcadm clear some/service.
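To make the threshold idea concrete, here is a toy version of the "more than 3 restarts in 60 seconds" rule. This is not SMF's code, only my illustration of the logic; the limits, the timestamps, and the next_state helper are all invented for the example.

```shell
# Assumes a POSIX-style shell (ksh, bash). Illustration only: this is
# not how SMF is implemented, just the "too many restarts in a window"
# idea with made-up numbers.
MAX_RESTARTS=3
WINDOW=60

# Usage: next_state NOW T1 T2 ...   (all values in seconds)
# Prints "maintenance" if more than MAX_RESTARTS of the restart times
# fall within WINDOW seconds before NOW, otherwise "online".
next_state() {
    now=$1
    shift
    recent=0
    for t in "$@"; do
        age=$((now - t))
        if [ "$age" -le "$WINDOW" ]; then
            recent=$((recent + 1))
        fi
    done
    if [ "$recent" -gt "$MAX_RESTARTS" ]; then
        echo maintenance
    else
        echo online
    fi
}

next_state 100 95 90 85 80    # four restarts in the last 60 seconds
next_state 100 95 20 10 5     # only one recent restart
```

The first call prints "maintenance" and the second prints "online", which mirrors the difference between a service that keeps crashing immediately and one that merely crashed a few times over a long period.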

Let's look at an example of something broken trying to start. I'm going to break MySQL and then try to start it...

Code:

$ mv /opt/csw/mysql5/var/ /opt/csw/mysql5/xxx-var/
$ svcadm enable network/cswmysql5
$ svcs network/cswmysql5
STATE          STIME    FMRI
maintenance    17:29:01 svc:/network/cswmysql5:default
$ svcs -vx
svc:/network/cswmysql5:default (?)
 State: maintenance since Tue Nov 21 17:29:01 2006
Reason: Restarting too quickly.
   See: http://sun.com/msg/SMF-8000-L5
   See: /var/svc/log/network-cswmysql5:default.log
Impact: This service is not running.


So I moved MySQL's data directory, and obviously it can't start without it. When I enable the service it ends up in "maintenance". Using SMF's most magical command, svcs -vx, we can see a listing of all services that failed to start, why they failed, some information about them, the log location, any dependent services that can't start as a result, and even a URL to a page that'll tell us more!
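Since the log file is usually the next thing you want to read, a small filter can pick the log path out of that output. This is only a sketch against the sample text from this article; log_paths is a helper name I made up.

```shell
# Sample text copied from the svcs -vx output in this article; on a
# real system you would pipe `svcs -vx` into the function instead.
sample='svc:/network/cswmysql5:default (?)
 State: maintenance since Tue Nov 21 17:29:01 2006
Reason: Restarting too quickly.
   See: http://sun.com/msg/SMF-8000-L5
   See: /var/svc/log/network-cswmysql5:default.log
Impact: This service is not running.'

# Print every "See:" entry that is a local path (starts with "/").
log_paths() {
    awk '$1 == "See:" && $2 ~ /^\// { print $2 }'
}

echo "$sample" | log_paths
```

The URL entry is skipped because it does not start with a slash, so what comes out is a path you can hand straight to tail or less.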

Now let's resolve the issue and bring the service back online:

Code:

$ mv /opt/csw/mysql5/xxx-var/ /opt/csw/mysql5/var/
$ svcs network/cswmysql5
STATE          STIME    FMRI
maintenance    17:29:01 svc:/network/cswmysql5:default
$ svcadm clear network/cswmysql5
$ svcs network/cswmysql5
STATE          STIME    FMRI
online         17:32:57 svc:/network/cswmysql5:default


The usefulness of the svcs -vx command cannot be overstated. It is the first thing I run when logging into any Solaris 10 or OpenSolaris machine.

So how do you actually use SMF with your own service?

SMF services are defined in XML manifests. These manifests describe how to start, stop, restart, and refresh (reload the configuration of) your application, what dependencies it has, various thresholds, and other metadata that may be useful, such as the man pages that apply to that service. In addition to the manifest, you can use scripts just like your legacy init scripts, which we call methods, or "method scripts".

Service configuration changes can be made by using the svccfg ("Service Config") tool. The most common uses of this command are to import or export a manifest. For instance, I'm curious what the manifest for that MySQL5 service looks like:


Code:

$ svccfg export network/cswmysql5

To view the manifest, refer to http://pastebin.com/yRqeMKzi

That probably looks really intimidating at first glance, but it's really not so bad if you just break it down. First we define our dependencies; for instance, MySQL is dependent on the network loopback service and the local filesystems service. There are 3 "exec_method" elements which define the methods for start, stop, and restart. If you can start your app or daemon in just a single line then you don't need an external method script, but this service opts for a script. Notice that the "stop" method uses an SMF shortcut which just kills the processes rather than using a script or command.

This is only a very simple example; you can put lots more information in there, but it's pretty simple XML when you just break it down.
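For a feel of the overall shape, here is a stripped-down manifest for an imaginary service, written to a scratch file from the shell. Treat it as a sketch under assumptions: the service name site/myapp and the /opt/myapp path are placeholders I invented, the element layout is my best recollection of the service_bundle DTD, and it has not been validated on a real system.

```shell
# Assumes a POSIX-style shell (ksh, bash). Writes a hypothetical
# manifest to a scratch file; site/myapp and /opt/myapp are placeholders.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='myapp'>
  <service name='site/myapp' type='service' version='1'>
    <create_default_instance enabled='false'/>
    <single_instance/>
    <!-- do not start until the loopback interface is up -->
    <dependency name='loopback' grouping='require_all'
        restart_on='error' type='service'>
      <service_fmri value='svc:/network/loopback:default'/>
    </dependency>
    <!-- one-line start command, so no separate method script -->
    <exec_method type='method' name='start'
        exec='/opt/myapp/bin/myapp' timeout_seconds='60'/>
    <!-- :kill asks SMF to signal the processes in the contract -->
    <exec_method type='method' name='stop'
        exec=':kill' timeout_seconds='60'/>
  </service>
</service_bundle>
EOF
echo "wrote $manifest"
```

On a Solaris box it would be worth running svccfg validate against the file before importing it, since that should catch any mistakes in the XML.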

When you create a new SMF manifest, you simply put the XML in a file and use svccfg import my_service.xml to import it.