Below are some of my musings (and a fair amount of waffling) on moving from a larger company to a smaller one, and how that move affects operational best practice.
About six months ago I joined a company, working in a team with a tightly knit group of developers that had recently lost both its web developer and its system administrator. This has so far proven an interesting challenge. The situation is not unique: the public-facing product has not yet suffered from these departures, but the internal infrastructure definitely has.
The team follows many development best practices but has yet to adopt the equivalent practices on the operations side. I'm trying to change this, but it's going to be a long road (not least because the infrastructure is new to me, and I'm using many of these techniques for the first time). The end result should be the holy union recently dubbed "devops", though in reality it's more a case of each job function doing its part.
There is definitely a lot of work to be done. While the development side is looking at continuous integration and delivery, code reviews, and well-defined practices and policies, the operations side has been lacking (think tarball deploys and copy-pasted bash scripts).
There have already been a few operations lessons along the way, including:
You need to be more agile. This means being able to replace machines more quickly and test changes more easily, which in turn means being able to provision machines from code. This is where tools such as Puppet or Chef come in. Traditionally I've seen this done with in-house RPMs and shell scripts that push out configuration changes. While this works, these setups have clear flaws and weaknesses, not least that each one reinvents the wheel.
You need a way to stage configuration changes. Like the previous lesson, this follows from being able to provision a machine quickly: if you can provision a real one, you can provision a fake one to test against.
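As a rough illustration of why provisioning from code also gives you a staging environment almost for free, here is a minimal Python sketch of the desired-state, idempotent model that tools like Puppet and Chef are built around. It is not how either tool is implemented. The machine is simulated as a dictionary mapping file paths to contents, and the `ENVIRONMENTS` data, the file paths and the function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical per-environment data. In Puppet or Chef this would live in
# manifests/cookbooks plus environment-specific data, not a Python dict.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "production": {"db_host": "db.internal", "workers": "8"},
    "staging": {"db_host": "db-staging.internal", "workers": "2"},
}


def desired_state(env: str) -> Dict[str, str]:
    """Describe what a machine in `env` should look like (path -> contents)."""
    settings = ENVIRONMENTS[env]
    config = "".join(f"{key}={value}\n" for key, value in sorted(settings.items()))
    return {
        "/etc/myapp/app.conf": config,  # hypothetical path
        "/etc/myapp/environment": env + "\n",
    }


def converge(machine: Dict[str, str], desired: Dict[str, str]) -> List[str]:
    """Bring a simulated machine in line with the desired state.

    Only files that are missing or differ are written, so a second run with
    the same desired state changes nothing (idempotence).
    Returns the paths that were changed.
    """
    changed = []
    for path, contents in sorted(desired.items()):
        if machine.get(path) != contents:
            machine[path] = contents
            changed.append(path)
    return changed


if __name__ == "__main__":
    production: Dict[str, str] = {}
    print(converge(production, desired_state("production")))  # both files written
    print(converge(production, desired_state("production")))  # [] on the rerun

    # The same code provisions a staging ("fake") machine; only the data differs.
    staging: Dict[str, str] = {}
    converge(staging, desired_state("staging"))
    print(staging["/etc/myapp/app.conf"], end="")
```

Running `converge` twice against the same machine makes no changes the second time, and it also repairs a file someone has edited by hand. The staging machine is built by exactly the same code as production, with only the environment data differing. Ad hoc shell scripts often lack both of these properties.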
One of the realities of working in a smaller company, I'm finding, is that money has to be channelled in different ways. Where before I might have looked to do things in-house and run a pair of servers in a failover cluster, now I'm looking more towards hosted applications and quick provisioning. I believe many small companies don't put nearly enough effort into reducing provisioning time, which is a mistake because fast provisioning serves many purposes. For example, if you can also back up the core data from your systems, you gain a poor man's disaster recovery. Previously, zero downtime was the only acceptable option, which meant full backups and expensive back-end infrastructure to support them. Now, however, I'm leaning towards backing up only the irreplaceable data (in our case, very little) and building an environment in which the rest of the system can be recreated quickly from code.
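The "back up only what can't be rebuilt" approach can be sketched in a few lines of Python using the standard `tarfile` module. The inventory, the directory layout and the `backup_irreplaceable` function below are hypothetical. The point is that the backup is driven by an explicit decision about which data is irreplaceable, with everything else left to the provisioning code.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Dict, List


def backup_irreplaceable(root: Path, inventory: Dict[str, bool], archive: Path) -> List[str]:
    """Archive only the inventory entries marked as irreplaceable.

    `inventory` maps paths relative to `root` to True (irreplaceable, back it
    up) or False (rebuildable by re-running provisioning, skip it).
    Returns the relative paths that were added to the gzip-compressed archive.
    """
    saved = []
    with tarfile.open(archive, "w:gz") as tar:
        for rel_path, irreplaceable in sorted(inventory.items()):
            source = root / rel_path
            if irreplaceable and source.exists():
                tar.add(source, arcname=rel_path)  # directories are added recursively
                saved.append(rel_path)
    return saved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical layout: customer data cannot be regenerated,
        # application config can be rebuilt from the provisioning code.
        (root / "data").mkdir()
        (root / "data" / "customers.db").write_text("rows")
        (root / "app").mkdir()
        (root / "app" / "app.conf").write_text("workers=8\n")

        inventory = {"data": True, "app": False}
        archive = root / "backup.tar.gz"
        print(backup_irreplaceable(root, inventory, archive))  # ['data']
        with tarfile.open(archive, "r:gz") as tar:
            print(tar.getnames())
```

Under this scheme, recovery would mean re-running provisioning to rebuild the machine and then extracting the archive on top of it. That restore step is deliberately left out of the sketch.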