I wish there were more space to devote to animating our objects, but unfortunately there isn't. Animation is a rich topic, from keyframe animation to motion capture to rotoscoping. I'll only be able to give a sweeping overview of a few techniques used in animation, then talk about hierarchical objects.
Back in the 2D days, animation was done using sprites. Sprites are just bunches of pixels that represent images on the screen. A set of animation frames would be shown in rapid succession to give the illusion of motion. The same technique is used in animated films to give life to their characters.
In 3D, the landscape is much more varied. Some systems use simple extensions of their 2D counterparts: the game stores a complete set of vertex positions for each frame of each animation. This makes it very similar to 2D animation, just with vertices in place of pixels. Newer games take it a step further, using interpolation to smoothly morph between frames. That way the playback looks good independent of the recording speed; an animation recorded at 10 fps still looks smooth on a 60 fps display.
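The morphing step is just a per-vertex linear interpolation between two recorded frames. Here's a minimal sketch of the idea; the `Vec3` and `Frame` types are placeholders for illustration, not any particular engine's classes:

```cpp
#include <vector>

// One vertex position; a real model would also carry normals, UVs, etc.
struct Vec3 { float x, y, z; };

// One animation frame: a full copy of every vertex position in the model.
using Frame = std::vector<Vec3>;

// Blend two recorded frames. t = 0 gives frame a, t = 1 gives frame b,
// and values in between smoothly morph from one to the other.
Frame LerpFrames(const Frame& a, const Frame& b, float t)
{
    Frame out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i].x = a[i].x + t * (b[i].x - a[i].x);
        out[i].y = a[i].y + t * (b[i].y - a[i].y);
        out[i].z = a[i].z + t * (b[i].z - a[i].z);
    }
    return out;
}
```

Each display frame, the game picks the two recorded frames bracketing the current animation time, computes `t` from how far between them it is, and renders the blended result.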
While systems like this can be very fast (you have to compute, at most, a linear interpolation per vertex), they have a slew of disadvantages. The primary disadvantage is that you must explicitly store each frame of animation in memory.
If you have a model with 500 vertices, at 24 bytes per vertex (six floats: three for position and three for the normal), that's 12 kilobytes of memory needed per frame. If you have several hundred frames of animation, suddenly you're faced with several megabytes of storage per animated object. In practice, if you have many different types of objects in the scene, the memory requirements become prohibitive.
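The arithmetic is easy to sanity-check; here 200 frames stands in for "several hundred":

```cpp
#include <cstddef>

// 500 vertices at 24 bytes each (position + normal, six floats).
const std::size_t kVertices      = 500;
const std::size_t kBytesPerVert  = 6 * sizeof(float);         // 24 bytes
const std::size_t kBytesPerFrame = kVertices * kBytesPerVert; // 12,000 B, ~12 KB
const std::size_t kFrames        = 200;                       // "several hundred"
const std::size_t kTotalBytes    = kBytesPerFrame * kFrames;  // 2,400,000 B, ~2.4 MB
```

Multiply that by a scene full of distinct animated characters and the problem is clear.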
Note The memory requirements for each character model in Quake III: Arena were so high that the game almost had an eleventh-hour switchover to hierarchical models.
Explicitly placing each vertex in a model each frame isn't the only solution; it's also riddled with redundancy. The topology of the model stays about the same from frame to frame. Outside of the bending and flexing that occurs at the joints, the positions of the vertices relative to one another stay pretty similar.
The way humans and other animals move isn't defined by the skin moving around. Your bones are rigid bodies connected by joints that can only bend in certain directions. The muscles in your body are connected to the bones through tendons, and the skin sits on top of the muscles. Therefore, the position of your skin is a function of the position of your bones.
This structural paradigm is emulated by bone-based animation. A model is defined once in a neutral position, with a set of bones underlying the structure of the model. All of the vertices in the forearm region of the model are conceptually bound to the forearm bone, and so forth. Instead of explicitly listing a set of vertices per frame for your animation, all this system needs is the orientation of each bone in relation to its parent bone. Typically, the root node is the hip of the model, so that the world matrix for the object corresponds to the position of the hip, and the world transformations for each of the other joints are derived from it.
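Deriving those world transformations is just a walk down the hierarchy: each bone's world matrix is its parent's world matrix times its own local matrix. Here's a sketch of that traversal; the `Matrix4` type and the bone layout are assumptions for illustration, not a specific engine's API:

```cpp
#include <vector>
#include <cstddef>

// A bare-bones 4x4 matrix (row i, column j), with translation in column 3.
struct Matrix4 {
    float m[4][4];
};

Matrix4 Multiply(const Matrix4& a, const Matrix4& b)
{
    Matrix4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

struct Bone {
    int     parent; // index of the parent bone; -1 for the root (the hip)
    Matrix4 local;  // this bone's orientation/offset relative to its parent
};

// Bones must be ordered so parents come before their children. The root's
// parent is the object's world matrix, so moving the hip moves everything.
std::vector<Matrix4> ComputeWorldMatrices(const std::vector<Bone>& bones,
                                          const Matrix4& objectWorld)
{
    std::vector<Matrix4> world(bones.size());
    for (std::size_t i = 0; i < bones.size(); ++i) {
        const Matrix4& parent =
            (bones[i].parent < 0) ? objectWorld : world[bones[i].parent];
        world[i] = Multiply(parent, bones[i].local);
    }
    return world;
}
```

An animation, then, is just a stream of local matrices (or, more compactly, rotations) per bone per keyframe, which is far smaller than a stream of whole vertex sets.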