First, consider that AnimationState is not a requirement. We have a timeline API that can be used without it. AnimationState already does a lot to make playing animations easy.
Typically, when using AnimationState, animations fall into two categories: track 0, where they set the main pose for the skeleton, or track 1+, where they are applied on top of lower tracks. Currently you can use tracks however you like; there are no conventions. Track 1+ animations may be applied on the first higher track that isn't already playing an animation, or you may want them on a specific track. An animation may be played on multiple higher tracks. It is not associated with a single track number, except in your particular use case. There are other use cases, such as letting a track 0 animation keep playing while an animation on a higher track sets the whole pose, similar to a track 0 animation. That way, when the higher track is done, the lower track is still playing and has progressed the whole time.
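To make the "first free track above 0" idea concrete, here is a minimal sketch. It is not Spine runtime code: `TrackSlots` and all of its methods are hypothetical stand-ins that only track which animation name occupies which track index, so you would swap in your runtime's AnimationState calls where noted.

```python
from typing import Dict, Optional


class TrackSlots:
    """Hypothetical bookkeeping of track index -> animation name.

    This is an illustration only, not part of any Spine runtime.
    """

    def __init__(self) -> None:
        self.current: Dict[int, str] = {}

    def set_animation(self, track: int, name: str) -> None:
        # In a real application this is where you would call into
        # your runtime's AnimationState for the given track.
        self.current[track] = name

    def clear_track(self, track: int) -> None:
        # Frees the track so a later overlay can reuse it.
        self.current.pop(track, None)

    def first_free_overlay_track(self) -> int:
        # Track 0 is reserved for the main pose, so start searching at 1.
        track = 1
        while track in self.current:
            track += 1
        return track

    def play_overlay(self, name: str, track: Optional[int] = None) -> int:
        # Use the requested track if given, otherwise the first free one.
        if track is None:
            track = self.first_free_overlay_track()
        self.set_animation(track, name)
        return track
```

With this, `play_overlay("shoot")` lands on whichever track above 0 is free, while `play_overlay("aim", track=5)` pins the animation to a specific track. Nothing ties an animation name to one track number unless your application decides so.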
If we could encode how animations will be played back so that in most cases you don't need to manage it yourself at runtime, then we'd likely do that. It's a fine line to walk, because it can complicate applying animations manually, so "most" needs to be quite high and there still needs to be a nice way to do it manually.
If you don't like hardcoding how animations are played in your application, I would suggest making it data-driven. Don't get hung up on it having to happen inside Spine. You could have a spreadsheet or other tools that describe how animations are played in your particular application. Even if Spine had the features you request, in all but the simplest applications you will likely need other data-driven behavior, such as attack damage, world properties, etc. Don't tell me you want all that in Spine too! 😉
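As a rough sketch of what "data-driven" could mean here, the snippet below loads a small table exported from a spreadsheet as CSV and uses it to decide the track and looping for each animation. The column names, the `damage` column, `PlaybackRule`, `load_rules`, and `play` are all made up for illustration. The `set_animation` callback is an assumption: it stands in for whatever call your runtime provides, and takes (track, name, loop) in that order.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical table, e.g. exported from a spreadsheet.
PLAYBACK_CSV = """animation,track,loop,damage
walk,0,true,0
run,0,true,0
shoot,1,false,12
blink,2,false,0
"""


@dataclass(frozen=True)
class PlaybackRule:
    track: int   # which track the animation is played on
    loop: bool   # whether it loops
    damage: int  # application data that has nothing to do with Spine


def load_rules(text: str) -> Dict[str, PlaybackRule]:
    # Parse each CSV row into a rule keyed by animation name.
    rules: Dict[str, PlaybackRule] = {}
    for row in csv.DictReader(io.StringIO(text)):
        rules[row["animation"]] = PlaybackRule(
            track=int(row["track"]),
            loop=row["loop"].strip().lower() == "true",
            damage=int(row["damage"]),
        )
    return rules


def play(
    set_animation: Callable[[int, str, bool], None],
    rules: Dict[str, PlaybackRule],
    name: str,
) -> PlaybackRule:
    # Look up how this animation should be played and forward it.
    rule = rules[name]
    set_animation(rule.track, name, rule.loop)
    return rule
```

The application code then only says `play(..., rules, "shoot")`, and the track, looping, and unrelated values like damage all come from the same table that a designer can edit without touching code.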