The performance of multiuser beamforming combined with "Dirty Paper" precoding (downlink) or successive interference cancellation (uplink) depends on the precoding (resp. decoding) order. In this paper, we propose an improved algorithm for jointly computing the power allocation and the beamformers that achieve minimal total power. Neither matrix inversion nor eigenvalue decomposition is required to compute the global optimum. We then study the optimal precoding order. A fundamental understanding of joint precoding and power allocation is an important prerequisite for the development of transmit scheduling algorithms.