Current mobile devices such as mobile phones and personal digital assistants have become increasingly powerful; they already offer features that only a few users are able to exploit to their full extent. With a number of upcoming mobile multimedia applications, ease of use becomes one of the most important aspects. One way to improve usability is to make devices aware of the user's context, allowing them to adapt to the user instead of forcing the user to adapt to the device. Our work takes this approach one step further by not only reacting to the current context, but also predicting future context, thus making the devices proactive. Mobile devices are generally well suited for this task because they are typically close to the user even when not actively in use. This allows such devices to monitor the user's context and act accordingly, for example by automatically muting ring or signal tones when the user is in a meeting, or by selecting audio, video, or text communication depending on the user's current occupation.

This paper presents an architecture that allows mobile devices to continuously recognize the current user context and anticipate future context. The major challenges are that context recognition and prediction must be embedded in mobile devices with limited resources, that learning and adaptation should happen on-line without explicit training phases, and that user intervention should be kept to a minimum through non-obtrusive interaction.

To accomplish this, the presented architecture consists of four major parts: feature extraction, classification, labeling, and prediction. The available sensors provide a multi-dimensional, highly heterogeneous feature vector as input to the classification step, which is realized by data clustering. Labeling associates recognized context classes with meaningful names specified by the user, and prediction forecasts future user context to enable proactive behavior.
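To make the interplay of these four parts concrete, the following is a minimal sketch of such a pipeline. The architecture does not prescribe particular algorithms at this point, so the sketch assumes a simple distance-threshold online clusterer as a stand-in for the classification step and a first-order Markov model over cluster transitions as a stand-in for the prediction step; the class name OnlineContextPipeline, the threshold parameter, and the example feature vectors are all hypothetical illustrations, not part of the presented system.

import math

class OnlineContextPipeline:
    def __init__(self, threshold=1.0):
        self.threshold = threshold   # max distance to join an existing cluster
        self.centroids = []          # one centroid per recognized context class
        self.counts = []             # samples absorbed per centroid
        self.labels = {}             # cluster id -> user-supplied name
        self.transitions = {}        # (prev, cur) -> count, for prediction
        self.prev = None

    @staticmethod
    def _distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def classify(self, features):
        # Assign the feature vector to the nearest cluster; open a new
        # cluster if none is close enough (incremental, no training phase).
        best, best_d = None, float("inf")
        for i, c in enumerate(self.centroids):
            d = self._distance(features, c)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.threshold:
            self.centroids.append(list(features))
            self.counts.append(1)
            best = len(self.centroids) - 1
        else:
            n = self.counts[best] + 1   # incremental mean update
            self.centroids[best] = [c + (x - c) / n
                                    for c, x in zip(self.centroids[best], features)]
            self.counts[best] = n
        if self.prev is not None:       # update transition statistics
            key = (self.prev, best)
            self.transitions[key] = self.transitions.get(key, 0) + 1
        self.prev = best
        return best

    def label(self, cluster_id, name):
        # Attach a meaningful, user-specified name to a context class.
        self.labels[cluster_id] = name

    def predict_next(self):
        # Forecast the most likely next context from transition counts.
        candidates = {c: n for (p, c), n in self.transitions.items()
                      if p == self.prev}
        if not candidates:
            return None
        nxt = max(candidates, key=candidates.get)
        return self.labels.get(nxt, nxt)

# Example with two-dimensional feature vectors (e.g. audio level, motion):
pipe = OnlineContextPipeline(threshold=0.5)
pipe.label(pipe.classify([0.9, 0.1]), "meeting")
pipe.label(pipe.classify([0.1, 0.8]), "walking")
pipe.classify([0.9, 0.15])          # back in the first context
print(pipe.predict_next())          # -> "walking"

The sketch reflects the design constraints stated above: clusters are created and refined incrementally as samples arrive, so no explicit training phase is needed, and the only user intervention is the optional assignment of names to already-recognized context classes.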