Kingsoft Office again recognized by a top international AI conference: Monkey multimodal large model accepted by CVPR 2024

2024-03-25


Following the official launch of the WPS AI public beta, which marked the first deployment of AI technology in China's office software sector, Kingsoft Office has recently made further technical progress. The Monkey multimodal large model, jointly developed by Kingsoft Office and Huazhong University of Science and Technology, has been accepted by CVPR 2024, a top international conference in the field of artificial intelligence. Building on Monkey, the two partners have also made a significant upgrade for the document domain, launching the text multimodal large model TextMonkey. TextMonkey achieves internationally leading results on a number of document understanding tasks and is a solid step toward general-purpose text recognition.

Released at the end of 2023, Monkey is a multimodal large model jointly developed by Kingsoft Office and the School of Software of Huazhong University of Science and Technology. The model can "observe" the world, hold in-depth question-and-answer exchanges about images, and describe them precisely. It also ranked first among open-source models on the internationally recognized OpenCompass ("Sinan") multimodal large model leaderboard, behind only closed-source models such as OpenAI's GPT-4V and Google's Gemini.

Recently, Kingsoft Office and Huazhong University of Science and Technology further upgraded the model and launched the text multimodal large model TextMonkey, which pushes the boundaries of general document understanding. TextMonkey has achieved significant results on 12 authoritative document datasets covering tasks such as scene text recognition, office document summarization, math problem solving and question answering, document layout analysis, table understanding, chart question answering, and key information extraction from electronic documents.


For example, TextMonkey can help users answer math questions and show the solution steps, supporting the automation of education. It can also help people understand structured charts, tables, and document data by converting image content into a lightweight data exchange format that is easy to record and extract. Because TextMonkey mimics human visual cognition, it can recognize how the parts of a high-resolution document image relate to one another and pick out the key elements within the image. In addition, drawing on an in-depth understanding of users' diverse needs, TextMonkey uses text localization techniques to improve the accuracy of its answers and the interpretability of the model, which improves its performance across a wide range of document tasks.

As enterprises accelerate their digital transformation, multimodal structured analysis and content extraction from documents and images have become especially critical. Whether the input is a casually taken photo, an electronic document, an office software file, or a chart-based analysis report, fast, automated, and accurate data processing is decisive for improving enterprise productivity.

