Supervised by  Shanghai Municipal Education Commission

      Sponsored by  Shanghai Publishing and Printing College

      ISSN  1007-1938

      CN  31-1643/TS

      Regulatory Dilemmas and Resolving Dimensions of Using Published Content Data for Model Training

      • Abstract: Model training has transformed how published content is used. It breaks the traditional three-party structure of creators, disseminators, and users in publishing and replaces it with a four-party structure of creators, disseminators, data processors, and users. Regulating the use of published content data for model training within the copyright law framework weakens the protection of publishers' interests and raises four problems: the legal status of publishers is unclear, works are conflated with published content data as the object of protection, existing transaction models are ill-suited to this use, and copyright infringement rules fail to regulate it effectively. In response, four measures are needed: advance the data-oriented transformation of published content and build published content data corpora; develop a coordinated protection path that combines copyright and data protection and clearly recognizes publishers' rights and interests as data processors; introduce an "opt-out" mechanism for model training to balance data protection against data utilization; and make the content used in model training more transparent while restricting non-compliant scraping of published content data.

         
