Java regular expressions for removing extra tags and content from scraped web pages - js模板网

// Regex for <script> blocks (a simpler alternative: <script[^>]*?>[\\s\\S]*?<\\/script>)
String regEx_script = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>";
// Regex for <style> blocks (a simpler alternative: <style[^>]*?>[\\s\\S]*?<\\/style>)
String regEx_style = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>";
// Regex for any remaining HTML tag
String regEx_html = "<[^>]+>";
// Regex for special characters (named HTML entities), e.g. &nbsp;
String regEx_special = "\\&[a-zA-Z]{1,10};";
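
One detail worth checking about regEx_special: it only matches named entities made of letters, so numeric references such as &#160; pass through untouched. The sketch below illustrates this. The class name, the method names, and the broader NAMED_OR_NUMERIC pattern are assumptions added for illustration and are not part of the original article.

```java
import java.util.regex.Pattern;

public class EntityPatternSketch {
    // Pattern from the article: named entities such as &nbsp; or &amp;
    static final Pattern NAMED = Pattern.compile("\\&[a-zA-Z]{1,10};");

    // Hypothetical extension (not in the article): also matches decimal
    // and hexadecimal numeric references such as &#160; and &#xA0;
    static final Pattern NAMED_OR_NUMERIC = Pattern.compile(
            "&(?:[a-zA-Z]{1,10}|#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6});");

    static String stripNamed(String s) {
        return NAMED.matcher(s).replaceAll("");
    }

    static String stripNamedOrNumeric(String s) {
        return NAMED_OR_NUMERIC.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "a&nbsp;b&#160;c&#xA0;d";
        System.out.println(stripNamed(sample));          // ab&#160;c&#xA0;d
        System.out.println(stripNamedOrNumeric(sample)); // abcd
    }
}
```

Whether numeric references should be deleted or decoded depends on the scraped content; deleting them, as both patterns do, also discards the character they stand for.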

Test
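import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name is illustrative; the original post shows only main()
public class ScriptTagTest {
    public static void main(String[] args) throws Exception {
        String string = "测<script s>ss</script><script s>ss</script><script s>ss</script>试";
        String regex = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>";
        // CASE_INSENSITIVE so that <SCRIPT> and <Script> are matched as well
        Pattern p_script = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher m_script = p_script.matcher(string);
        string = m_script.replaceAll("");
        System.out.println(string); // prints "测试"
    }
}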
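
The test above exercises only the script pattern. A minimal sketch of how all four expressions might be combined into one cleaning step follows. The class name HtmlCleanerSketch and the method clean are hypothetical, and the order (script, style, generic tags, entities) is an assumption: script and style blocks have to go first, because removing generic tags first would leave their JavaScript and CSS bodies behind as text.

```java
import java.util.regex.Pattern;

public class HtmlCleanerSketch {
    // The four patterns from the article, precompiled once
    private static final Pattern SCRIPT = Pattern.compile(
            "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>",
            Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE = Pattern.compile(
            "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>",
            Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern SPECIAL = Pattern.compile("\\&[a-zA-Z]{1,10};");

    // Hypothetical helper: applies the patterns in order and trims the result
    static String clean(String html) {
        String text = SCRIPT.matcher(html).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        text = TAG.matcher(text).replaceAll("");
        text = SPECIAL.matcher(text).replaceAll("");
        return text.trim();
    }

    public static void main(String[] args) {
        String html = "<html><style>p{color:red}</style><body><p>Hello&nbsp;</p>"
                + "<SCRIPT>alert(1)</SCRIPT><p>World</p></body></html>";
        System.out.println(clean(html)); // HelloWorld
    }
}
```

Regex-based stripping like this is a rough filter for reasonably well-formed pages. Markup with a ">" inside an attribute value or a comment can defeat the patterns, so an HTML parser may be a safer choice for untrusted input.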
————————————————
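Copyright notice: This article is an original work by CSDN blogger 「爱码仕1」 and is licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_41712834/article/details/100072852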

第十天 2021-11-13