How do I remove HTML special characters?


css3 2023. 7. 24. 22:42

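I am creating a feed file in which I want to remove HTML tags, which strip_tags does. However, strip_tags does not remove HTML special code characters:

&nbsp; &amp; &copy;

etc.

Please tell me which function I can use to remove these special code characters from my string.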

Either decode them using html_entity_decode or remove them using preg_replace:

$Content = preg_replace("/&#?[a-z0-9]+;/i","",$Content);

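(From here)

EDIT: An alternative, following Jacco's comment.

It might be better to replace the '+' with {2,8} or similar. This limits the chance of removing a whole sentence when an unencoded '&' is present.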

$Content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$Content);
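To see what the bounded quantifier buys you, here is a small sketch. The helper name stripEntities is made up for illustration; the pattern is the one from the line above.

```php
<?php
// Hypothetical helper; wraps the bounded pattern shown above.
function stripEntities(string $text): string
{
    return preg_replace('/&#?[a-z0-9]{2,8};/i', '', $text);
}

// Real entities are removed (note the doubled spaces they leave behind).
echo stripEntities('Fish &amp; chips &copy; 2023'), "\n";

// "&A;" has only one character between "&" and ";", so {2,8} skips it,
// whereas the unbounded "+" version would have removed it.
echo stripEntities('Q&A; next'), "\n";

// More than 8 characters between "&" and ";" is also left alone.
echo stripEntities('x&verylongtoken123;y'), "\n";
```

This only deletes the entities. If you want the characters they stand for, decoding is the better fit.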

Use html_entity_decode to convert the HTML entities.

You need to set the character set for it to work correctly.
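A minimal sketch of what that could look like, assuming the text is UTF-8. The helper name decodeEntities and the extra step that turns the non-breaking space into a plain space are my own additions, not part of the answer.

```php
<?php
// Hypothetical helper: decode entities with an explicit charset.
function decodeEntities(string $html): string
{
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // &nbsp; decodes to U+00A0 (non-breaking space), not an ASCII space,
    // so it is swapped for a regular space here (an optional extra step).
    return str_replace("\u{00A0}", ' ', $decoded);
}

echo decodeEntities('Caf&eacute;&nbsp;&amp; bar &copy; 2023'), "\n";
// Café & bar © 2023
```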

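In addition to the good answers above, PHP also has a built-in filter function that is quite useful: filter_var.

To remove HTML characters, use:

$cleanString = filter_var($dirtyString, FILTER_SANITIZE_STRING); // note: FILTER_SANITIZE_STRING is deprecated as of PHP 8.1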

More info:

You can have a look at htmlentities() and html_entity_decode() here.

$orig = "I'll \"walk\" the <b>dog</b> now";

$a = htmlentities($orig, ENT_COMPAT); // ENT_COMPAT leaves the single quote as-is, matching the output below

$b = html_entity_decode($a);

echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now

echo $b; // I'll "walk" the <b>dog</b> now

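This may work well for removing special characters. The hyphen is escaped because PCRE2 (PHP 7.3 and later) rejects ".-\s" as an invalid range.

$modifiedString = preg_replace("/[^a-zA-Z0-9_.\-\s]/", "", $content);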

If you want to not only convert the HTML special characters but also strip the markup and end up with plain text, this is the solution that worked for me.

function htmlToPlainText($str){
    $str = str_replace('&nbsp;', ' ', $str);
    $str = html_entity_decode($str, ENT_QUOTES | ENT_COMPAT , 'UTF-8');
    $str = html_entity_decode($str, ENT_HTML5, 'UTF-8');
    $str = html_entity_decode($str);
    $str = htmlspecialchars_decode($str);
    $str = strip_tags($str);

    return $str;
}

$string = '<p>this is (&nbsp;) a test</p>
<div>Yes this is! &amp; does it get "processed"? </div>';

echo htmlToPlainText($string);
// this is ( ) a test
// Yes this is! & does it get "processed"?

html_entity_decode with ENT_QUOTES | ENT_XML1 converts things like &#39;, htmlspecialchars_decode converts things like &amp;, html_entity_decode converts things like &lt;, and strip_tags removes any HTML tags.

EDIT: Added str_replace('&nbsp;', ' ', $str) and a few more html_entity_decode() calls, as continued testing showed they were needed.

Here is a plain vanilla way to do it without engaging the preg regex engine:

function remEntities($str) {
  // Find the first ampersand
  $amp_pos = strpos($str, '&');
  if ($amp_pos === false) {
    return $str;
  }
  // Find the first ";" that comes after that ampersand
  $semi_pos = strpos($str, ';', $amp_pos);
  if ($semi_pos === false) {
    return $str;
  }
  // Treat everything from "&" through ";" as an entity and cut it out.
  // Caveat: this also removes plain text such as "& co;".
  $str = substr($str, 0, $amp_pos) . substr($str, $semi_pos + 1);
  // Recurse in case another entity remains
  return remEntities($str);
}

What I did was use html_entity_decode and then strip_tags to remove them.

Try this.
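As a rough illustration of why the order matters (the function name decodeThenStrip is invented for this sketch): decoding first turns encoded markup into real tags, which strip_tags can then remove.

```php
<?php
// Hypothetical helper: decode entities first, then strip the tags.
function decodeThenStrip(string $html): string
{
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return strip_tags($decoded);
}

$input = '&lt;b&gt;bold&lt;/b&gt; &amp; <i>italic</i>';

echo decodeThenStrip($input), "\n";
// bold & italic

// Reversing the order leaves the encoded <b> tags in the result.
echo html_entity_decode(strip_tags($input), ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n";
// <b>bold</b> & italic
```

Whether encoded tags should be treated as markup at all depends on where your input comes from, so take this as one possible reading of the answer.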

<?php
$str = "\x8F!!!";

// Outputs an empty string
echo htmlentities($str, ENT_QUOTES, "UTF-8");

// Outputs "!!!"
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
?>

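It looks like what you really want is this:

function xmlEntities($string) {
    $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');

    $from = [];
    $to = [];
    foreach ($translationTable as $char => $entity) {
        $from[] = $entity;
        // mb_ord() (PHP 7.2+) returns the code point; ord() would only read the first byte
        $to[] = '&#' . mb_ord($char, 'UTF-8') . ';';
    }
    return str_replace($from, $to, $string);
}

It replaces the named entities with their numeric equivalents.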

<?php
function strip_only($str, $tags, $stripContent = false) {
    $content = '';
    if(!is_array($tags)) {
        // Accept either 'font' or '<font><b>'; the check must look at $tags, not $str
        $tags = (strpos($tags, '>') !== false
                 ? explode('>', str_replace('<', '', $tags))
                 : array($tags));
        if(end($tags) == '') array_pop($tags);
    }
    foreach($tags as $tag) {
        if ($stripContent) {
            // Lazy match so that only the nearest closing tag is consumed
            $content = '(.+?</'.$tag.'[^>]*>|)';
        }
        $str = preg_replace('#</?'.$tag.'[^>]*>'.$content.'#is', '', $str);
    }
    return $str;
}

$str = '<font color="red">red</font> text';
$tags = 'font';
$a = strip_only($str, $tags); // red text
$b = strip_only($str, $tags, true); // text
?>

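The function I used for this task, building on the improvement made by schnaader, is:

    // $m[0] is the whole match; the pattern has no capture group, so $m[1] is undefined.
    // Note: mysql_real_escape_string() was removed in PHP 7.0 (use mysqli or PDO prepared
    // statements instead), and the "HTML-ENTITIES" encoding is deprecated as of PHP 8.2.
    mysql_real_escape_string(
        preg_replace_callback("/&#?[a-z0-9]+;/i", function($m) {
            return mb_convert_encoding($m[0], "UTF-8", "HTML-ENTITIES");
        }, strip_tags($row['cuerpo'])))

This removes every HTML tag and converts the HTML symbols to UTF-8, ready to be saved in MySQL.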

Try htmlspecialchars_decode($string). It works for me.

http://www.w3schools.com/php/func_string_htmlspecialchars_decode.asp
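One thing worth checking before relying on this: htmlspecialchars_decode only handles the handful of entities that htmlspecialchars produces. A quick comparison, with variable names chosen just for this sketch:

```php
<?php
$s = '&lt;p&gt;Tom &amp; Jerry&nbsp;&copy;&lt;/p&gt;';

// Decodes only &lt; &gt; &amp; and the quote entities; &nbsp; and &copy; stay.
$specialOnly = htmlspecialchars_decode($s);
echo $specialOnly, "\n";
// <p>Tom & Jerry&nbsp;&copy;</p>

// html_entity_decode covers the full named-entity table.
$allEntities = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $allEntities, "\n";
```

So for the &nbsp; and &copy; in the question, html_entity_decode is the one that actually changes them.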

If you are working in WordPress and, like me, just need to check for an empty field (and what looks like a blank string contains a copious amount of random HTML entities), take a look at:

sanitize_title_with_dashes( string $title, string $raw_title = '', string $context = 'display' )

Link to the WordPress function page

For people not working in WordPress: I found this function really useful for building my own sanitizer. Take a look at the full code, it goes really in depth!

$string = "äáčé";

$convert = Array(
        'ä'=>'a',
        'Ä'=>'A',
        'á'=>'a',
        'Á'=>'A',
        'à'=>'a',
        'À'=>'A',
        'ã'=>'a',
        'Ã'=>'A',
        'â'=>'a',
        'Â'=>'A',
        'č'=>'c',
        'Č'=>'C',
        'ć'=>'c',
        'Ć'=>'C',
        'ď'=>'d',
        'Ď'=>'D',
        'ě'=>'e',
        'Ě'=>'E',
        'é'=>'e',
        'É'=>'E',
        'ë'=>'e',
    );

$string = strtr($string , $convert );

echo $string; //aace

What if "removing HTML special characters" actually means "replacing them appropriately"?

After all, just look at your example...

&nbsp; &amp; &copy;

If you are stripping this for an RSS feed, shouldn't you want the equivalents?

" ", &, ©

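Or maybe you don't want the exact equivalents. Maybe you want &nbsp; to simply be ignored (to avoid too much whitespace), but &copy; to actually be replaced. Let's work out a solution that anyone can adapt to their own version of this problem.

How to SELECTIVELY replace HTML special characters

The logic is simple: preg_match_all('/(&#[0-9]+;)/' grabs all of the matches, and then we build a list of search terms and replacements for str_replace([searchlist], [replacelist], $term). Before doing this, named entities have to be converted to their numeric counterparts: "&nbsp;" is not acceptable, but "&#160;" is fine. (Thanks to it-alien for the solution to this part of the problem.)

Working demo

In this demo, &#123; is replaced with "HTML Entity #123". Of course, you can fine-tune this to whatever kind of find-and-replace you need for your case.

Why did I make this? I use it when generating Rich Text Format from UTF-8 encoded HTML.

See the full working demo: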

    function FixUTF8($args) {
        $output = $args['input'];
        
        $output = convertNamedHTMLEntitiesToNumeric(['input'=>$output]);
        
        preg_match_all('/(&#[0-9]+;)/', $output, $matches, PREG_OFFSET_CAPTURE);
        $full_matches = $matches[0];
        
        $found = [];
        $search = [];
        $replace = [];
        
        for($i = 0; $i < count($full_matches); $i++) {
            $match = $full_matches[$i];
            $word = $match[0];
            if(!isset($found[$word])) { // isset() avoids an undefined-key warning on first sight
                $found[$word] = TRUE;
                $search[] = $word;
                $replacement = str_replace(['&#', ';'], ['HTML Entity #', ''], $word);
                $replace[] = $replacement;
            }
        }

        $new_output = str_replace($search, $replace, $output);
        
        return $new_output;
    }
    
    function convertNamedHTMLEntitiesToNumeric($args) {
        $input = $args['input'];
        return preg_replace_callback("/(&[a-zA-Z][a-zA-Z0-9]*;)/",function($m){
            $c = html_entity_decode($m[0],ENT_HTML5,"UTF-8");
            # return htmlentities($c,ENT_XML1,"UTF-8"); -- see update below
            
            $convmap = array(0x80, 0xffff, 0, 0xffff);
            return mb_encode_numericentity($c, $convmap, 'UTF-8');
        }, $input);
    }

print(FixUTF8(['input'=>"Oggi &egrave; un bel&nbsp;giorno"]));

Input:

"Oggi &egrave; un bel&nbsp;giorno"

Output:

Oggi HTML Entity #232 un belHTML Entity #160giorno
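If you want different treatment per entity (drop &nbsp;, spell out &copy;, decode the rest), a lookup table is one way to go. This is a sketch of my own, not the demo above: the function name replaceEntitiesSelectively and the $rules array are assumptions for illustration.

```php
<?php
// Hypothetical helper: $rules maps a decoded character to its replacement.
function replaceEntitiesSelectively(string $html, array $rules): string
{
    return preg_replace_callback(
        // decimal, hexadecimal, or named entity
        '/&(?:#[0-9]+|#x[0-9a-f]+|[a-z][a-z0-9]*);/i',
        function (array $m) use ($rules): string {
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // Entities without a rule are simply decoded; unknown names
            // come back unchanged from html_entity_decode().
            return $rules[$char] ?? $char;
        },
        $html
    );
}

$rules = [
    "\u{00A0}" => '',    // drop non-breaking spaces
    "\u{00A9}" => '(c)', // spell out the copyright sign
];

echo replaceEntitiesSelectively('Acme&nbsp;&copy; 2023 &amp; friends', $rules), "\n";
// Acme(c) 2023 & friends
```

Keying the rules on the decoded character means &copy;, &#169; and &#xA9; all hit the same rule without a separate named-to-numeric pass.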

Reference URL: https://stackoverflow.com/questions/657643/how-to-remove-html-special-chars
