I need to insert an <fgs> tag into the link citation elements, at the first occurrence of each unique figure ID in sequence, wherever it appears in the paragraph. I have a checklist for inserting it:
- Do not jump or skip out of sequence.
- Do not insert the <fgs> tag on a duplicate link tag.
- If the link tags are shuffled (out of order) within the same paragraph, insert the <fgs> tag there. In that case, do not consider the first occurrence of the link tag.
Input:
<p xml:id="c09-para-0007">The targets for which analysis <link href="#c09-fig-0001"/> is ...</p> <p xml:id="c09-para-0019">Antibodies (Abs) are the molecules the guard of the organism. Among them, lymphocytes B that maturate in the bone marrow are the producers of antibodies. These extraordinary proteins can be envisaged through a basic structure that is shown in Figure <link href="#c09-fig-0002"/>a.</p> <p xml:id="c09-para-0027">Antibody can be lyzed into different constitutive fragments using the Figure <link href="#c09-fig-0003"/>a). The other part is crystallizable and named Fc.</p> <p xml:id="c09-para-0028">Achieving smaller molecules with single aminoacidic chain (scFv, Figures <link href="#c09-fig-0003"/>a and <link href="#c09-fig-0004"/>). This is obtained by spontaneous association of V<sub>H</sub> and V<sub>L</sub> domains generated by recombinant techniques (genetic engineering) that are linked through a chain of 15 amino acids.</p> <p xml:id="c09-para-0029">As commented earlier biologically (through hybridoma technology) (Figure <link href="#c09-fig-0004"/>). To avoid the generation of Fv fragments (by the) <link href="#c09-fig-0003"/>of triabodies (trimeric) or tetrabodies (tetrameric). In the case of bispecific diabodies, produced to generate trivalent mono‐ or tetraspecific tetrabodies<link href="#c09-fig-0005"/>.</p> <p xml:id="c09-para-0030">Antibodies can be obtained (Figure <link href="#c09-fig-0003"/>b). Another novel Ig with one variable domain, called <term xml:id="c09-term-0025">novel antigen receptor (V<sub>NAR</sub>)</term>, was discovered in cartilaginous fish, such as sharks <link href="#c09-bib-0135"/>. 
Both are small, highly soluble, possess superior <link href="#c09-fig-0004"/>stability, and seem very adequate for biosensing purposes.</p> <p xml:id="c09-para-0030">Known as <term xml:id="c09-term-0023">V<sub>HH</sub></term> or <term xml:id="c09-term-0024">nanobody</term><link href="#c09-bib-0134"/> (Figure <link href="#c09-fig-0006"/>b). Both are small, highly soluble, possess superior stability, and seem very adequate for biosensing purposes.</p>
Output:
<p xml:id="c09-para-0007">The targets for which analysis <link href="#<fgs>c09-fig-0001</fgs>"/> is ...</p> <p xml:id="c09-para-0019">Antibodies (Abs) are the molecules the guard of the organism. Among them, lymphocytes B that maturate in the bone marrow are the producers of antibodies. These extraordinary proteins can be envisaged through a basic structure that is shown in Figure <link href="#<fgs>c09-fig-0002</fgs>"/>a.</p> <p xml:id="c09-para-0027">Antibody can be lyzed into different constitutive fragments using the Figure <link href="#c09-fig-0003"/>a). The other part is crystallizable and named Fc.</p> <p xml:id="c09-para-0028">Achieving smaller molecules with single aminoacidic chain (scFv, Figures <link href="#c09-fig-0003"/>a and <link href="#c09-fig-0004"/>). This is obtained by spontaneous association of V<sub>H</sub> and V<sub>L</sub> domains generated by recombinant techniques (genetic engineering) that are linked through a chain of 15 amino acids.</p> <p xml:id="c09-para-0029">As commented earlier biologically (through hybridoma technology) (Figure <link href="#<fgs>c09-fig-0004</fgs>"/>). To avoid the generation of Fv fragments (by the) <link href="#<fgs>c09-fig-0003</fgs>"/>of triabodies (trimeric) or tetrabodies (tetrameric). In the case of bispecific diabodies, produced to generate trivalent mono‐ or tetraspecific tetrabodies<link href="#<fgs>c09-fig-0005</fgs>"/>.</p> <p xml:id="c09-para-0030">Antibodies can be obtained (Figure <link href="#c09-fig-0003"/>b). Another novel Ig with one variable domain, called <term xml:id="c09-term-0025">novel antigen receptor (V<sub>NAR</sub>)</term>, was discovered in cartilaginous fish, such as sharks <link href="#c09-bib-0135"/>. 
Both are small, highly soluble, possess superior <link href="#c09-fig-0004"/>stability, and seem very adequate for biosensing purposes.</p> <p xml:id="c09-para-0030">Known as <term xml:id="c09-term-0023">V<sub>HH</sub></term> or <term xml:id="c09-term-0024">nanobody</term><link href="#c09-bib-0134"/> (Figure <link href="#<fgs>c09-fig-0006</fgs>"/>b). Both are small, highly soluble, possess superior stability, and seem very adequate for biosensing purposes.</p>
Code:
readfile('myfile.xml', $tmpxml);
while ($tmpxml =~ m/<p(?: |^>)>((?:(?!<\/p>).)*)<\/p>/sg) {
    $fpre = $fpre.$`;
    $fmatch = $&;
    $fpost = $';
    my $lsCnt = $cnt - 1;
    my $adCnt = $cnt + 1;
    my $dupMatch = $fmatch;
    $cnt = sprintf "%04d", $cnt;
    $lsCnt = sprintf "%04d", $lsCnt;
    $adCnt = sprintf "%04d", $adCnt;
    if ($dupMatch =~ m/-$cnt"/g) {
        my $nwfpost = $fpost;
        if ($fpost !~ m/\-$lsCnt"/g) {
            if ($nwfpost =~ m/$adCnt"/g) {
                $fmatch =~ s/<link href="([^"]*)\/>/<link href="<fgs>$1<\/fgs>"\/>/g;
            }
        }
        $cnt++;
    }
    $fpre = $fpre.$fmatch;
    $tmpxml = $fpost;
}
if (length $fpre) {
    $tmpxml = $fpre.$fpost;
}
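One possible reading of the checklist can be sketched as a two-pass approach. This is only a sketch under stated assumptions, not a confirmed solution. The sub names `tag_fgs`, `fig_ids`, `fig_num` and `is_shuffled` are hypothetical. A paragraph is assumed to be "shuffled" when its figure numbers are not in ascending order, and figure IDs cited in such a paragraph are tagged there instead of at their first occurrence. Only links whose target contains `-fig-` are touched, so `bib` links stay as they are, as in the Output above. The "do not jump/skip" check is not enforced here, because its exact meaning is unclear from the description.

```perl
use strict;
use warnings;

# Return the figure ids (e.g. "c09-fig-0003") cited in one paragraph, in order.
sub fig_ids {
    my ($para) = @_;
    return $para =~ /<link href="#([^"]*-fig-\d+)"\/>/g;
}

# Extract the numeric part of a figure id.
sub fig_num {
    my ($id) = @_;
    my ($n) = $id =~ /-fig-(\d+)$/;
    return $n;
}

# Assumption: "shuffled" means some figure number is lower than the one before it.
sub is_shuffled {
    my @nums = map { fig_num($_) } @_;
    for my $i (1 .. $#nums) {
        return 1 if $nums[$i] < $nums[$i - 1];
    }
    return 0;
}

sub tag_fgs {
    my ($xml) = @_;

    # The capture group makes split() keep each <p>...</p> in the list,
    # so joining the pieces later rebuilds the whole document.
    my @parts = split /(<p\b[^>]*>.*?<\/p>)/s, $xml;

    # Pass 1: ids cited in a shuffled paragraph are reserved for that paragraph.
    my %claimed;    # figure id => index of the shuffled paragraph that tags it
    for my $i (0 .. $#parts) {
        next unless $parts[$i] =~ /^<p\b/;
        my @ids = fig_ids($parts[$i]);
        next unless is_shuffled(@ids);
        for my $id (@ids) {
            $claimed{$id} = $i unless exists $claimed{$id};
        }
    }

    # Pass 2: tag each id once, either in its reserved paragraph
    # or, if it is not reserved, at its first occurrence.
    my %done;
    my $decide = sub {
        my ($id, $i) = @_;
        my $here = exists $claimed{$id} ? $claimed{$id} == $i : 1;
        return ($here && !$done{$id}++) ? "<fgs>$id</fgs>" : $id;
    };
    for my $i (0 .. $#parts) {
        next unless $parts[$i] =~ /^<p\b/;
        $parts[$i] =~ s/(<link href="#)([^"]*-fig-\d+)(?="\/>)/$1 . $decide->($2, $i)/ge;
    }
    return join '', @parts;
}
```

With the file contents already in `$tmpxml`, the call would be `$tmpxml = tag_fgs($tmpxml);`. Traced by hand against the Input above, this reading should give the Output shown, but a different interpretation of the third checklist item would need a different rule in pass 1.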