Regex finding entire line from paragraph

Question:

I need to find the plain paragraph lines in a string produced by a markdown editor. Through the editor you can add checkboxes, radio buttons, textboxes, and paragraphs.

The actual string looks something like this:

  this is paragraph line1

  ?[question_2]{"category":[]}[who are you]=[] {1,2}
  [] OPTION 1
  [] OPTION 2
  [] OPTION 3

  this is paragraph line3

  ?[question_3]{"category":[]}[picode]=_ [PLACEHOLDER_TEXT]

  ?[question_1]{"category":[]}[sex]=() {1}
  () male 
  () female


  this is paragraph line3

All question types start with ?[sometext], so I can use these regexes:

radio -> [?] ?\[([0-9a-z_].*?)\](\{(?:[^{}]|?)*\})? ?\[?(.*?)\]? ?[=] ?\(\) ?(\{[0-9,]+\})?([^]*?)(\n\n|^\n)

checkbox -> [?] ?\[([0-9a-z_].*?)\](\{(?:[^{}]|?)*\})? ?\[?(.*?)\]? ?[=] ?\[\] ?(\{[0-9,]+\})?([^]*?)(\n\n|^\n),

and similarly for the other input types. My question is: how can I get the paragraph lines, that is, the lines that do not start with ?[...] and that may contain lowercase letters, uppercase letters, and digits?

Asked By: Selva Ganapathi


Answers:

You can do it like this:

^(?!(\?)|(\[)).+(\n|$)

^ matches the start of a line (with the multiline flag m enabled), (?! ... ) is a negative lookahead that rejects lines starting with the ? or [ characters, and .+ matches the rest of the characters up to \n (line break) or $ (end of the line or string).
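
As a minimal sketch of how this pattern might be applied in JavaScript, the snippet below runs it with the g and m flags over a small sample. The flags, the sample string, and the variable names are assumptions for illustration, not part of the original answer. Note that radio option lines such as "() male" start with "(", which this pattern does not exclude, so the sample uses only a checkbox question.

```javascript
// Sample input: paragraph lines mixed with a checkbox question (assumed data).
const text = [
  'this is paragraph line1',
  '',
  '?[question_2]{"category":[]}[who are you]=[] {1,2}',
  '[] OPTION 1',
  '[] OPTION 2',
  '',
  'this is paragraph line3',
].join('\n');

// g: find every match; m: let ^ and $ match at each line boundary.
const paragraphLine = /^(?!(\?)|(\[)).+(\n|$)/gm;

// Each match includes the trailing line break, so trim it off.
const paragraphLines = (text.match(paragraphLine) || []).map(s => s.trim());

console.log(paragraphLines);
```

Blank lines are skipped because .+ needs at least one character on the line.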


EDIT

To group all the lines from a paragraph, try this:

^((?!(\?)|(\[))(.+(\n|$)))+
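
A rough sketch of the grouping variant, again assuming the g and m flags and a made-up sample string: consecutive paragraph lines come back as one match, and a blank line or a question line ends the group.

```javascript
// Assumed sample: two adjacent paragraph lines, a question block, then one more line.
const sample = [
  'intro line one',
  'intro line two',
  '',
  '?[question_2]{"category":[]}[who are you]=[] {1,2}',
  '[] OPTION 1',
  '',
  'closing line',
].join('\n');

// The outer (...)+ repeats "one paragraph line" so adjacent lines form a single match.
const paragraphBlock = /^((?!(\?)|(\[))(.+(\n|$)))+/gm;

// With the g flag, match() returns only the full matches, one per block.
const blocks = (sample.match(paragraphBlock) || []).map(s => s.trim());

console.log(blocks);
```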


Answered By: hug

Here are some key ideas when handling line detection in strings.

  • Operating systems and browsers use different line endings: not just \n, but sometimes \r or \r\n.
  • Without the multiline (m) flag, ^ does not match the start of each line; it matches only the start of the entire string.

A plain .split() with a regex works well in this case. Here's a good example that I found on SO:

https://stackoverflow.com/questions/21895233/how-to-split-string-with-newline-n-in-node
function splitLines(t) { return t.split(/\r\n|\r|\n/); }

// radioOrCheckBoxRegex is a placeholder for your own per-line matcher
splitLines(yourString).forEach(line => radioOrCheckBoxRegex(line));
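
Building on the split approach, a sketch of picking out the paragraph lines directly might look like the following. isParagraphLine is a hypothetical helper, the input string is made up, and treating lines that start with "(" as option lines is an assumption based on the radio options shown in the question.

```javascript
// Split on any of the three common line endings (\r\n must come first).
function splitLines(t) {
  return t.split(/\r\n|\r|\n/);
}

// Hypothetical helper: a paragraph line is non-empty and does not
// start with '?', '[' or '(' (question, checkbox option, radio option).
function isParagraphLine(line) {
  const trimmed = line.trim();
  return trimmed !== '' && !/^[?\[(]/.test(trimmed);
}

// Assumed sample using Windows-style line endings.
const input = [
  'this is paragraph line1',
  '',
  '?[question_1]{"category":[]}[sex]=() {1}',
  '() male',
  '() female',
  '',
  'this is paragraph line3',
].join('\r\n');

const paragraphs = splitLines(input).filter(isParagraphLine);

console.log(paragraphs);
```

Working line by line avoids the multiline flag entirely, since each line is tested as its own string.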

Markdown is hard (dangerous)

Sorry to deviate from regex, but I advise you not to implement markdown from scratch. You'll run into endless edge cases, because:

  • If it accepts user input, it requires some degree of sanitization.
  • Users' expectations always differ from yours. They will put all sorts of weird strings into markdown and expect your code to work. Battle-tested markdown libs come in handy here.
  • Customizing a markdown lib can teach you about the markdown parsing process.
  • Markdown is also a complex, documented standard. Follow existing markdown rules before you start implementing your own. Are you following GitHub markdown rules, CommonMark rules, or something else? Users will also expect your app to follow one of these.

Implementing custom markdown on top of a classic markdown lib such as marked.js might be a better idea. Most markdown libs use an internal tokenizer, which handles the kind of regex you've mentioned. With a tokenizer, plain text becomes program-recognizable data (JSON, JavaScript objects). It is possible to use just the internal tokenizer, separately from the actual renderer.

Check the lexer (tokenizer) code of marked.js.
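
To illustrate what "plain text becomes program-recognizable data" can look like, here is a minimal hand-rolled sketch. It is not the marked.js lexer or its API. The tokenize function and the token fields (type, id, text, raw) are hypothetical names chosen for this example, and the patterns only cover the sample format from the question.

```javascript
// Hypothetical line-based tokenizer for the question's custom syntax.
// It only illustrates turning text into data; it is not how marked.js works internally.
function tokenize(src) {
  const tokens = [];
  for (const rawLine of src.split(/\r\n|\r|\n/)) {
    const line = rawLine.trim();
    if (line === '') continue; // skip blank lines

    const question = /^\?\[([0-9a-z_]+)\]/.exec(line);
    if (question) {
      // "?[question_1]..." starts a question; keep the raw line for later parsing.
      tokens.push({ type: 'question', id: question[1], raw: line });
    } else if (/^(\[\]|\(\))/.test(line)) {
      // "[] label" (checkbox option) or "() label" (radio option).
      tokens.push({ type: 'option', text: line.slice(2).trim() });
    } else {
      tokens.push({ type: 'paragraph', text: line });
    }
  }
  return tokens;
}

const tokens = tokenize([
  'intro text',
  '',
  '?[question_1]{"category":[]}[sex]=() {1}',
  '() male',
  '() female',
  '',
  'outro text',
].join('\n'));

// Paragraph text is now a simple filter over structured tokens.
const paragraphTexts = tokens.filter(t => t.type === 'paragraph').map(t => t.text);

console.log(paragraphTexts);
```

Once the input is a token list, rendering and validation can work on objects instead of re-running regexes over the raw string.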

Answered By: sungryeol