
Sed Command To Extract Text From HTML

I am grabbing the source of a page using curl and want to extract the text from a specific tag. The text is between the unique tag: href='http://www.website.com/some/unique/page.ph

Solution 1:

Assuming your desired output is just TEXT, this will work with the input you gave:

sed 's/^.*>\([^<]*\)<.*$/\1/'
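As a minimal sketch of how this substitution behaves, the snippet below runs it against a made-up input line. The anchor tag and the example.com URL are assumptions, since the input in the question is cut off. The greedy `^.*>` consumes everything up to the last `>` that still has a `<` somewhere after it, `\([^<]*\)` captures the text in between, and `<.*$` consumes the rest, so the whole line is replaced by the captured text.

```shell
# Hypothetical sample line (an assumption; the question's real input is truncated).
line="<a href='http://example.com/some/unique/page.html'>TEXT</a>"

# Replace the entire line with the text found between '>' and the next '<'.
result=$(printf '%s\n' "$line" | sed 's/^.*>\([^<]*\)<.*$/\1/')

printf '%s\n' "$result"   # prints: TEXT
```

Because the match is greedy, a line that ends with several closing tags in a row (for example `</a></p>`) would capture the empty string between them, so this sketch assumes one tag per line.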

If you want only TEXT, and only from lines whose URL contains the word unique in its path, use this instead:

sed -n '/http:.*\/unique\//s/^.*>\([^<]*\)<.*$/\1/p'
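The filtered variant can be tried the same way. In this sketch the two-line input and the example.com URLs are assumptions made up for illustration. The `-n` option suppresses sed's default printing, the address `/http:.*\/unique\//` restricts the substitution to lines whose URL contains `/unique/`, and the trailing `p` prints only the lines where the substitution succeeded.

```shell
# Hypothetical two-line input (an assumption): only the second URL contains /unique/.
html="<a href='http://example.com/some/other/page.html'>SKIP</a>
<a href='http://example.com/some/unique/page.html'>TEXT</a>"

# -n: print nothing by default; p: print only lines that were substituted.
result=$(printf '%s\n' "$html" | sed -n '/http:.*\/unique\//s/^.*>\([^<]*\)<.*$/\1/p')

printf '%s\n' "$result"   # prints: TEXT
```

With real data, the `printf` would presumably be replaced by the curl invocation that fetches the page, piped into the same sed command.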
