Search Engine Optimization

Search engine optimization (SEO) is the process of affecting the online visibility of a website or a web page in a web search engine’s unpaid results—often referred to as “natural”, “organic”, or “earned” results. In general, the earlier (or higher ranked on the search results page), and more frequently a website appears in the search results list, the more visitors it will receive from the search engine’s users; these visitors can then be converted into customers. SEO may target different kinds of search, including image search, video search, academic search, news search, and industry-specific vertical search engines. SEO differs from local search engine optimization in that the latter is focused on optimizing a business’ online presence so that its web pages will be displayed by search engines when a user enters a local search for its products or services. The former instead is more focused on national or international searches.

As an Internet marketing strategy, SEO considers how search engines work, the computer-programmed algorithms that dictate search engine behavior, what people search for, the actual search terms or keywords typed into search engines, and which search engines are preferred by their targeted audience. Optimizing a website may involve editing its content, adding content, and modifying HTML and associated coding to both increase its relevance to specific keywords and remove barriers to the indexing activities of search engines. Promoting a site to increase the number of backlinks, or inbound links, is another SEO tactic. By May 2015, mobile search had surpassed desktop search. In 2015, it was reported that Google was developing and promoting mobile search as a key feature within future products. In response, many brands began to take a different approach to their Internet marketing strategies.

History

Webmasters and content providers began optimizing websites for search engines in the mid-1990s, as the first search engines were cataloging the early Web. Initially, webmasters needed only to submit the address of a page, or URL, to the various engines, which would send a “spider” to “crawl” that page, extract links to other pages from it, and return information found on the page to be indexed. The process involves a search engine spider downloading a page and storing it on the search engine’s own server. A second program, known as an indexer, extracts information about the page, such as the words it contains, where they are located, and any weight for specific words, as well as all links the page contains. All of this information is then placed into a scheduler for crawling at a later date.

Website owners recognized the value of a high ranking and visibility in search engine results, creating an opportunity for both white hat and black hat SEO practitioners. According to industry analyst Danny Sullivan, the phrase “search engine optimization” probably came into use in 1997. Sullivan credits Bruce Clay as one of the first people to popularize the term. On May 2, 2007, Jason Gambert attempted to trademark the term SEO by convincing the Trademark Office in Arizona that SEO is a “process” involving manipulation of keywords and not a “marketing service.”

Early versions of search algorithms relied on webmaster-provided information such as the keyword meta tag or index files in engines like ALIWEB. Meta tags provide a guide to each page’s content. Using metadata to index pages was found to be less than reliable, however, because the webmaster’s choice of keywords in the meta tag could potentially be an inaccurate representation of the site’s actual content. Inaccurate, incomplete, and inconsistent data in meta tags could and did cause pages to rank for irrelevant searches. Web content providers also manipulated some attributes within the HTML source of a page in an attempt to rank well in search engines. By 1997, search engine designers recognized that webmasters were making efforts to rank well in their search engine, and that some webmasters were even manipulating their rankings in search results by stuffing pages with excessive or irrelevant keywords. Early search engines, such as AltaVista and Infoseek, adjusted their algorithms to prevent webmasters from manipulating rankings.

By relying so much on factors such as keyword density which were exclusively within a webmaster’s control, early search engines suffered from abuse and ranking manipulation. To provide better results to their users, search engines had to adapt to ensure their results pages showed the most relevant search results, rather than unrelated pages stuffed with numerous keywords by unscrupulous webmasters. This meant moving away from heavy reliance on term density to a more holistic process for scoring semantic signals. Since the success and popularity of a search engine is determined by its ability to produce the most relevant results to any given search, poor quality or irrelevant search results could lead users to find other search sources. Search engines responded by developing more complex ranking algorithms, taking into account additional factors that were more difficult for webmasters to manipulate. In 2005, AIRWeb, an annual conference on Adversarial Information Retrieval on the Web, was created to bring together practitioners and researchers concerned with search engine optimization and related topics.

Companies that employ overly aggressive techniques can get their client websites banned from the search results. In 2005, the Wall Street Journal reported on a company, Traffic Power, which allegedly used high-risk techniques and failed to disclose those risks to its clients. Wired magazine reported that the same company sued blogger and SEO Aaron Wall for writing about the ban. Google’s Matt Cutts later confirmed that Google did in fact ban Traffic Power and some of its clients.

Some search engines have also reached out to the SEO industry, and are frequent sponsors and guests at SEO conferences, webchats, and seminars. Major search engines provide information and guidelines to help with website optimization. Google has a Sitemaps program to help webmasters learn if Google is having any problems indexing their website; it also provides data on Google traffic to the website. Bing Webmaster Tools provides a way for webmasters to submit a sitemap and web feeds, allows users to determine the “crawl rate”, and tracks the web pages’ index status.

Relationship with Google

In 1998, two graduate students at Stanford University, Larry Page and Sergey Brin, developed “Backrub”, a search engine that relied on a mathematical algorithm to rate the prominence of web pages. The number calculated by the algorithm, PageRank, is a function of the quantity and strength of inbound links. PageRank estimates the likelihood that a given page will be reached by a web user who randomly surfs the web, and follows links from one page to another. In effect, this means that some links are stronger than others, as a higher PageRank page is more likely to be reached by the random web surfer.
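
The random-surfer idea can be made concrete with a short power-iteration sketch (a minimal illustration, not Google’s actual implementation; the damping factor, iteration count, and example link graph are conventional choices made here):

    def pagerank(links, damping=0.85, iterations=50):
        """Power-iteration estimate of PageRank for a small link graph.

        `links` maps each page to the list of pages it links to. A page's score
        approximates the probability that a surfer who follows links at random
        (and occasionally jumps to a random page) ends up on that page.
        """
        pages = set(links) | {p for targets in links.values() for p in targets}
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
            for page in pages:
                targets = links.get(page, [])
                if not targets:                 # dangling page: spread its rank evenly
                    for p in pages:
                        new_rank[p] += damping * rank[page] / len(pages)
                else:
                    for t in targets:           # pass rank along each outgoing link
                        new_rank[t] += damping * rank[page] / len(targets)
            rank = new_rank
        return rank

    # A page with many, or highly ranked, inbound links ends up with a higher score.
    print(pagerank({"A": ["B"], "C": ["B"], "D": ["B", "C"], "B": ["C"]}))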

Page and Brin founded Google in 1998. Google attracted a loyal following among the growing number of Internet users, who liked its simple design. Off-page factors (such as PageRank and hyperlink analysis) were considered as well as on-page factors (such as keyword frequency, meta tags, headings, links and site structure) to enable Google to avoid the kind of manipulation seen in search engines that only considered on-page factors for their rankings. Although PageRank was more difficult to game, webmasters had already developed link building tools and schemes to influence the Inktomi search engine, and these methods proved similarly applicable to gaming PageRank. Many sites focused on exchanging, buying, and selling links, often on a massive scale. Some of these schemes, or link farms, involved the creation of thousands of sites for the sole purpose of link spamming.

By 2004, search engines had incorporated a wide range of undisclosed factors in their ranking algorithms to reduce the impact of link manipulation. In June 2007, The New York Times’ Saul Hansell stated Google ranks sites using more than 200 different signals. The leading search engines, Google, Bing, and Yahoo, do not disclose the algorithms they use to rank pages. Some SEO practitioners have studied different approaches to search engine optimization, and have shared their personal opinions. Patents related to search engines can provide information to better understand search engines. In 2005, Google began personalizing search results for each user. Depending on their history of previous searches, Google crafted results for logged in users.

In 2007, Google announced a campaign against paid links that transfer PageRank. On June 15, 2009, Google disclosed that they had taken measures to mitigate the effects of PageRank sculpting by use of the nofollow attribute on links. Matt Cutts, a well-known software engineer at Google, announced that Googlebot would no longer treat nofollowed links in the same way, to prevent SEO service providers from using nofollow for PageRank sculpting. As a result of this change, the use of nofollow led to the evaporation of PageRank. In order to avoid the above, SEO engineers developed alternative techniques that replace nofollowed tags with obfuscated JavaScript and thus permit PageRank sculpting. Additionally, several solutions have been suggested that include the use of iframes, Flash, and JavaScript.

In December 2009, Google announced it would be using the web search history of all its users in order to populate search results. On June 8, 2010, a new web indexing system called Google Caffeine was announced. Designed to allow users to find news results, forum posts, and other content much sooner after publishing than before, Google Caffeine was a change to the way Google updated its index in order to make things show up more quickly on Google than before. According to Carrie Grimes, the software engineer who announced Caffeine for Google, “Caffeine provides 50 percent fresher results for web searches than our last index…” Google Instant, real-time search, was introduced in late 2010 in an attempt to make search results more timely and relevant. Historically, site administrators have spent months or even years optimizing a website to increase search rankings. With the growth in popularity of social media sites and blogs, the leading engines made changes to their algorithms to allow fresh content to rank quickly within the search results.

In February 2011, Google announced the Panda update, which penalizes websites containing content duplicated from other websites and sources. Historically, websites have copied content from one another and benefited in search engine rankings by engaging in this practice. However, Google implemented a new system which punishes sites whose content is not unique. The 2012 Google Penguin update attempted to penalize websites that used manipulative techniques to improve their rankings on the search engine. Although Google Penguin has been presented as an algorithm aimed at fighting web spam, it really focuses on spammy links by gauging the quality of the sites the links are coming from. The 2013 Google Hummingbird update featured an algorithm change designed to improve Google’s natural language processing and semantic understanding of web pages. Hummingbird’s language processing system falls under the newly recognised term of “conversational search”, where the system pays more attention to each word in the query in order to better match the pages to the meaning of the query rather than a few words. With regard to the changes made to search engine optimization, for content publishers and writers, Hummingbird is intended to resolve issues by getting rid of irrelevant content and spam, allowing Google to produce high-quality content and rely on “trusted” authors.

Methods

Search engines use complex mathematical algorithms to interpret which websites a user seeks. Programs sometimes called spiders examine which sites link to which other sites; websites receiving more inbound links, or stronger links, are presumed to be more important and more likely to be what the user is searching for. A site that is the recipient of numerous inbound links will therefore rank more highly in a web search, and the links “carry through”: a site with only a single inbound link can still rank well if that link comes from a highly popular site, whereas a site whose single link comes from an obscure site will not.
The leading search engines, such as Google, Bing, and Yahoo!, use crawlers to find pages for their algorithmic search results. Pages that are linked from other search-engine-indexed pages do not need to be submitted because they are found automatically. The Yahoo! Directory and DMOZ, two major directories which closed in 2014 and 2017 respectively, both required manual submission and human editorial review. In addition to its URL submission console, Google offers Google Search Console, through which an XML sitemap feed can be created and submitted for free to ensure that all pages are found, especially pages that are not discoverable by automatically following links. Yahoo! formerly operated a paid submission service that guaranteed crawling for a cost per click; however, this practice was discontinued in 2009.
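
For illustration, a minimal XML sitemap can be generated with Python’s standard library (a sketch only; the URLs are placeholders, and submitting the resulting file through Search Console or a robots.txt Sitemap: line is a separate step):

    import xml.etree.ElementTree as ET

    def build_sitemap(urls, path="sitemap.xml"):
        """Write a minimal XML sitemap listing the given page URLs."""
        urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
        for url in urls:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

    build_sitemap([
        "https://www.example.com/",
        "https://www.example.com/about",
        "https://www.example.com/products/widget",
    ])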

Search engine crawlers may look at a number of different factors when crawling a site. Not every page is indexed by the search engines. Distance of pages from the root directory of a site may also be a factor in whether or not pages get crawled.

Today, most people are searching on Google using a mobile device. In November 2016, Google announced a major change to the way it crawls websites and started to make its index mobile-first, which means the mobile version of a given website becomes the starting point for what Google includes in its index.

Preventing crawling

To avoid undesirable content in the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine’s database by using a meta tag specific to robots (usually <meta name="robots" content="noindex">). When a search engine visits a site, the robots.txt located in the root directory is the first file crawled. The robots.txt file is then parsed and will instruct the robot as to which pages are not to be crawled. As a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages a webmaster does not wish crawled. Pages typically prevented from being crawled include login-specific pages such as shopping carts and user-specific content such as search results from internal searches. In March 2007, Google warned webmasters that they should prevent indexing of internal search results because those pages are considered search spam.
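
As an illustration of how a well-behaved crawler consults this file before fetching anything, Python’s standard library includes a robots.txt parser (a minimal sketch; the domain, paths, and user agent string are placeholders):

    from urllib.robotparser import RobotFileParser

    # Fetch and parse the site's robots.txt, then ask whether URLs may be crawled.
    parser = RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")
    parser.read()

    for url in ("https://www.example.com/products/widget",
                "https://www.example.com/cart",
                "https://www.example.com/search?q=widgets"):
        print(url, "->", parser.can_fetch("ExampleCrawler/1.0", url))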

Increasing prominence

A variety of methods can increase the prominence of a webpage within the search results. Cross linking between pages of the same website to provide more links to important pages may improve its visibility. Writing content that includes frequently searched keyword phrases, so as to be relevant to a wide variety of search queries, will tend to increase traffic. Updating content so as to keep search engines crawling back frequently can give additional weight to a site. Adding relevant keywords to a web page’s metadata, including the title tag and meta description, will tend to improve the relevancy of a site’s search listings, thus increasing traffic. URL canonicalization of web pages accessible via multiple URLs, using the canonical link element or via 301 redirects, can help make sure links to different versions of the URL all count towards the page’s link popularity score.
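
The following sketch shows one way to think about URL canonicalization: mapping the many addresses a page may be reachable under to a single preferred form (the normalization rules are illustrative choices, not a standard; on the page itself the preferred URL would then be declared with a canonical link element or enforced with 301 redirects):

    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

    TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

    def canonical_url(url):
        """Reduce equivalent URL variants to one preferred form."""
        parts = urlsplit(url)
        scheme = "https"                                        # prefer https
        host = parts.netloc.lower().replace("www.", "", 1)      # single host variant
        path = parts.path.rstrip("/") or "/"                    # drop trailing slash
        query = urlencode(sorted((k, v) for k, v in parse_qsl(parts.query)
                                 if k not in TRACKING_PARAMS))  # drop tracking params
        return urlunsplit((scheme, host, path, query, ""))

    print(canonical_url("http://www.Example.com/shoes/?utm_source=mail"))
    # https://example.com/shoes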

White hat versus black hat techniques

SEO techniques can be classified into two broad categories: techniques that search engine companies recommend as part of good design (“white hat”), and those techniques of which search engines do not approve (“black hat”). The search engines attempt to minimize the effect of the latter, which includes spamdexing. Industry commentators have classified these methods, and the practitioners who employ them, as either white hat SEO or black hat SEO. White hats tend to produce results that last a long time, whereas black hats anticipate that their sites may eventually be banned either temporarily or permanently once the search engines discover what they are doing.

An SEO technique is considered white hat if it conforms to the search engines’ guidelines and involves no deception. As the search engine guidelines are not written as a series of rules or commandments, this is an important distinction to note. White hat SEO is not just about following guidelines, but is about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see. White hat advice is generally summed up as creating content for users, not for search engines, and then making that content easily accessible to the online “spider” algorithms, rather than attempting to trick the algorithm from its intended purpose. White hat SEO is in many ways similar to web development that promotes accessibility, although the two are not identical.

Black hat SEO attempts to improve rankings in ways that are disapproved of by the search engines, or that involve deception. One black hat technique uses text that is hidden, either as text colored similar to the background, in an invisible div, or positioned off screen. Another method gives a different page depending on whether the page is being requested by a human visitor or a search engine, a technique known as cloaking. Another category sometimes used is grey hat SEO. This is in between the black hat and white hat approaches, where the methods employed avoid the site being penalized but are not focused on producing the best content for users. Grey hat SEO is entirely focused on improving search engine rankings.

Search engines may penalize sites they discover using black hat methods, either by reducing their rankings or eliminating their listings from their databases altogether. Such penalties can be applied either automatically by the search engines’ algorithms, or by a manual site review. One example was the February 2006 Google removal of both BMW Germany and Ricoh Germany for use of deceptive practices. Both companies, however, quickly apologized, fixed the offending pages, and were restored to Google’s search engine results page.

As marketing strategy

SEO is not an appropriate strategy for every website, and other Internet marketing strategies can be more effective, such as paid advertising through pay per click (PPC) campaigns, depending on the site operator’s goals. Search engine marketing (SEM) is the practice of designing, running, and optimizing search engine ad campaigns. Its difference from SEO is most simply depicted as the difference between paid and unpaid priority ranking in search results. Its purpose concerns prominence more than relevance; website developers should regard SEM with the utmost importance with consideration to visibility, as most users navigate to the primary listings of their search. A successful Internet marketing campaign may also depend upon building high-quality web pages to engage and persuade, setting up analytics programs to enable site owners to measure results, and improving a site’s conversion rate. In November 2015, Google released a full 160-page version of its Search Quality Rating Guidelines to the public, which revealed a shift in their focus towards “usefulness” and mobile search. In recent years the mobile market has exploded, overtaking the use of desktops, as shown by StatCounter in October 2016, which analysed 2.5 million websites and found that 51.3% of the pages were loaded by a mobile device. Google has been one of the companies taking advantage of the popularity of mobile usage by encouraging websites to use its Google Search Console and Mobile-Friendly Test, which allow companies to measure their website against the search engine results and gauge how user-friendly it is.

SEO may generate an adequate return on investment. However, search engines are not paid for organic search traffic, their algorithms change, and there are no guarantees of continued referrals. Due to this lack of guarantees and certainty, a business that relies heavily on search engine traffic can suffer major losses if the search engines stop sending visitors. Search engines can change their algorithms, impacting a website’s placement, possibly resulting in a serious loss of traffic. According to Google’s CEO, Eric Schmidt, in 2010, Google made over 500 algorithm changes – almost 1.5 per day. It is considered a wise business practice for website operators to liberate themselves from dependence on search engine traffic. In addition to accessibility in terms of web crawlers (addressed above), user web accessibility has become increasingly important for SEO.

International markets

Optimization techniques are highly tuned to the dominant search engines in the target market. The search engines’ market shares vary from market to market, as does competition. In 2003, Danny Sullivan stated that Google represented about 75% of all searches. In markets outside the United States, Google’s share is often larger, and Google remains the dominant search engine worldwide as of 2007. As of 2006, Google had an 85–90% market share in Germany. While there were hundreds of SEO firms in the US at that time, there were only about five in Germany. As of June 2008, the market share of Google in the UK was close to 90% according to Hitwise. That market share is achieved in a number of countries.

As of 2009, there are only a few large markets where Google is not the leading search engine. In most cases, when Google is not leading in a given market, it is lagging behind a local player. The most notable example markets are China, Japan, South Korea, Russia and the Czech Republic where respectively Baidu, Yahoo! Japan, Naver, Yandex and Seznam are market leaders.

Successful search optimization for international markets may require professional translation of web pages, registration of a domain name with a top level domain in the target market, and web hosting that provides a local IP address. Otherwise, the fundamental elements of search optimization are essentially the same, regardless of language.

Legal precedents

On October 17, 2002, SearchKing filed suit in the United States District Court, Western District of Oklahoma, against the search engine Google. SearchKing’s claim was that Google’s tactics to prevent spamdexing constituted a tortious interference with contractual relations. On May 27, 2003, the court granted Google’s motion to dismiss the complaint because SearchKing “failed to state a claim upon which relief may be granted.”

In March 2006, KinderStart filed a lawsuit against Google over search engine rankings. KinderStart’s website was removed from Google’s index prior to the lawsuit and the amount of traffic to the site dropped by 70%. On March 16, 2007 the United States District Court for the Northern District of California (San Jose Division) dismissed KinderStart’s complaint without leave to amend, and partially granted Google’s motion for Rule 11 sanctions against KinderStart’s attorney, requiring him to pay part of Google’s legal expenses.

Web portal

A web portal is a specially designed website that brings information from diverse sources, like emails, online forums and search engines, together in a uniform way. Usually, each information source gets its dedicated area on the page for displaying information (a portlet); often, the user can configure which ones to display. Variants of portals include mashups and intranet “dashboards” for executives and managers. The extent to which content is displayed in a “uniform way” may depend on the intended user and the intended purpose, as well as the diversity of the content. Very often design emphasis is on a certain “metaphor” for configuring and customizing the presentation of the content (e.g., a dashboard or map) and the chosen implementation framework or code libraries. In addition, the role of the user in an organization may determine which content can be added to the portal or deleted from the portal configuration.

A portal may use a search engine’s application programming interface (API) to permit users to search intranet content, as opposed to extranet content, by restricting which domains may be searched. Apart from this common search engine feature, web portals may offer other services such as e-mail, news, stock quotes, information from databases, and even entertainment content. Portals provide a way for enterprises and organizations to provide a consistent “look and feel” with access control and procedures for multiple applications and databases, which otherwise would have been different web entities at various URLs. The features available may be restricted by whether access is by an authorized and authenticated user (employee, member) or an anonymous website visitor.

Examples of early public web portals were AOL, Excite, Netvibes, iGoogle, MSN, Naver, Lycos, Prodigy, Indiatimes, Rediff, and Yahoo!. See for example, the “My Yahoo!” feature of Yahoo! that may have inspired such features as the later Google “iGoogle” (discontinued as of November 1, 2013.) The configurable side-panels of, for example, the modern Opera browser and the option of “speed dial” pages by most browsers continue to reflect the earlier “portal” metaphor.

In the late 1990s the Web portal was a Web IT buzzword. After the proliferation of Web browsers in the late 1990s, many companies tried to build or acquire a portal to attempt to obtain a share of an Internet market. The Web portal gained special attention because it was, for many users, the starting point of their Web browsing if it was set as their home page. The content and branding of a portal could change as Internet companies merged or were acquired. Netscape became a part of America Online, the Walt Disney Company launched Go.com, and IBM and others launched Prodigy. Portal metaphors are widely used by public library sites for borrowers who log in, and by university intranets for students and for faculty. Vertical markets remain for ISVs (independent software vendors) offering management and executive intranet “dashboards” for corporations and government agencies in areas such as GRC and risk management.

Classification

Web portals are sometimes classified as horizontal or vertical. A horizontal portal is used as a platform for several companies in the same economic sector or for the same type of manufacturers or distributors. A vertical portal (also known as a “vortal”) is a specialized entry point to a specific market or industry niche, subject area, or interest. Some vertical portals are known as “vertical information portals” (VIPs). VIPs provide news, editorial content, digital publications, and e-commerce capabilities. In contrast to traditional vertical portals, VIPs also provide dynamic multimedia applications including social networking, video posting, and blogging.

Types

A personal portal is a Web Page at a Web site on the World Wide Web or a local HTML home page including JavaScript and perhaps running in a modified Web browser. A personal portal typically provides personalized capabilities to its visitors or its local user, providing a pathway to other content. It may be designed to use distributed applications, different numbers and types of middleware and hardware to provide services from a number of different sources and may run on a non-standard local Web server. In addition, business portals can be designed for sharing and collaboration in workplaces. A further business-driven requirement of portals is that the content be presented on multiple platforms such as personal computers, laptops, tablet computers, personal digital assistants (PDAs), cell phones and smartphones.

Information, news, and updates are examples of content that could be delivered through such a portal. Personal portals can be related to any specific topic, such as providing friends’ information on a social network or providing links to outside content that may help others beyond one’s own services. Portals are not limited to simply providing links. Outside of business intranet use, simpler portals are often replaced with richer mashup designs. Within enterprises, early portals were often replaced by much more powerful “dashboard” designs. Some also have relied on newer protocols such as some version of RSS aggregation and may or may not involve some degree of Web harvesting. Facebook can be considered a modern personal web portal.

Cultural

Cultural portals aggregate digitised cultural collections of galleries, libraries (see: library portal), archives, and museums. This type of portal provides a point of access to invisible Web cultural content that may not be indexed by standard search engines. Digitised collections can include scans or digital photos of books, artworks, photography, journals, newspapers, maps, diaries, and letters, and digital files of music, sound recordings, films, and archived websites, as well as the descriptive metadata associated with each type of cultural work (e.g., metadata provides information about the author, publisher, etc.). These portals are often based around specific national or regional groupings of institutions. Examples of cultural portals include:

  • DigitalNZ – A cultural portal led by the National Library of New Zealand focused on New Zealand digital content.
  • Europeana – A cultural portal for the European Union based in the National Library of the Netherlands and overseen by the Europeana Foundation.
  • Trove – A cultural portal led by the National Library of Australia focused on Australian content.
  • TUT.by – A commercial cultural portal focused on Belarusian digital content.
  • Digital Public Library of America (in development)

Corporate

Corporate intranets became common during the 1990s. As intranets grew in size and complexity, organization webmasters were faced with increasing content and user management challenges. A consolidated view of company information was judged insufficient; users wanted personalization and customization. Webmasters, if skilled enough, were able to offer some capabilities, but for the most part ended up driving users away from using the intranet. Many companies began to offer tools to help webmasters manage their data, applications and information more easily, and by providing different users with personalized views. Portal solutions can also include workflow management, collaboration between work groups or branches, and policy-managed content publication. Most can allow internal and external access to specific corporate information using secure authentication or single sign-on.

JSR 168 standards emerged around 2001. Java Specification Request (JSR) 168 standards allow the interoperability of portlets across different portal platforms. These standards allow portal developers, administrators, and consumers to integrate standards-based portals and portlets across a variety of vendor solutions. The concept of content aggregation seems to still be gaining momentum, and portal solutions will likely continue to evolve significantly over the next few years. The Gartner Group predicts generation 8 portals to expand on the Business Mashups concept of delivering a variety of information, tools, applications, and access points through a single mechanism.

With the increase in user-generated content (blog posts, comments, photos), disparate data silos, and file formats, information architects and taxonomists will be required to allow users the ability to tag (classify) the data or content. For example, if a vice-president makes a blog post, this post could be tagged with her/his name, title, and the subject of the post. Tagging makes it easier for users of the intranet to find the content they are interested in. This will ultimately cause a ripple effect where users will also be generating ad hoc navigation and information flows. Corporate portals also offer customers and employees self-service opportunities.

Stock

Also known as stock-share portals, stock market portals, or stock exchange portals, these are Web-based applications that facilitate the process of informing shareholders with substantial online data such as the latest prices, ask/bid quotes, the latest news, reports, and announcements. Some stock portals use online gateways through a central depository system (CDS) for visitors to buy or sell their shares or manage their portfolios.

Search

Search portals aggregate results from several search engines into one page. You can find search portals specialized in a product, for example property search portals. Library search portals are also known as discovery interfaces.
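
A minimal sketch of the aggregation step might look as follows, assuming the per-engine result lists have already been fetched (the fetching itself and the engines involved are outside the sketch):

    def merge_results(*ranked_lists):
        """Interleave several ranked result lists, dropping duplicate URLs."""
        merged, seen = [], set()
        for position in range(max(len(lst) for lst in ranked_lists)):
            for lst in ranked_lists:
                if position < len(lst) and lst[position] not in seen:
                    seen.add(lst[position])
                    merged.append(lst[position])
        return merged

    engine_a = ["https://example.com/a", "https://example.com/b"]
    engine_b = ["https://example.com/b", "https://example.com/c"]
    print(merge_results(engine_a, engine_b))
    # ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']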

Property search

Property search portals aggregate data about properties for sale by real estate agents. Examples in the UK include Zoopla, Rightmove, Nestoria and Nuroa. Examples in the US include Propertini.

Tender

A tender portal is a gateway for government suppliers to bid on providing goods and services. Tender portals allow users to search, modify, submit, review and archive data in order to provide a complete online tendering process.

Using online tendering, bidders can do any of the following:

  • Receive notification of the tenders.
  • Receive tender documents online.
  • Fill out the forms online.
  • Submit proposals and documents.
  • Submit bids online.

Hosted

Hosted Web portals gained popularity and a number of companies began offering them as a hosted service. The hosted portal market fundamentally changed the composition of portals. In many ways they served simply as a tool for publishing information instead of the loftier goals of integrating legacy applications or presenting correlated data from distributed databases. The early hosted portal companies such as Hyperoffice.com or the now defunct InternetPortal.com focused on collaboration and scheduling in addition to the distribution of corporate data. As hosted Web portals have risen in popularity their feature set has grown to include hosted databases, document management, email, discussion forums and more. Hosted portals automatically personalize the content generated from their modules to provide a personalized experience to their users. In this regard they have remained true to the original goals of the earlier corporate Web portals.

Emerging new classes of Internet portals called Cloud Portals are showcasing the power of API (Application Programming Interface) rich software systems leveraging SOA (service-oriented architecture, Web services, and custom data exchange) to accommodate machine to machine interaction creating a more fluid user experience for connecting users spanning multiple domains during a given “session”. Cloud portals like Nubifer Cloud Portal show what is possible using Enterprise Mashup and Web Service integration approaches to building cloud portals.

Domain-specific

A number of portals have come about which are specific to a particular domain, offering access to related companies and services; a prime example of this trend would be the growth in property portals that give access to services such as estate agents, removal firms, and solicitors that offer conveyancing. Along the same lines, industry-specific news and information portals have appeared, such as clinical trials-specific portals.

Engineering aspects

The main concept is to present the user with a single Web page that brings together or aggregates content from a number of other systems or servers. The application server or architecture performs most of the crucial functions of the application. This application server is in turn connected to database servers, and may be part of a clustered server environment. High-capacity portal configurations may include load balancing strategies. For portals that present application functionality to the user, the portal server is in reality the front piece of a server configuration that includes some connectivity to the application server. For early Web browsers permitting HTML frameset and iframe elements, diverse information could be presented without violating the browser same-source security policy (relied upon to prevent a variety of cross-site security breaches). More recent client-side approaches use JavaScript frameworks and libraries that rely on newer Web functionality such as WebSockets and asynchronous callbacks using XMLHttpRequest.

The server hosting the portal may only be a “pass through” for the user. By use of portlets, application functionality can be presented in any number of portal pages. For the most part, this architecture is transparent to the user. In such a design, security and concurrent user capacity can be important issues, and security designers need to ensure that only authenticated and authorized users can generate requests to the application server. If the security design and administration does not ensure adequate authentication and authorization, then the portal may inadvertently present vulnerabilities to various types of attacks.

Strongly connected component

In the mathematical theory of directed graphs, a graph is said to be strongly connected or diconnected if every vertex is reachable from every other vertex. The strongly connected components or diconnected components of an arbitrary directed graph form a partition into subgraphs that are themselves strongly connected. It is possible to test the strong connectivity of a graph, or to find its strongly connected components, in linear time (that is, Θ(V+E)).

A directed graph is called strongly connected if there is a path in each direction between each pair of vertices of the graph. That is, a path exists from the first vertex in the pair to the second, and another path exists from the second vertex to the first. In a directed graph G that may not itself be strongly connected, a pair of vertices u and v are said to be strongly connected to each other if there is a path in each direction between them.

The binary relation of being strongly connected is an equivalence relation, and the induced subgraphs of its equivalence classes are called strongly connected components. Equivalently, a strongly connected component of a directed graph G is a subgraph that is strongly connected, and is maximal with this property: no additional edges or vertices from G can be included in the subgraph without breaking its property of being strongly connected. The collection of strongly connected components forms a partition of the set of vertices of G.

If each strongly connected component is contracted to a single vertex, the resulting graph is a directed acyclic graph, the condensation of G. A directed graph is acyclic if and only if it has no strongly connected subgraphs with more than one vertex, because a directed cycle is strongly connected and every nontrivial strongly connected component contains at least one directed cycle.

Algorithms

DFS-based linear-time algorithms
Several algorithms based on depth first search compute strongly connected components in linear time.

Kosaraju’s algorithm uses two passes of depth first search. The first, in the original graph, is used to choose the order in which the outer loop of the second depth first search tests vertices for having been visited already and recursively explores them if not. The second depth first search is on the transpose graph of the original graph, and each recursive exploration finds a single new strongly connected component. It is named after S. Rao Kosaraju, who described it (but did not publish his results) in 1978; Micha Sharir later published it in 1981.
Tarjan’s strongly connected components algorithm, published by Robert Tarjan in 1972, performs a single pass of depth first search. It maintains a stack of vertices that have been explored by the search but not yet assigned to a component, and calculates “low numbers” of each vertex (an index number of the highest ancestor reachable in one step from a descendant of the vertex) which it uses to determine when a set of vertices should be popped off the stack into a new component.
The path-based strong component algorithm uses a depth first search, like Tarjan’s algorithm, but with two stacks. One of the stacks is used to keep track of the vertices not yet assigned to components, while the other keeps track of the current path in the depth first search tree. The first linear time version of this algorithm was published by Edsger W. Dijkstra in 1976.
Although Kosaraju’s algorithm is conceptually simple, Tarjan’s and the path-based algorithm require only one depth-first search rather than two.
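
A compact sketch of the two-pass (Kosaraju) approach described above, written recursively for brevity and therefore only suitable for small graphs:

    from collections import defaultdict

    def kosaraju_scc(graph):
        """Strongly connected components via Kosaraju's two-pass depth-first search.

        `graph` maps each vertex to a list of successors; returns a list of
        components, each a list of vertices.
        """
        vertices = set(graph) | {v for nbrs in graph.values() for v in nbrs}
        transpose = defaultdict(list)
        for u in graph:
            for v in graph[u]:
                transpose[v].append(u)

        # Pass 1: record vertices in order of completion in the original graph.
        visited, order = set(), []
        def dfs_order(u):
            visited.add(u)
            for v in graph.get(u, []):
                if v not in visited:
                    dfs_order(v)
            order.append(u)
        for u in vertices:
            if u not in visited:
                dfs_order(u)

        # Pass 2: explore the transpose graph in reverse completion order;
        # each exploration yields exactly one strongly connected component.
        assigned, components = set(), []
        def dfs_collect(u, component):
            assigned.add(u)
            component.append(u)
            for v in transpose.get(u, []):
                if v not in assigned:
                    dfs_collect(v, component)
        for u in reversed(order):
            if u not in assigned:
                components.append([])
                dfs_collect(u, components[-1])
        return components

    # One 3-cycle {a, b, c} plus a vertex d hanging off it.
    print(kosaraju_scc({"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}))
    # e.g. [['a', 'c', 'b'], ['d']]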

Reachability-based algorithms

The previous linear-time algorithms are based on depth-first search, which is generally considered hard to parallelize. Fleischer et al. in 2000 proposed a divide-and-conquer approach based on reachability queries, and such algorithms are usually called reachability-based SCC algorithms. The idea of this approach is to pick a random pivot vertex and apply forward and backward reachability queries from this vertex. The two queries partition the vertex set into 4 subsets: vertices reached by both, either one, or none of the searches. One can show that a strongly connected component has to be contained in one of the subsets. The vertex subset reached by both searches forms a strongly connected component, and the algorithm then recurses on the other 3 subsets.

The expected sequential running time of this algorithm is shown to be O(n log n), a factor of O(log n) more than the classic algorithms. The parallelism comes from: (1) the reachability queries can be parallelized more easily (e.g. by a BFS, which can be fast if the diameter of the graph is small); and (2) the independence between the subtasks in the divide-and-conquer process. This algorithm performs well on real-world graphs, but does not have a theoretical guarantee on the parallelism (consider that if a graph has no edges, the algorithm requires O(n) levels of recursion).
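
A sequential sketch of this divide-and-conquer scheme is shown below (reachability is computed here with a plain breadth-first search; in a parallel implementation the two searches and the three recursive calls would run concurrently, and the pivot would be chosen at random):

    from collections import deque

    def reachable(vertices, edges, source):
        """Vertices of `vertices` reachable from `source` via the edge function."""
        seen, queue = {source}, deque([source])
        while queue:
            u = queue.popleft()
            for v in edges(u):
                if v in vertices and v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen

    def reachability_scc(vertices, succ, pred):
        """Reachability-based SCC decomposition in the style of Fleischer et al."""
        if not vertices:
            return []
        pivot = next(iter(vertices))
        forward = reachable(vertices, succ, pivot)      # reachable from the pivot
        backward = reachable(vertices, pred, pivot)     # reaching the pivot
        scc = forward & backward                        # reached by both searches
        components = [scc]
        # No component can straddle the three remaining subsets, so recurse on each.
        for subset in (forward - scc, backward - scc, vertices - forward - backward):
            components += reachability_scc(subset, succ, pred)
        return components

    graph = {1: [2], 2: [3], 3: [1, 4], 4: []}
    transpose = {2: [1], 3: [2], 1: [3], 4: [3]}
    print(reachability_scc(set(graph),
                           lambda u: graph.get(u, []),
                           lambda u: transpose.get(u, [])))
    # e.g. [{1, 2, 3}, {4}]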

Blelloch et al. in 2016 show that if the reachability queries are applied in a random order, the cost bound of O(n log n) still holds. Furthermore, the queries can then be batched in a prefix-doubling manner (i.e. 1, 2, 4, 8 queries) and run simultaneously in one round. The overall span of this algorithm is log2 n reachability queries, which is probably the optimal parallelism that can be achieved using the reachability-based approach.

Applications

Algorithms for finding strongly connected components may be used to solve 2-satisfiability problems (systems of Boolean variables with constraints on the values of pairs of variables): as Aspvall, Plass & Tarjan (1979) showed, a 2-satisfiability instance is unsatisfiable if and only if there is a variable v such that v and its complement are both contained in the same strongly connected component of the implication graph of the instance.
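
To make the reduction concrete, the sketch below builds the implication graph for a 2-CNF formula and applies the SCC test (it reuses the kosaraju_scc sketch given under the algorithms above; the literal encoding, with +i for a variable and -i for its negation, is a choice made here):

    def two_sat_satisfiable(n_vars, clauses):
        """2-CNF satisfiability via SCCs of the implication graph.

        Variables are 1..n_vars; literal +i means x_i and -i means NOT x_i.
        Each clause (a OR b) contributes the implications NOT a -> b and NOT b -> a.
        Reuses kosaraju_scc from the earlier sketch.
        """
        graph = {lit: [] for v in range(1, n_vars + 1) for lit in (v, -v)}
        for a, b in clauses:
            graph[-a].append(b)
            graph[-b].append(a)

        component = {}
        for index, comp in enumerate(kosaraju_scc(graph)):
            for lit in comp:
                component[lit] = index

        # Unsatisfiable exactly when some variable and its negation share a component.
        return all(component[v] != component[-v] for v in range(1, n_vars + 1))

    print(two_sat_satisfiable(2, [(1, 2), (-1, 2)]))    # True: satisfiable
    print(two_sat_satisfiable(1, [(1, 1), (-1, -1)]))   # False: x1 and NOT x1 conflict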

Strongly connected components are also used to compute the Dulmage–Mendelsohn decomposition, a classification of the edges of a bipartite graph, according to whether or not they can be part of a perfect matching in the graph.

Related results

A directed graph is strongly connected if and only if it has an ear decomposition, a partition of the edges into a sequence of directed paths and cycles such that the first subgraph in the sequence is a cycle, and each subsequent subgraph is either a cycle sharing one vertex with previous subgraphs, or a path sharing its two endpoints with previous subgraphs.

According to Robbins’ theorem, an undirected graph may be oriented in such a way that it becomes strongly connected, if and only if it is 2-edge-connected. One way to prove this result is to find an ear decomposition of the underlying undirected graph and then orient each ear consistently.

Server farm

A server farm or server cluster is a collection of computer servers – usually maintained by an organization to supply server functionality far beyond the capability of a single machine. Server farms often consist of thousands of computers which require a large amount of power to run and to keep cool. At the optimum performance level, a server farm has enormous costs (both financial and environmental) associated with it. Server farms often have backup servers, which can take over the function of primary servers in the event of a primary-server failure. Server farms are typically collocated with the network switches and/or routers which enable communication between the different parts of the cluster and the users of the cluster. Server farmers typically mount the computers, routers, power supplies, and related electronics on 19-inch racks in a server room or data center.

Server farms are commonly used for cluster computing. Many modern supercomputers comprise giant server farms of high-speed processors connected by either Gigabit Ethernet or custom interconnects such as Infiniband or Myrinet. Web hosting is a common use of a server farm; such a system is sometimes collectively referred to as a web farm. Other uses of server farms include scientific simulations (such as computational fluid dynamics) and the rendering of 3D computer generated imagery (also see render farm).

Server farms are increasingly being used instead of or in addition to mainframe computers by large enterprises, although server farms do not yet reach the same reliability levels as mainframes. Because of the sheer number of computers in large server farms, the failure of an individual machine is a commonplace event, and the management of large server farms needs to take this into account by providing support for redundancy, automatic failover, and rapid reconfiguration of the server cluster.

The performance of the largest server farms (thousands of processors and up) is typically limited by the performance of the data center’s cooling systems and the total electricity cost rather than by the performance of the processors. Computers in server farms run 24/7 and consume large amounts of electricity; for this reason, the critical design parameter for both large and continuous systems tends to be performance per watt rather than cost of peak performance or (peak performance / (unit * initial cost)). Also, for high-availability systems that must run 24/7 (unlike supercomputers, which can be power-cycled on demand and also tend to run at much higher utilizations), there is more attention placed on power-saving features such as variable clock speed and the ability to turn off computer parts, processor parts, and entire computers (WoL and virtualization) according to demand without bringing down services.

Performance per watt

The EEMBC EnergyBench, SPECpower, and the Transaction Processing Performance Council TPC-Energy are benchmarks designed to predict performance per watt in a server farm. The power used by each rack of equipment can be measured at the power distribution unit. Some servers include power tracking hardware so the people running the server farm can measure the power used by each server. The power used by the entire server farm may be reported in terms of power usage effectiveness or data center infrastructure efficiency.
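
As a small worked example of these facility-level metrics (the figures are illustrative only): power usage effectiveness (PUE) is total facility power divided by IT equipment power, and data center infrastructure efficiency (DCiE) is its reciprocal expressed as a percentage.

    def pue(total_facility_kw, it_equipment_kw):
        """Power usage effectiveness: total power drawn per watt of IT load."""
        return total_facility_kw / it_equipment_kw

    def dcie(total_facility_kw, it_equipment_kw):
        """Data center infrastructure efficiency, the reciprocal of PUE, in percent."""
        return 100.0 * it_equipment_kw / total_facility_kw

    # Illustrative figures: 1200 kW drawn by the whole facility, 800 kW by the servers.
    print(pue(1200, 800))   # 1.5 -> every watt of IT load costs half a watt of overhead
    print(dcie(1200, 800))  # about 66.7 %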

According to some estimates, for every 100 watts spent on running the servers, roughly another 50 watts is needed to cool them. For this reason, the siting of a server farm can be as important as processor selection in achieving power efficiency. Iceland, which has a cold climate all year as well as a cheap and carbon-neutral geothermal electricity supply, is building its first major server farm hosting site. Fibre optic cables are being laid from Iceland to North America and Europe to enable companies there to locate their servers in Iceland. Other countries with favorable conditions, such as Canada, Finland, Sweden, and Switzerland, are trying to attract cloud computing data centers. In these countries, heat from the servers can be cheaply vented or used to help heat buildings, thus reducing the energy consumption of conventional heaters.

Spam in blogs

Spam in blogs (also called simply blog spam, comment spam, or social spam) is a form of spamdexing. (Note that blogspam also has another meaning, namely the post of a blogger who creates posts that have no added value to them in order to submit them to other sites.) It is done by posting (usually automatically) random comments, copying material from elsewhere that is not original, or promoting commercial services to blogs, wikis, guestbooks, or other publicly accessible online discussion boards. Any web application that accepts and displays hyperlinks submitted by visitors may be a target.

Adding links that point to the spammer’s web site artificially increases the site’s search engine ranking on engines where the popularity of the URL contributes to its implied value; an example algorithm would be PageRank, as used by Google Search. An increased ranking often results in the spammer’s commercial site being listed ahead of other sites for certain searches, increasing the number of potential visitors and paying customers.

This type of spam originally appeared in Internet guestbooks, where spammers repeatedly filled a guestbook with links to their own site and with no relevant comment, to increase search engine rankings. If an actual comment is given it is often just “cool page”, “nice website”, or keywords of the spammed link.

In 2003, spammers began to take advantage of the open nature of comments in the blogging software like Movable Type by repeatedly placing comments to various blog posts that provided nothing more than a link to the spammer’s commercial web site. Jay Allen created a free plugin, called MT-BlackList, for the Movable Type weblog tool (versions prior to 3.2) that attempted to alleviate this problem. Many blogging packages now have methods of preventing or reducing the effect of blog spam built in due to its prevalence, although spammers too have developed tools to circumvent them. Many spammers use special blog spamming tools like trackback submitter to bypass comment spam protection on popular blogging systems like Movable Type, WordPress, and others.

Other phrases typically used in the comment content can be stolen comments from other websites, “nice article”, something about their imaginary friends, plagiarised parts from books, unfinished sentences, nonsense words (usually to defeat a minimum comment length restriction) or the same link repeated.

Application-specific implementations

Particularly popular software products such as Movable Type and MediaWiki have developed or included anti-spam measures, as spammers focus more attention on targeting those platforms due to their prevalence on the Internet. Whitelists and blacklists that prevent certain IPs from posting, or that prevent people from posting content that matches certain filters, are common defences although most software tends to use a combination of the variety of different techniques documented below.

The goal in every potential solution is to allow legitimate users to continue to comment (and often even add links to their comments, as that is considered by some to be a valuable aspect of any comments section when the links are relevant or related to the article or content) whilst preventing all link spam or irrelevant comments from ever being viewable to the site’s owner and visitors.
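
A minimal sketch of the kind of heuristic filter such software applies is shown below (the thresholds, phrases, and field names are illustrative only, not taken from any particular package):

    import re

    LINK_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
    BLACKLISTED_PHRASES = ("nice website", "cool page", "buy now")   # illustrative
    MAX_LINKS = 2

    def looks_like_comment_spam(comment_text, author_ip, blocked_ips):
        """Flag a comment using a few of the common heuristics described above."""
        if author_ip in blocked_ips:
            return True
        if len(LINK_PATTERN.findall(comment_text)) > MAX_LINKS:
            return True
        lowered = comment_text.lower()
        return any(phrase in lowered for phrase in BLACKLISTED_PHRASES)

    comment = "Nice website! http://spam.example/1 http://spam.example/2 http://spam.example/3"
    print(looks_like_comment_spam(comment, "203.0.113.7", blocked_ips=set()))  # True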

Scraper Site

A scraper site is a website that copies content from other websites using web scraping. The content is then mirrored with the goal of creating revenue, usually through advertising and sometimes by selling user data. Scraper sites come in various forms. Some provide little, if any material or information, and are intended to obtain user information such as e-mail addresses, to be targeted for spam e-mail. Price aggregation and shopping sites access multiple listings of a product and allow a user to rapidly compare the prices.

Examples of scraper websites

Search engines such as Google could be considered a type of scraper site. Search engines gather content from other websites, save it in their own databases, index it and present the scraped content to their search engine’s own users. The majority of content scraped by search engines is copyrighted.

The scraping technique has been used on various dating websites as well and they often combine it with facial recognition.

Scraping is also used on general image recognition websites, and on websites specifically made to identify images of crops with pests and diseases.

Made for advertising

Some scraper sites are created to make money by using advertising programs. In such cases, they are called Made for AdSense (MFA) sites. This derogatory term refers to websites that have no redeeming value except to lure visitors to the website for the sole purpose of clicking on advertisements.

Made for AdSense sites are considered search engine spam that dilute the search results with less-than-satisfactory search results. The scraped content is redundant to that which would be shown by the search engine under normal circumstances, had no MFA website been found in the listings.

Some scraper sites link to other sites to improve their search engine ranking through a private blog network. Prior to Google’s update to its search algorithm known as Panda, a type of scraper site known as an auto blog was quite common among black hat marketers who used a method known as spamdexing.

Legality

Scraper sites may violate copyright law. Even taking content from an open content site can be a copyright violation, if done in a way which does not respect the license. For instance, the GNU Free Documentation License (GFDL) and Creative Commons ShareAlike (CC-BY-SA) licenses used on Wikipedia require that a republisher of Wikipedia inform its readers of the conditions on these licenses, and give credit to the original author.

Techniques

Depending upon the objective of a scraper, the methods in which websites are targeted differ. For example, sites with large amounts of content such as airlines, consumer electronics, department stores, etc. might be routinely targeted by their competition just to stay abreast of pricing information.

Another type of scraper will pull snippets and text from websites that rank high for keywords they have targeted. This way they hope to rank highly in the search engine results pages (SERPs), piggybacking on the original page’s page rank. RSS feeds are vulnerable to scrapers.

Other scraper sites consist of advertisements and paragraphs of words randomly selected from a dictionary. Often a visitor will click on a pay-per-click advertisement on such a site because it is the only comprehensible text on the page. Operators of these scraper sites gain financially from these clicks. Advertising networks claim to be constantly working to remove these sites from their programs, although these networks benefit directly from the clicks generated at this kind of site. From the advertisers’ point of view, the networks don’t seem to be making enough effort to stop this problem.

Scrapers tend to be associated with link farms and are sometimes perceived as the same thing, when multiple scrapers link to the same target site. A frequent target victim site might be accused of link-farm participation, due to the artificial pattern of incoming links to a victim website, linked from multiple scraper sites.

Domain hijacking

Some programmers who create scraper sites may purchase a recently expired domain name to reuse its SEO power in Google. Whole businesses exist that focus on finding expired domains and utilising them for their historical ranking ability. Doing so allows SEOs to utilize the already-established backlinks to the domain name. Some spammers may try to match the topic of the expired site, or copy the existing content from the Internet Archive, to maintain the authenticity of the site so that the backlinks don’t drop. For example, an expired website about a photographer may be re-registered to create a site about photography tips, or the domain name may be used in a private blog network to power the spammer’s own photography site.

Services at some expired domain name registration agents provide both the facility to find these expired domains and to gather the HTML that the domain name used to serve on its website.

Data scraping

Data scraping is a technique in which a computer program extracts data from human-readable output coming from another program.

Normally, data transfer between programs is accomplished using data structures suited for automated processing by computers, not people. Such interchange formats and protocols are typically rigidly structured, well-documented, easily parsed, and keep ambiguity to a minimum. Very often, these transmissions are not human-readable at all.

Thus, the key element that distinguishes data scraping from regular parsing is that the output being scraped is intended for display to an end-user, rather than as input to another program, and is therefore usually neither documented nor structured for convenient parsing. Data scraping often involves ignoring binary data (usually images or multimedia data), display formatting, redundant labels, superfluous commentary, and other information which is either irrelevant or hinders automated processing.

Data scraping is most often done either to interface to a legacy system, which has no other mechanism which is compatible with current hardware, or to interface to a third-party system which does not provide a more convenient API. In the second case, the operator of the third-party system will often see screen scraping as unwanted, due to reasons such as increased system load, the loss of advertisement revenue, or the loss of control of the information content.

Data scraping is generally considered an ad hoc, inelegant technique, often used only as a “last resort” when no other mechanism for data interchange is available. Aside from the higher programming and processing overhead, output displays intended for human consumption often change structure frequently. Humans can cope with this easily, but a computer program may report nonsense, having been told to read data in a particular format or from a particular place, and with no knowledge of how to check its results for validity.
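To make the idea concrete, here is a minimal Python sketch of this kind of scraping, assuming a hypothetical fixed-layout price listing produced by another program; the column layout and field names are invented for illustration.

import re

# Hypothetical human-readable output from another program (e.g. a price listing).
# The layout is an assumption for illustration; real scraped output varies widely.
raw_output = """\
ITEM            QTY    PRICE
Widget, small     3     9.99
Widget, large    12    24.50
"""

ROW = re.compile(r"^(?P<item>.+?)\s{2,}(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})\s*$")

records = []
for line in raw_output.splitlines():
    match = ROW.match(line)
    if match:  # skip headers, blank lines and anything that does not look like a data row
        records.append({
            "item": match.group("item").strip(),
            "qty": int(match.group("qty")),
            "price": float(match.group("price")),
        })

print(records)

Note how the program has to guess which lines are data and which are decoration; when the report layout changes, the regular expression silently stops matching, which is exactly the fragility described above.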

Technical variants

Figure: a screen fragment and a screen-scraping interface used to customize the data capture process.
Screen scraping is normally associated with the programmatic collection of visual data from a source, instead of parsing data as in Web scraping. Originally, screen scraping referred to the practice of reading text data from a computer display terminal’s screen. This was generally done by reading the terminal’s memory through its auxiliary port, or by connecting the terminal output port of one computer system to an input port on another. The term screen scraping is also commonly used to refer to the bidirectional exchange of data. This covers both simple cases, where the controlling program navigates through the user interface, and more complex scenarios, where the controlling program is entering data into an interface meant to be used by a human.

As a concrete example of a classic screen scraper, consider a hypothetical legacy system dating from the 1960s—the dawn of computerized data processing. Computer to user interfaces from that era were often simply text-based dumb terminals which were not much more than virtual teleprinters (such systems are still in use today, for various reasons). The desire to interface such a system to more modern systems is common. A robust solution will often require things no longer available, such as source code, system documentation, APIs, or programmers with experience in a 50-year-old computer system. In such cases, the only feasible solution may be to write a screen scraper which “pretends” to be a user at a terminal. The screen scraper might connect to the legacy system via Telnet, emulate the keystrokes needed to navigate the old user interface, process the resulting display output, extract the desired data, and pass it on to the modern system. (A sophisticated and resilient implementation of this kind, built on a platform providing the governance and control required by a major enterprise—e.g. change control, security, user management, data protection, operational audit, load balancing and queue management, etc.—could be said to be an example of robotic process automation software.)
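As a hedged illustration of the “process the resulting display output” step, the sketch below assumes the 24×80 terminal screen has already been captured as rows of text (for example by a terminal-emulation library) and pulls fields out of it by position; the field names and coordinates are hypothetical and would have to match the real legacy application’s screen layout.

# A minimal sketch of the display-processing step of a screen scraper. It assumes the
# 24x80 terminal screen has already been captured as a list of text rows; the field
# positions below are hypothetical.

SCREEN_FIELDS = {
    # field name: (row, start column, end column) -- zero-based, end exclusive
    "account_number": (2, 10, 20),
    "customer_name":  (4, 10, 40),
    "balance":        (6, 10, 22),
}

def scrape_screen(screen_rows):
    """Pull named fields out of a captured terminal screen by position."""
    record = {}
    for name, (row, start, end) in SCREEN_FIELDS.items():
        line = screen_rows[row] if row < len(screen_rows) else ""
        record[name] = line[start:end].strip()
    return record

# Example with a fabricated screen capture:
demo_screen = [""] * 24
demo_screen[2] = "ACCOUNT:  0012345678"
demo_screen[4] = "NAME:     DOE, JANE"
demo_screen[6] = "BALANCE:  1,234.56"
print(scrape_screen(demo_screen))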

In the 1980s, financial data providers such as Reuters, Telerate, and Quotron displayed data in 24×80 format intended for a human reader. Users of this data, particularly investment banks, wrote applications to capture and convert this character data as numeric data for inclusion into calculations for trading decisions without re-keying the data. The common term for this practice, especially in the United Kingdom, was page shredding, since the results could be imagined to have passed through a paper shredder. Internally Reuters used the term ‘logicized’ for this conversion process, running a sophisticated computer system on VAX/VMS called the Logicizer.

More modern screen scraping techniques include capturing the bitmap data from the screen and running it through an OCR engine, or, for some specialised automated testing systems, matching the screen’s bitmap data against expected results. In the case of GUI applications, this can be combined with querying the graphical controls by programmatically obtaining references to their underlying programming objects. A sequence of screens is automatically captured and converted into a database.

Another modern adaptation to these techniques is to use, instead of a sequence of screens as input, a set of images or PDF files, so there are some overlaps with generic “document scraping” and report mining techniques.

Web scraping

Web pages are built using text-based mark-up languages (HTML and XHTML), and frequently contain a wealth of useful data in text form. However, most web pages are designed for human end-users and not for ease of automated use. Because of this, tool kits that scrape web content were created. A web scraper is an API or tool to extract data from a web site. Companies like Amazon AWS and Google provide web scraping tools, services and public data available free of cost to end users. Newer forms of web scraping involve listening to data feeds from web servers. For example, JSON is commonly used as a transport storage mechanism between the client and the web server.
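As a minimal sketch of such a tool kit, the following standard-library Python example downloads a page and extracts its hyperlinks with their anchor text; the URL is a placeholder, and a real scraper would also need to respect robots.txt and the site’s terms of use.

# A minimal web-scraping sketch using only the Python standard library: it downloads
# a page and extracts hyperlinks with their anchor text.
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []          # list of (href, anchor text) pairs
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            self.links.append((self._current_href, "".join(self._text_parts).strip()))
            self._current_href = None

html = urlopen("https://example.com/").read().decode("utf-8", errors="replace")
parser = LinkExtractor()
parser.feed(html)
for href, text in parser.links:
    print(href, "->", text)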

Recently, companies have developed web scraping systems that rely on using techniques in DOM parsing, computer vision and natural language processing to simulate the human processing that occurs when viewing a webpage to automatically extract useful information.

Large websites usually use defensive algorithms to protect their data from web scrapers and to limit the number of requests an IP or IP network may send. This has caused an ongoing battle between website developers and scraping developers.

Report mining

Report mining is the extraction of data from human readable computer reports. Conventional data extraction requires a connection to a working source system, suitable connectivity standards or an API, and usually complex querying. By using the source system’s standard reporting options, and directing the output to a spool file instead of to a printer, static reports can be generated suitable for offline analysis via report mining. This approach can avoid intensive CPU usage during business hours, can minimise end-user licence costs for ERP customers, and can offer very rapid prototyping and development of custom reports. Whereas data scraping and web scraping involve interacting with dynamic output, report mining involves extracting data from files in a human readable format, such as HTML, PDF, or text. These can be easily generated from almost any system by intercepting the data feed to a printer. This approach can provide a quick and simple route to obtaining data without needing to program an API to the source system.
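A minimal sketch of report mining along these lines, assuming a hypothetical spooled invoice report whose data lines follow a simple fixed pattern (the layout and column names are invented for illustration):

# A minimal report-mining sketch: page headers, footers and separators in the spooled
# report are ignored, and amounts are totalled per customer. The line format is an
# assumption for illustration; a real report's columns would differ.
import re

DATA_LINE = re.compile(r"^(?P<invoice>INV-\d+)\s+(?P<customer>.+?)\s+(?P<amount>\d+\.\d{2})$")

def mine_report(path):
    totals = {}
    with open(path, encoding="utf-8") as spool:
        for line in spool:
            match = DATA_LINE.match(line.rstrip())
            if not match:
                continue           # skip anything that is not a data line
            customer = match.group("customer").strip()
            totals[customer] = totals.get(customer, 0.0) + float(match.group("amount"))
    return totals

# Example: totals per customer from a report spooled to "invoices.txt".
# print(mine_report("invoices.txt"))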


Link building

In the field of search engine optimization (SEO), link building describes actions aimed at increasing the number and quality of inbound links to a webpage with the goal of increasing the search engine rankings of that page or website. Briefly, link building is the process of establishing relevant hyperlinks (usually called links) to a website from external sites. Link building can increase the number of high-quality links pointing to a website, in turn increasing the likelihood of the website ranking highly in search engine results. Link building is also a proven marketing tactic for increasing brand awareness.

Editorial link

Editorial links are links that are not acquired by paying money, asking, trading, or exchanging. These links are attracted because of the good content and marketing strategies of a website. They are the links that the website owner does not need to ask for, as they are given naturally by other website owners.

Resource link

Resource links are a category of links, which can be either one-way or two-way, usually referenced as “Resources” or “Information” in navbars, but sometimes, especially in the early, less compartmentalized years of the Web, simply called “links”. Basically, they are hyperlinks to a website or a specific webpage containing content believed to be beneficial, useful and relevant to visitors of the site establishing the link.

In recent years, resource links have grown in importance because most major search engines have made it plain that—in Google’s words—”quantity, quality, and relevance of links count towards your rating”.

Search engines measure a website’s value and relevance by analyzing the links to the site from other websites. The resulting “link popularity” is a measure of the number and quality of links to a website. It is an integral part of a website’s ranking in search engines. Search engines examine each of the links to a particular website to determine its value. Although every link to a website is a vote in its favor, not all votes are counted equally. A website with similar subject matter to the website receiving the inbound link carries more weight than an unrelated site, and a well-regarded site (such as a university) has higher link quality than an unknown or disreputable website.

The text of links helps search engines categorize a website. The engines’ insistence on resource links being relevant and beneficial developed because many artificial link building methods were employed solely to spam search engines, i.e. to “fool” the engines’ algorithms into awarding the sites employing these unethical devices undeservedly high page ranks and/or return positions.

Google cautions site developers to avoid “‘free-for-all’ links, link popularity schemes, or submitting a site to thousands of search engines”, noting that “these are typically useless exercises that don’t affect the ranking of a site in the results of the major search engines”, and most major engines have deployed technology designed to “red flag” and potentially penalize sites employing such practices.

Acquired link

These are links acquired by the website owner through payment or distribution; unlike editorial links, they are not obtained organically. Such links include link advertisements, paid linking, article distribution, directory links, and comments on forums, blogs, and other interactive forms of social media.

Reciprocal link

A reciprocal link is a mutual link between two objects, commonly between two websites, to ensure mutual traffic. For example, Alice and Bob have websites. If Bob’s website links to Alice’s website and Alice’s website links to Bob’s website, the websites are reciprocally linked. Website owners often submit their sites to reciprocal link exchange directories in order to achieve higher rankings in the search engines. Reciprocal linking between websites is no longer an important part of the search engine optimization process. In 2005, with their Jagger 2 update, Google stopped giving credit to reciprocal links as it does not indicate genuine link popularity.

Forum signature linking

Forum signature linking is a technique used to build backlinks to a website. It is the process of using forum communities that allow outbound hyperlinks in a member’s signature. This can be a fast method of building up inbound links and, with them, a website’s SEO value.

Blog comments

Leaving a comment on a blog can result in a relevant do-follow link to the individual’s website. Most of the time, however, a blog comment produces a no-follow link, which is not counted by search engines such as Google and Yahoo! Search. On the other hand, blog comments are clicked on by the readers of the blog if the comment is well-thought-out and pertains to the discussion of the post on the blog.

Directory link

Website directories are lists of links to websites which are sorted into categories. Website owners can submit their site to many of these directories. Some directories accept payment for listing in their directory while others are free.

Social bookmarking

Social bookmarking is a way of saving and categorizing web pages in a public location on the web. Because bookmarks have anchor text and are shared and stored publicly, they are scanned by search engine crawlers and have search engine optimization value.

Image linking

Image linking is a way of submitting images, such as infographics, to image directories and linking them back to a specific URL.

Black hat link building

In early incarnations, when Google’s algorithm relied on incoming links as an indicator of website success, black hat SEOs manipulated website rankings by creating link-building schemes, such as building subsidiary websites to send links to a primary website. With an abundance of incoming links, the prime website outranked many reputable sites. However, links built this way risk being devalued by major search engines, especially when site owners combine them with other black hat strategies. Black hat link building refers explicitly to the process of acquiring as many links as possible with minimal effort.

The Penguin algorithm was created to eliminate this type of abuse. At the time, Google clarified its definition of a “bad” link: “Any links intended to manipulate a site’s ranking in Google search results may be considered part of a link scheme.”

With Penguin, it was the quality of links, not the quantity, that improved a site’s ranking. Since then, Google’s web spam team has attempted to prevent the manipulation of its search results through link building. Major brands, including J.C. Penney, BMW, Forbes, Overstock.com, and many others, have received severe penalties to their search rankings for employing spammy and non-user-friendly link building tactics.

On October 5, 2014, Google launched a new algorithm update, Penguin 3.0, to penalize sites that use black hat link building tactics to build unnatural links and manipulate search engines. The update affected about 0.3% of English-language queries worldwide.

Black hat SEO can also be referred to as spamdexing, which draws on other black hat SEO strategies and link building tactics. Some black hat link building strategies include getting unqualified links from, and participating in, link farms, link schemes, and doorway pages. Black hat SEO can also refer to “negative SEO”, the practice of deliberately harming another website’s performance.

White hat link building

White hat link building strategies are those that add value to end users, abide by Google’s terms of service, and produce results that can be sustained for a long time. White hat link building strategies focus on producing high-quality, relevant links to the website. Although more difficult to acquire, white hat link building tactics are widely implemented by website owners because such strategies are not only beneficial to their websites’ long-term development but also good for the overall online environment.

Internal link

An internal link is a type of hyperlink on a webpage to another page or resource, such as an image or document, on the same website or domain. Hyperlinks are considered either “external” or “internal” depending on their target or destination. Generally, a link to a page outside the same domain or website is considered external, whereas one that points at another section of the same webpage or to another page of the same website or domain is considered internal.

However, these definitions become clouded when the same organization operates multiple domains functioning as a single web experience, e.g. when a secure commerce website is used for purchasing things displayed on a non-secure website. In these cases, links that are “external” by the above definition can conceivably be classified as “internal” for some purposes. Ultimately, an internal link points to a web page or resource in the same root directory.

Similarly, seemingly “internal” links are in fact “external” for many purposes, for example in the case of linking among subdomains of a main domain which are not operated by the same person(s). For example, blogging platforms such as WordPress, Blogger, or Tumblr host thousands of different blogs on subdomains, which are entirely unrelated and whose authors are generally unknown to each other. In these contexts one might view a link as “internal” only if it linked within the same blog, not to other blogs within the same domain.
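A rough sketch of the baseline host-comparison rule described above (before the multi-domain and subdomain complications just discussed) might look like this; the simple host-equality test is an assumption, not a universal definition:

# A minimal sketch of classifying a hyperlink as internal or external by comparing
# the link target's host with the host of the page it appears on.
from urllib.parse import urljoin, urlparse

def classify_link(page_url, href):
    target = urljoin(page_url, href)           # resolves relative links like "/about"
    page_host = urlparse(page_url).hostname
    target_host = urlparse(target).hostname
    return "internal" if target_host == page_host else "external"

print(classify_link("https://example.com/blog/", "/about"))              # internal
print(classify_link("https://example.com/blog/", "https://example.org")) # external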

Both internal and external links allow users of the website to navigate to another web page or resource. This is the founding principle behind the internet: that users can navigate from one resource to another by clicking on hyperlinks. Internal links help users navigate the same website, whereas external links take users to a different website.

Both internal and external links help users surf the internet and also have search engine optimization value. Internal linking allows for good website navigation and structure and allows search engines to crawl or spider websites.

PageRank

PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results. PageRank was named after Larry Page, one of the founders of Google. PageRank is a way of measuring the importance of website pages. According to Google:
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.

Currently, PageRank is not the only algorithm used by Google to order search results, but it is the first algorithm that was used by the company, and it is the best known.

Description

Figure: a cartoon illustrating the basic principle of PageRank, in which the size of each face is proportional to the total size of the other faces pointing to it.
PageRank is a link analysis algorithm and it assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of “measuring” its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is referred to as the PageRank of E and denoted by PR(E). Other factors like Author Rank can contribute to the importance of an entity.

A PageRank results from a mathematical algorithm based on the webgraph, created by all World Wide Web pages as nodes and hyperlinks as edges, taking into consideration authority hubs such as cnn.com or usa.gov. The rank value indicates an importance of a particular page. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank metric of all pages that link to it (“incoming links”). A page that is linked to by many pages with high PageRank receives a high rank itself.
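A minimal sketch of this idea, using power iteration on a small invented link graph (the damping factor of 0.85 is the value commonly cited for the original formulation; the toy graph itself is made up for illustration):

# A minimal PageRank sketch using power iteration on a small hand-made link graph.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                      # dangling page: spread rank evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))

In this toy graph, page C ends up with the highest score because it is linked to by the most pages, including pages that themselves accumulate rank, which mirrors the recursive definition above.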

Numerous academic papers concerning PageRank have been published since Page and Brin’s original paper. In practice, the PageRank concept may be vulnerable to manipulation. Research has been conducted into identifying falsely influenced PageRank rankings. The goal is to find an effective means of ignoring links from documents with falsely influenced PageRank.

Other link-based ranking algorithms for Web pages include the HITS algorithm invented by Jon Kleinberg (used by Teoma and now Ask.com), the IBM CLEVER project, the TrustRank algorithm, and the Hummingbird algorithm.

Generalization of PageRank and eigenvector centrality for ranking objects of two kinds

A generalization of PageRank for the case of ranking two interacting groups of objects has also been described. In applications, it may be necessary to model systems having objects of two kinds, where a weighted relation is defined on object pairs. This leads to considering bipartite graphs. For such graphs, two related positive or nonnegative irreducible matrices corresponding to the vertex partition sets can be defined. One can compute rankings of the objects in both groups as the eigenvectors corresponding to the maximal positive eigenvalues of these matrices. Normed eigenvectors exist and are unique by the Perron or Perron–Frobenius theorem. Example: consumers and products, where the relation weight is the product consumption rate.
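As a hedged sketch of this construction, the example below takes a nonnegative consumer-product matrix of consumption rates and ranks both groups by the principal eigenvectors of W Wᵀ and Wᵀ W; this particular choice of the two related matrices is one natural reading of the description above, not necessarily the exact construction used in the cited work.

# A hedged sketch of ranking two kinds of objects (here: consumers and products) from
# a nonnegative weight matrix of consumption rates. Rankings are the principal (Perron)
# eigenvectors of the two related nonnegative matrices W @ W.T and W.T @ W.
import numpy as np

W = np.array([            # rows: consumers, columns: products; entries: consumption rate
    [2.0, 0.0, 1.0],
    [0.5, 1.0, 0.0],
    [1.0, 3.0, 0.5],
])

def principal_eigenvector(M, iterations=100):
    """Power iteration: converges to the eigenvector of the largest eigenvalue."""
    v = np.ones(M.shape[0])
    for _ in range(iterations):
        v = M @ v
        v /= np.linalg.norm(v)
    return v / v.sum()        # normalise so the scores sum to 1

consumer_rank = principal_eigenvector(W @ W.T)
product_rank = principal_eigenvector(W.T @ W)
print("consumer ranking:", np.round(consumer_rank, 3))
print("product ranking:", np.round(product_rank, 3))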

Distributed algorithm for PageRank computation

There are simple and fast random walk-based distributed algorithms for computing the PageRank of nodes in a network. One such algorithm takes O(log n / ε) rounds with high probability on any graph (directed or undirected), where n is the network size and ε is the reset probability (1 − ε is also called the damping factor) used in the PageRank computation. A faster variant takes O(√(log n) / ε) rounds in undirected graphs. Both of the above algorithms are scalable, as each node processes and sends only a small (polylogarithmic in n, the network size) number of bits per round.
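The sketch below is not the distributed protocol itself but a hedged, single-machine simulation of the same random-walk idea: each walk resets with probability ε at every step, and a node’s PageRank is estimated from how often the walks visit it.

# Monte Carlo estimation of PageRank by simulating random walks with reset probability
# epsilon. The toy graph and the number of walks are illustrative choices.
import random
from collections import Counter

def random_walk_pagerank(links, epsilon=0.15, walks_per_node=1000, rng=None):
    rng = rng or random.Random(0)
    visits = Counter()
    nodes = list(links)
    for start in nodes:
        for _ in range(walks_per_node):
            node = start
            while True:
                visits[node] += 1
                if rng.random() < epsilon or not links[node]:
                    break                      # reset: this walk ends here
                node = rng.choice(links[node])
    total = sum(visits.values())
    return {n: visits[n] / total for n in nodes}

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
estimate = random_walk_pagerank(toy_web)
print({n: round(r, 3) for n, r in sorted(estimate.items())})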

Google Toolbar

The Google Toolbar long had a PageRank feature which displayed a visited page’s PageRank as a whole number between 0 and 10. The most popular websites displayed a PageRank of 10. The least showed a PageRank of 0. Google has not disclosed the specific method for determining a Toolbar PageRank value, which is to be considered only a rough indication of the value of a website. In March 2016 Google announced it would no longer support this feature, and the underlying API would soon cease to operate.

SERP rank

The search engine results page (SERP) is the actual result returned by a search engine in response to a keyword query. The SERP consists of a list of links to web pages with associated text snippets. The SERP rank of a web page refers to the placement of the corresponding link on the SERP, where higher placement means higher SERP rank. The SERP rank of a web page is a function not only of its PageRank, but of a relatively large and continuously adjusted set of factors (over 200). Search engine optimization (SEO) is aimed at influencing the SERP rank for a website or a set of web pages.

Positioning of a webpage on Google SERPs for a keyword depends on relevance and reputation, also known as authority and popularity. PageRank is Google’s indication of its assessment of the reputation of a webpage: It is non-keyword specific. Google uses a combination of webpage and website authority to determine the overall authority of a webpage competing for a keyword. The PageRank of the HomePage of a website is the best indication Google offers for website authority.

After the introduction of Google Places into the mainstream organic SERP, numerous other factors in addition to PageRank affect ranking a business in Local Business Results.

Google directory PageRank

The Google Directory PageRank was an 8-unit measurement. Unlike the Google Toolbar, which shows a numeric PageRank value upon mouseover of the green bar, the Google Directory only displayed the bar, never the numeric values. Google Directory was closed on July 20, 2011.

False or spoofed PageRank

In the past, the PageRank shown in the Toolbar was easily manipulated. Redirection from one page to another, either via a HTTP 302 response or a “Refresh” meta tag, caused the source page to acquire the PageRank of the destination page. Hence, a new page with PR 0 and no incoming links could have acquired PR 10 by redirecting to the Google home page. This spoofing technique was a known vulnerability. Spoofing can generally be detected by performing a Google search for a source URL; if the URL of an entirely different site is displayed in the results, the latter URL may represent the destination of a redirection.

Manipulating PageRank

For search engine optimization purposes, some companies offer to sell high PageRank links to webmasters. As links from higher-PR pages are believed to be more valuable, they tend to be more expensive. It can be an effective and viable marketing strategy to buy link advertisements on content pages of quality and relevant sites to drive traffic and increase a webmaster’s link popularity. However, Google has publicly warned webmasters that if they are or were discovered to be selling links for the purpose of conferring PageRank and reputation, their links will be devalued (ignored in the calculation of other pages’ PageRanks). The practice of buying and selling links is intensely debated across the Webmaster community. Google advises webmasters to use the nofollow HTML attribute value on sponsored links. According to Matt Cutts, Google is concerned about webmasters who try to game the system, and thereby reduce the quality and relevance of Google search results.

Directed Surfer Model

The directed surfer model posits a more intelligent surfer that probabilistically hops from page to page depending on the content of the pages and the query terms the surfer is looking for. This model is based on a query-dependent PageRank score of a page which, as the name suggests, is also a function of the query. When given a multiple-term query, Q = {q1, q2, …}, the surfer selects a q according to some probability distribution P(q) and uses that term to guide its behavior for a large number of steps. It then selects another term according to the distribution to determine its behavior, and so on. The resulting distribution over visited web pages is QD-PageRank.
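Written out from that description (the notation is illustrative rather than taken verbatim from the original paper), the per-term score and the combined query-dependent score can be sketched as:

\[
PR_q(j) = (1-d)\,P_q(j) + d \sum_{i \in B_j} PR_q(i)\,P_q(i \to j),
\qquad
QD\text{-}PR_Q(j) = \sum_{q \in Q} P(q)\,PR_q(j),
\]

where B_j is the set of pages linking to j, P_q(j) is proportional to the relevance of page j to the term q, P_q(i → j) distributes the surfer over the outlinks of i according to that same relevance, and d plays the role of the damping factor.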

Social components

The PageRank algorithm has major effects on society as it carries social influence. As opposed to the scientific viewpoint of PageRank as an algorithm, the humanities view it through a lens examining its social components. In these instances, it is dissected and reviewed not for its technological advancement in the field of search engines, but for its societal influences. Laura Granka discusses PageRank by describing how the pages are not simply ranked via popularity, as they contain a reliability that gives them a trustworthy quality. This has led to a development of behavior that is directly linked to PageRank. PageRank is viewed as the definitive rank of products and businesses and thus can manipulate thinking. The information that is available to individuals is what shapes thinking and ideology, and PageRank is the device that displays this information. The results shown are the forum through which information is delivered to the public, and these results have a societal impact as they affect how a person thinks and acts.

Katja Mayer views PageRank as a social network as it connects differing viewpoints and thoughts in a single place. People go to PageRank for information and are flooded with citations of other authors who also have an opinion on the topic. This creates a social aspect where everything can be discussed and collected to provoke thinking. There is a social relationship that exists between PageRank and the people who use it as it is constantly adapting and changing to the shifts in modern society. Viewing the relationship between PageRank and the individual through sociometry allows for an in-depth look at the connection that results.

Matteo Pasquinelli reckons the basis for the belief that PageRank has a social component lies in the idea of the attention economy. With the attention economy, value is placed on products that receive a greater amount of human attention, and the results at the top of the PageRank garner a larger amount of focus than those on subsequent pages. The outcomes with the higher PageRank will therefore enter the human consciousness to a larger extent. These ideas can influence decision-making, and the actions of the viewer have a direct relation to the PageRank. Highly ranked results possess a higher potential to attract a user’s attention, as their location increases the attention economy attached to the site. With this location they can receive more traffic, and their online marketplace will have more purchases. The PageRank of these sites allows them to be trusted, and they are able to parlay this trust into increased business.

Other uses

The mathematics of PageRank are entirely general and apply to any graph or network in any domain. Thus, PageRank is now regularly used in bibliometrics, social and information network analysis, and for link prediction and recommendation. It’s even used for systems analysis of road networks, as well as biology, chemistry, neuroscience, and physics.

In neuroscience, the PageRank of a neuron in a neural network has been found to correlate with its relative firing rate.

Personalized PageRank is used by Twitter to present users with other accounts they may wish to follow.

Swiftype’s site search product builds a “PageRank that’s specific to individual websites” by looking at each website’s signals of importance and prioritizing content based on factors such as number of links from the home page.

A version of PageRank has recently been proposed as a replacement for the traditional Institute for Scientific Information (ISI) impact factor, and implemented at Eigenfactor as well as at SCImago. Instead of merely counting total citations to a journal, the “importance” of each citation is determined in a PageRank fashion.

A similar new use of PageRank is to rank academic doctoral programs based on their records of placing their graduates in faculty positions. In PageRank terms, academic departments link to each other by hiring their faculty from each other (and from themselves).

PageRank has been used to rank spaces or streets to predict how many people (pedestrians or vehicles) come to the individual spaces or streets. In lexical semantics it has been used to perform Word Sense Disambiguation, Semantic similarity, and also to automatically rank WordNet synsets according to how strongly they possess a given semantic property, such as positivity or negativity.

In sport the PageRank algorithm has been used to rank the performance of: teams in the National Football League (NFL) in the USA; individual soccer players; and athletes in the Diamond League.

A Web crawler may use PageRank as one of a number of importance metrics it uses to determine which URL to visit during a crawl of the web. One of the early working papers that were used in the creation of Google is Efficient crawling through URL ordering, which discusses the use of a number of different importance metrics to determine how deeply, and how much of a site Google will crawl. PageRank is presented as one of a number of these importance metrics, though there are others listed such as the number of inbound and outbound links for a URL, and the distance from the root directory on a site to the URL.

PageRank may also be used as a methodology to measure the apparent impact of a community like the blogosphere on the overall Web itself. This approach therefore uses PageRank to measure the distribution of attention in reflection of the scale-free network paradigm.

In any ecosystem, a modified version of PageRank may be used to determine species that are essential to the continuing health of the environment.

PageRank is also a useful tool for the analysis of protein networks in biology.

In 2005, in a pilot study in Pakistan, Structural Deep Democracy, SD2 was used for leadership selection in a sustainable agriculture group called Contact Youth. SD2 uses PageRank for the processing of the transitive proxy votes, with the additional constraints of mandating at least two initial proxies per voter, and all voters are proxy candidates. More complex variants can be built on top of SD2, such as adding specialist proxies and direct votes for specific issues, but SD2 as the underlying umbrella system, mandates that generalist proxies should always be used.

PageRank has recently been used to quantify the scientific impact of researchers. The underlying citation and collaboration networks are used in conjunction with the PageRank algorithm in order to come up with a ranking system for individual publications which propagates to individual authors. The new index, known as the pagerank-index (Pi), has been demonstrated to be fairer than the h-index, which exhibits a number of known drawbacks.

nofollow

In early 2005, Google implemented a new value, “nofollow”, for the rel attribute of HTML link and anchor elements, so that website developers and bloggers can make links that Google will not consider for the purposes of PageRank—they are links that no longer constitute a “vote” in the PageRank system. The nofollow relationship was added in an attempt to help combat spamdexing.

As an example, people could previously create many message-board posts with links to their website to artificially inflate their PageRank. With the nofollow value, message-board administrators can modify their code to automatically insert “rel=’nofollow'” to all hyperlinks in posts, thus preventing PageRank from being affected by those particular posts. This method of avoidance, however, also has various drawbacks, such as reducing the link value of legitimate comments. (See: Spam in blogs#nofollow)
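As a minimal sketch of what such automatic insertion might look like on the platform side (the function name and platform behaviour are hypothetical), a forum could render every user-submitted link with the nofollow relationship:

# Render a user-supplied link with rel="nofollow" so it passes no PageRank.
from html import escape

def render_user_link(url, text):
    """Return an anchor tag for user-submitted content, marked rel="nofollow"."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(escape(url, quote=True), escape(text))

print(render_user_link("https://example.com/my-site", "check out my site"))
# <a href="https://example.com/my-site" rel="nofollow">check out my site</a>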

In an effort to manually control the flow of PageRank among pages within a website, many webmasters practice what is known as PageRank Sculpting—which is the act of strategically placing the nofollow attribute on certain internal links of a website in order to funnel PageRank towards those pages the webmaster deemed most important. This tactic has been used since the inception of the nofollow attribute, but may no longer be effective since Google announced that blocking PageRank transfer with nofollow does not redirect that PageRank to other links.

Deprecation

PageRank was once available for verified site maintainers through the Google Webmaster Tools interface. However, on October 15, 2009, a Google employee confirmed that the company had removed PageRank from its Webmaster Tools section, saying that “We’ve been telling people for a long time that they shouldn’t focus on PageRank so much. Many site owners seem to think it’s the most important metric for them to track, which is simply not true.” In addition, the PageRank indicator is not available in Google’s own Chrome browser.

The visible PageRank is updated very infrequently. It was last updated in November 2013. In October 2014, Matt Cutts announced that another visible PageRank update would not be coming.

Even though “Toolbar” PageRank is less important for SEO purposes, the existence of back-links from more popular websites continues to push a webpage higher up in search rankings.

Google elaborated on the reasons for the PageRank deprecation in a Q&A session in March and named links and content as its top two ranking factors. RankBrain had been announced as the #3 ranking factor in October 2015, so the top three factors have now been confirmed officially by Google.

On April 15, 2016, Google officially shut down Google Toolbar PageRank data to the public. Google had declared its intention to remove the PageRank score from the Google Toolbar several months earlier. Google still uses the PageRank score when determining how to rank content in search results.


Link farm

On the World Wide Web, a link farm is any group of web sites that all hyperlink to every other site in the group. In graph theoretic terms, a link farm is a clique. Although some link farms can be created by hand, most are created through automated programs and services. A link farm is a form of spamming the index of a web search engine (sometimes called spamdexing). Other link exchange systems are designed to allow individual websites to selectively exchange links with other relevant websites and are not considered a form of spamdexing.
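In code, the clique definition above amounts to checking that every site in a candidate group links to every other site in the group; the sketch below does exactly that for a toy link graph (the domain names are placeholders).

# A minimal sketch of the graph-theoretic definition: a group of sites is a link-farm
# clique if every site in the group links to every other site in the group.
def is_link_farm(group, links):
    """links maps each site to the set of sites it links to."""
    return all(
        b in links.get(a, set())
        for a in group for b in group if a != b
    )

links = {
    "site1.example": {"site2.example", "site3.example"},
    "site2.example": {"site1.example", "site3.example"},
    "site3.example": {"site1.example", "site2.example"},
}
print(is_link_farm({"site1.example", "site2.example", "site3.example"}, links))  # True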

Search engines require ways to confirm page relevancy. One known method is to examine one-way links coming directly from relevant websites. The process of building links should not be confused with being listed on link farms, as the latter requires reciprocal return links, which often renders the overall backlink advantage useless. This is due to oscillation, which causes confusion over which is the vendor site and which is the promoting site.

History

Link farms were developed by search engine optimizers (SEOs) in 1999 to take advantage of the Inktomi search engine’s dependence upon link popularity. Although link popularity is used by some search engines to help establish a ranking order for search results, the Inktomi engine at the time maintained two indexes. Search results were produced from the primary index which was limited to approximately 100 million listings. Pages with few inbound links fell out of the Inktomi index on a monthly basis.

Inktomi was targeted for manipulation through link farms because it was then used by several independent but popular search engines. Yahoo!, then the most popular search service, also used Inktomi results to supplement its directory search feature. The link farms helped stabilize listings primarily for online business Web sites that had few natural links from larger, more stable sites in the Inktomi index.

Link farm exchanges were at first handled on an informal basis, but several service companies were founded to provide automated registration, categorization, and link page updates to member Web sites.

When the Google search engine became popular, search engine optimizers learned that Google’s ranking algorithm depended in part on a link-weighting scheme called PageRank. Rather than simply count all inbound links equally, the PageRank algorithm determines that some links may be more valuable than others, and therefore assigns them more weight than others. Link farming was adapted to help increase the PageRank of member pages.

However, the link farms became susceptible to manipulation by unscrupulous webmasters who joined the services, received inbound linkage, and then found ways to hide their outbound links or to avoid posting any links on their sites at all. Link farm managers had to implement quality controls and monitor member compliance with their rules to ensure fairness.

Alternative link farm products emerged, particularly link-finding software that identified potential reciprocal link partners, sent them template-based emails offering to exchange links, and created directory-like link pages for Web sites, in the hope of building their link popularity and PageRank. These link farms are sometimes counted as part of a black-hat SEO strategy.

Search engines countered the link farm movement by identifying specific attributes associated with link farm pages and filtering those pages from indexing and search results. In some cases, entire domains were removed from the search engine indexes in order to prevent them from influencing search results.

Blog network

A blog network, also known as a link farm, is a group of blogs that are owned by the same entity. A blog network can either be a group of loosely connected blogs, or a group of blogs that are owned by the same company. The purpose of such a network is usually to promote the other blogs in the same network and therefore increase the search engine rankings or advertising revenue generated from online advertising on the blogs.

In September 2014, Google targeted private blog networks (PBNs) with manual action ranking penalties. This served to dissuade search engine optimizers and online marketers from using PBNs to increase their online rankings. The “thin content” warnings are closely tied to Panda, which focuses on thin content and on-page quality. PBNs have a history of being targeted by Google and therefore may not be the safest option. Because Google actively looks for blog networks, the blogs in a network are not always linked together; in fact, interlinking the blogs could help Google discover the network, since a single exposed blog’s outbound links could reveal the whole blog network.

A blog network may also refer to a central website, such as WordPress, where a user creates an account and is then able to use their own blog. The created blog forms part of a network because it uses either a subdomain or a subfolder of the main domain, although in all other ways it can be entirely autonomous. This is also known as a hosted blog platform and usually uses the free WordPress Multisite software.

Hosted blog networks are also known as Web 2.0 networks, due to their rise to popularity during the second phase of internet development, known as Web 2.0, when interactive social sites began to develop rapidly.


Page hijacking

A hacker may use an exploit framework such as sqlmap to search for SQL vulnerabilities in the database and insert an exploit kit such as MPack in order to compromise legitimate users who visit the now compromised web server. One of the simplest forms of page hijacking involves altering a webpage to contain a malicious inline frame which can allow an exploit kit to load.

Page hijacking is frequently used in tandem with a watering hole attack on corporate entities in order to compromise targets.

Cloaking

Cloaking is a search engine optimization (SEO) technique in which the content presented to the search engine spider is different from that presented to the user’s browser. This is done by delivering content based on the IP address or the User-Agent HTTP header of the user requesting the page. When a user is identified as a search engine spider, a server-side script delivers a different version of the web page, one that contains content not present on the visible page, or that is present but not searchable. The purpose of cloaking is sometimes to deceive search engines so they display the page when it would not otherwise be displayed (black hat SEO). However, it can also be a functional (though antiquated) technique for informing search engines of content they would not otherwise be able to locate because it is embedded in non-textual containers, such as video or certain Adobe Flash components. Since 2006, better methods of accessibility, including progressive enhancement, have been available, so cloaking is no longer necessary for regular SEO.
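A minimal sketch of the server-side branching that cloaking relies on is shown below, choosing a response from the User-Agent header; the crawler signatures and page bodies are placeholders, and the point is to illustrate the mechanism described above rather than to recommend the practice, which search engines treat as a violation when used deceptively.

# A minimal WSGI app that serves different content depending on the User-Agent header.
CRAWLER_SIGNATURES = ("googlebot", "bingbot", "slurp")

def app(environ, start_response):
    user_agent = environ.get("HTTP_USER_AGENT", "").lower()
    is_crawler = any(sig in user_agent for sig in CRAWLER_SIGNATURES)
    body = (b"keyword-rich page served to crawlers"
            if is_crawler
            else b"visual page served to human visitors")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("localhost", 8080, app).serve_forever()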

Cloaking is often used as a spamdexing technique to attempt to sway search engines into giving the site a higher ranking. By the same method, it can also be used to trick search engine users into visiting a site that is substantially different from the search engine description, including delivering pornographic content cloaked within non-pornographic search results.

Cloaking is a form of the doorway page technique.

A similar technique is used on the DMOZ web directory, but it differs in several ways from search engine cloaking:

  • It is intended to fool human editors, rather than computer search engine spiders.
  • The decision to cloak or not is often based upon the HTTP referrer, the user agent, or the visitor’s IP, but more advanced techniques can also be based upon analysis of the client’s behaviour after a few page requests: the raw quantity and ordering of, and the latency between, subsequent HTTP requests sent to a website’s pages, plus the presence of a check for the robots.txt file, are some of the parameters in which search engine spiders differ heavily from natural user behaviour. The referrer gives the URL of the page on which a user clicked a link to get to the page. Some cloakers will give the fake page to anyone who comes from a web directory website, since directory editors will usually examine sites by clicking on links that appear on a directory web page. Other cloakers give the fake page to everyone except those coming from a major search engine; this makes it harder to detect cloaking, while not costing them many visitors, since most people find websites by using a search engine.

Black hat perspective

Increasingly, when a page lacks the natural popularity that compelling or rewarding content would earn it, webmasters design pages solely for the search engines in order to rank well. This results in pages with too many keywords and other factors that might be search engine “friendly”, but make the pages difficult for actual visitors to consume. As such, black hat SEO practitioners consider cloaking to be an important technique that allows webmasters to split their efforts and separately target the search engine spiders and human visitors. Cloaking allows the user experience to remain high while satisfying the minimum keyword concentration necessary to rank in a search engine.

In September 2007, Ralph Tegtmeier and Ed Purkiss coined the term “mosaic cloaking” whereby dynamic pages are constructed as tiles of content and only portions of the pages, JavaScript and CSS are changed, simultaneously decreasing the contrast between the cloaked page and the “friendly” page while increasing the capability for targeted delivery of content to various spiders and human visitors.

Cloaking versus IP delivery

IP delivery can be considered a more benign variation of cloaking, where different content is served based upon the requester’s IP address. With cloaking, search engines and people never see the other’s pages, whereas, with other uses of IP delivery, both search engines and people can see the same pages. This technique is sometimes used by graphics-heavy sites that have little textual content for spiders to analyze.

One use of IP delivery is to determine the requestor’s location, and deliver content specifically written for that country. This isn’t necessarily cloaking. For instance, Google uses IP delivery for AdWords and AdSense advertising programs to target users in different geographic locations.

IP delivery is a crude and unreliable method of determining the language in which to provide content. Many countries and regions are multi-lingual, or the requestor may be a foreign national. A better method of content negotiation is to examine the client’s Accept-Language HTTP header.
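A minimal sketch of that kind of content negotiation, assuming a simplified parse of the Accept-Language header (real-world parsing has more edge cases than this):

# Pick the highest-weighted language from Accept-Language that the site supports.
def negotiate_language(accept_language, supported, default="en"):
    candidates = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, q = piece.partition(";q=")
        try:
            weight = float(q) if q else 1.0
        except ValueError:
            weight = 0.0
        candidates.append((weight, lang.split("-")[0].lower()))
    for _, lang in sorted(candidates, reverse=True):
        if lang in supported:
            return lang
    return default

print(negotiate_language("fr-CH, fr;q=0.9, en;q=0.8", supported={"en", "de"}))  # en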

As of 2006, many sites have taken up IP delivery to personalise content for their regular customers. Many of the top 1000 sites, including sites like Amazon (amazon.com), actively use IP delivery. None of these have been banned from search engines as their intent is not deceptive.

Doorway page

Doorway pages (bridge pages, portal pages, jump pages, gateway pages or entry pages) are web pages that are created for the deliberate manipulation of search engine indexes (spamdexing). A doorway page will affect the index of a search engine by inserting results for particular phrases while sending visitors to a different page. Doorway pages that redirect visitors without their knowledge use some form of cloaking. This usually falls under Black Hat SEO.

If a visitor clicks through to a typical doorway page from a search engine results page, in most cases they will be redirected with a fast Meta refresh command to another page. Other forms of redirection include use of JavaScript and server side redirection, from the server configuration file. Some doorway pages may be dynamic pages generated by scripting languages such as Perl and PHP.

Doorway pages are often easy to identify in that they have been designed primarily for search engines, not for human beings. Sometimes a doorway page is copied from another high ranking page, but this is likely to cause the search engine to detect the page as a duplicate and exclude it from the search engine listings.

Because many search engines give a penalty for using the META refresh command, some doorway pages just trick the visitor into clicking on a link to get them to the desired destination page, or they use JavaScript for redirection.

More sophisticated doorway pages, called Content Rich Doorways, are designed to gain high placement in search results without using redirection. They incorporate at least a minimum amount of design and navigation similar to the rest of the site to provide a more human-friendly and natural appearance. Visitors are offered standard links as calls to action.

Landing pages are regularly misconstrued in the literature as equivalent to doorway pages. The former are content-rich pages to which traffic is directed within the context of pay-per-click campaigns and to maximize SEO campaigns.

Doorway pages are also typically used to get links past sites that maintain a blacklist of URLs known to harbor spam, such as Facebook, Tumblr, and DeviantArt.

Cloaking
Doorway pages often also employ cloaking techniques for misdirection. Cloaked pages show a version of the page to human visitors that is different from the one provided to crawlers, usually implemented via server-side scripts. The server can differentiate between bots, crawlers, and human visitors based on various flags, including the source IP address and/or the user-agent. Cloaking simultaneously tricks search engines into ranking sites higher for irrelevant keywords, while monetizing any human traffic by showing visitors spammy, often irrelevant, content. The practice of cloaking is considered highly manipulative and is condemned within the SEO industry and by search engines, and its use can result in severe penalties or the complete removal of sites from the index.

Redirection
Webmasters who use doorway pages would generally prefer that users never actually see these pages and instead be delivered to a “real” page within their sites. To achieve this goal, redirection is sometimes used. This may be as simple as installing a meta refresh tag on the doorway pages. An advanced system might make use of cloaking. In either case, such redirection may make the doorway pages unacceptable to search engines.

Construction
A content rich doorway page must be constructed in a Search engine friendly (SEF) manner, otherwise it may be construed as search engine spam possibly resulting in the page being banned from the index for an undisclosed amount of time.

These types of doorways utilize (but are not limited to) the following:

  • Title-attributed images for keyword support
  • Title-attributed links for keyword support

In culture

Doorway pages were examined as a cultural and political phenomenon along with spam poetry and flarf.

Keyword stuffing

Keyword stuffing is a search engine optimization (SEO) technique, considered webspam or spamdexing, in which keywords are loaded into a web page’s meta tags, visible content, or backlink anchor text in an attempt to gain an unfair rank advantage in search engines. Keyword stuffing may lead to a website being banned or penalized on major search engines either temporarily or permanently. The repetition of words in meta tags may explain why many search engines no longer use these tags.

Many major search engines have implemented algorithms that recognize keyword stuffing, and reduce or eliminate any unfair search advantage that the tactic may have been intended to gain, and oftentimes they will also penalize, demote or remove websites from their indexes that implement keyword stuffing.
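As a hedged illustration of the simplest possible stuffing signal, the sketch below computes the density of a single term in a page’s visible text; the 5% threshold is purely illustrative, and real search engines combine many far more sophisticated signals.

# Compute how large a share of a page's words a single keyword accounts for.
import re

def keyword_density(text, keyword):
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

page_text = "cheap widgets cheap widgets buy cheap widgets today cheap widgets"
density = keyword_density(page_text, "cheap")
print(f"density: {density:.1%}", "-> looks stuffed" if density > 0.05 else "")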

Changes and algorithms specifically intended to penalize or ban sites using keyword stuffing include the Google Florida update (November 2003), Google Panda (February 2011), Google Hummingbird (August 2013), and Bing’s September 2014 update.

History

Keyword stuffing had been used in the past to obtain top search engine rankings and visibility for particular phrases. This method is outdated and adds no value to rankings today. In particular, Google no longer gives good rankings to pages employing this technique.

Hiding text from the visitor is done in many different ways. Text colored to blend with the background, CSS “Z” positioning to place text “behind” an image — and therefore out of view of the visitor — and CSS absolute positioning to have the text positioned far from the page center are all common techniques. By 2005, many invisible text techniques were easily detected by major search engines.

“Noscript” tags are another way to place hidden content within a page. While they are a valid optimization method for displaying an alternative representation of scripted content, they may be abused, since search engines may index content that is invisible to most visitors.

Sometimes inserted text includes words that are frequently searched (such as “sex”), even if those terms bear little connection to the content of a page, in order to attract traffic to advert-driven pages.

In the past, keyword stuffing was considered to be either a white hat or a black hat tactic, depending on the context of the technique, and the opinion of the person judging it. While a great deal of keyword stuffing was employed to aid in spamdexing, which is of little benefit to the user, keyword stuffing in certain circumstances was not intended to skew results in a deceptive manner. Whether the term carries a pejorative or neutral connotation is dependent on whether the practice is used to pollute the results with pages of little relevance, or to direct traffic to a page of relevance that would have otherwise been de-emphasized due to the search engine’s inability to interpret and understand related ideas. This is no longer the case. Search engines now employ themed, related keyword techniques to interpret the intent of the content on a page.

With relevance to keyword stuffing, the largest search engines recommend keyword research and use, in keeping with the quality content a site has to offer, to aid their visitors in finding that material. Google discusses keyword stuffing as “randomly repeated keywords”.

In online journalism

Headlines in online news sites are increasingly packed with just the search-friendly keywords that identify the story. Puns and plays on words have gone by the wayside. Overusing this strategy is also called keyword stuffing. Traditional reporters and editors frown on the practice, but it is effective in optimizing news stories for search.