To combat plagiarism and scraper sites, and to provide higher-quality search results, Google applies a filter to its index that sorts out duplicates of web pages and other documents found on the web. URLs judged to point to content that is also available at another URL are lowered in importance, and may eventually be turned into supplemental results or dropped from the index.
Known issues
Case 1,
The now-obsolete practice of keeping a backup copy of a web page or an
entire web site (a.k.a. a mirror site, hosted on a different server
under a different domain name) in parallel with the one intended to be
the "original" will trigger this filter.
+ Resolution: Immediately shut down the mirror site and all copies of
the content that you control. Redirect visitors to the single copy
that you wish to keep.
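The redirect step can be sketched as a tiny WSGI application that
would run on the former mirror host. This is only an illustration
under stated assumptions: the host name www.example.com, the function
name mirror_redirect_app, and the choice of a permanent (301) redirect
are not taken from the article, which only says to redirect visitors.

```python
# Hypothetical: the single copy you wish to keep lives on this host.
CANONICAL_HOST = "www.example.com"


def mirror_redirect_app(environ, start_response):
    """WSGI app for the old mirror: send every request to the kept copy."""
    path = environ.get("PATH_INFO") or "/"
    query = environ.get("QUERY_STRING", "")

    # Rebuild the same path and query string on the canonical host.
    location = "http://" + CANONICAL_HOST + path
    if query:
        location += "?" + query

    # 301 is an assumption: it signals that the move is permanent.
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Length", "0")],
    )
    return [b""]
```

The app can be served with any WSGI server, for example
wsgiref.simple_server.make_server from the standard library. Keeping
the path and query string intact means each old mirror URL points at
its counterpart on the kept site rather than at the home page.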
Case 2,
In certain instances the URL history, the crawl rate or pattern,
PageRank, directory level or TrustRank of the new copy suggests that
the new web page is the more important one. The "original" URL will
then be marked as a supplemental result, or dropped from the index.
+ Resolution: Do not keep an identical copy of any single web page, or
of an entire web site, on the web at the same time as the original.
If you notice your web pages being plagiarized by a third party,
contact the webmaster and request that the copy be deleted. If the
webmaster does not respond, contact the hosting company, the Internet
Service Provider, or the registrar directly, and report the problem to
Google representatives through the Google Webmaster Tools control panel.
Case 3,
Sometimes a single web page can be accessed through multiple URLs, so
that two identical copies of the same content appear to exist in the
index. The algorithm will then most likely judge one of them to be the
duplicate, and set its attributes in the database accordingly. In
certain cases, where Google cannot identify the URL that the webmaster
considers the original, or where the multiple-URL pattern is perceived
as spam, both or all URLs will be marked as supplemental, or dropped
from the index.
+ Resolution: Google does its best to identify the patterns of
good-faith duplicate content issues, such as the www.example.com and
example.com versions of the same URL pointing to a single web page.
In certain cases, however, the algorithm cannot decide whether the
duplicate content is spam, the result of erroneous inbound links, or
the result of inconsistent navigation or parameters for the same URL.
For more information on how to resolve this issue, see Canonical URLs.
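One way to keep navigation and parameters consistent on your own site
is to run every internal link through a single normalizing function.
The sketch below is only an illustration: the preferred host, the
alias set, and the ignored parameter names (sessionid, ref) are
hypothetical, and it does not describe how Google itself chooses a
canonical URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site settings; adjust to your own domain and parameters.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
IGNORED_PARAMS = {"sessionid", "ref"}  # parameters that do not change the page


def canonical_url(url):
    """Map the different URLs of one page onto a single preferred form."""
    parts = urlsplit(url)

    # Fold the www / non-www variants into the preferred host name.
    host = (parts.hostname or "").lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)

    # Drop parameters that do not affect content and sort the rest,
    # so that parameter order cannot create a second URL.
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = sorted((k, v) for k, v in params if k.lower() not in IGNORED_PARAMS)

    path = parts.path or "/"
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((parts.scheme.lower(), netloc, path, urlencode(kept), ""))
```

With these settings, http://example.com/page.html?ref=home&b=2&a=1 and
http://www.example.com/page.html?a=1&b=2 both normalize to the second
form, so only one URL per page is ever linked.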
Case 4,
In extremely rare cases a proxy server or a hacked website may cache
web pages or entire websites and, knowingly or by chance, allow Google
to index those pages. Sometimes Google may not be able to determine the
original source of the content, and keeps the URLs of the proxy in its
index instead of the URLs of the website being copied. Google engineers
are currently working on resolving this problem.
+ Resolution: To avoid being taken by surprise by such issues, you may
set up a Google Alert at http://www.google.com/alerts for your domain
name and inspect the reports for any suspicious URLs that use the
domain name as part of their address, or that contain bits of your
unique content. Either way, you will need to identify the bot that
requests the pages from your website and disallow any further copying
of the content through your .htaccess settings. Read more on Hijacking.
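Identifying the bot usually starts with the server's access log. The
sketch below counts requests per client address and user agent so
that an unusually busy requester stands out. It assumes the log uses
the common "combined" format; the function name top_requesters and the
regular expression are illustrative, and the actual blocking rule
still has to be written in .htaccess by hand.

```python
import re
from collections import Counter

# Assumes the "combined" access log layout:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def top_requesters(lines, limit=5):
    """Return the busiest (ip, user agent) pairs with their request counts."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines in another format are skipped silently
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common(limit)
```

A typical use is to open the log file and pass the file object
directly, for example top_requesters(open("access.log")), where
access.log is a placeholder path. A high count alone does not prove
that a client is copying content, so compare the addresses against the
suspicious URLs reported by the alert before blocking anything.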
Article Source: http://diagnostics.googlerankings.com/duplicate-content-in-google.html