White Paper: Statistical Research of Blockchain Data with GPU Hardware Acceleration
Abstract: Use of GPU Rasters for Blockchain Research
Bitcoin mining ASIC hardware is presently the leader for coin creation per unit cost, time, and electrical power. Many-core Graphics Processing Units (GPUs) are used as hardware accelerators to mine digital currencies such as Ethereum, and much has been openly published on this by individuals and suppliers. This paper assumes the reader already understands mining basics and the tracking of financial transactions via hash methods. It covers Commercial-Off-the-Shelf (COTS) ICs, specifically the less-published concepts and research on GPU input and output video raster heads, as applied to blockchain data research.
These GPU rasters are phase locked and phase shifted, to a resolution as fine as one pixel clock, to act as an additional layer of hardware acceleration for mining or for research on transaction history data. The same method is also used for blockchain history research on sections of data similar to video compression macro-blocks (macrocells). GPU raster methods are quite different from the “many-cores” GPU method of acceleration, and bring different benefits to the processing of data.
Presently, once costly ASIC or GPU hardware has been acquired, the most productive method for mining new currency is to mine in a pool of cooperating members. The pool of miners works as a team, and some teams share detailed, private technical information on acceleration hardware to enhance mining. However, not much hardware detail is published for studying the wider subjects of security, past fraud and fraud prevention, and weaknesses in blockchain operations.
There is likely no magic bullet for winning digital currency by keeping billions of historical nonces and their corresponding “fingerprint” hashes. However, rather than relying on the popular nonce-to-hash equation, it may be possible to find patterns through statistical analysis of the history of nonces, hashes, and other blockchain header data over an extended period. A cryptocurrency such as Monero could also potentially be tracked or traced through statistical data characteristics, using the macroblock methods of video compression applied across multiple interconnected GPU rasters.
There is an entirely separate application of phase locked GPUs: validation of monetary transactions on the blockchain, especially when those transactions have a large number of confirmations. When a human or a computer program notices that something suspicious has occurred, a significant amount of history data may need to be revisited. In many cases, even if the transaction is found to be criminal, nothing can be done about it after the fact. However, phase locked GPUs can be used to find historical patterns in known fraudulent behavior and to detect similar new transactions as they are made in real time.
Why Commercial-Off-the-Shelf (COTS) Graphic Processor Units instead of FPGAs?
Comparing GPUs with Field Programmable Gate Arrays (FPGAs) comes down to:
- Development time
- Development cost
- Best overall accelerator processing time
- Speed of raster
- Cost of Production Runs
- Ability to Update Fielded Equipment
In the 1990s, GPUs began a contest in the competitive marketplace over the fastest hardware acceleration components, such as blitters (bit block transfer), YUV-to-RGB and RGB-to-YUV converters, motion compensation, polygon fills, and Boolean logic operations on pixels, which other classes of chips barely pursued.
GPUs also made one of their most advertised functions the highest resolution display at the highest possible frame rate. This required large, fast synchronous address counters, very wide and very fast data buses, and pixel data FIFOs. Square millimeter for square millimeter, GPUs have more engineering man-hours in them than almost all other chips, save a few high-priced x86 processors from AMD and Intel and the highest-end ARM processor variants.
These GPU chips are partially laid out by hand, for the best possible grouping of components and the shortest lengths of crucial data and address wires. With its sheer raw speed of Boolean data processing, its continuous periodic memory reads and writes of typical 4 MB HD video frames, and its macro-block processing, the GPU is at the top of its class of digital ICs for overall value.
Conversely, FPGAs are often used as the central LSI chip, or the only LSI chip, in complex custom circuits that include raster generation. However, register-transfer-level (RTL) logic written in VHDL or Verilog, built from gates and D flip-flops in an FPGA, will be roughly three times less efficient than a custom GPU IC at moving raster data.
The most common GPU acceleration methods are the “many-cores” methods of parallel processing. A typical use is in video games, where many small cores, each with its own local cache memory, compute a filled polygon and then copy it to the main viewing raster buffer.
https://www.researchgate.net/publication/303093384_A_Parallel_Scheme_for_Large-scale_Polygon_Rasterization_on_CUDA-enabled_GPUs_Large-scale_Polygon_Rasterization_on_CUDA-enabled_GPUs
“many-core GPUs, which make massive parallel computing power accessible for general public. The other improvement is the development of Big Data technologies, e.g., MapReduce [9] and its open source implementation Apache Hadoop2, which allows using simple computing models to process large-scale datasets on distributed computing systems without deep knowledge in parallel and distributed computing. However, these platforms are primarily designed for relational data and may not be efficient or even suitable for spatial data“
Likewise, typical GPU multi-core processing is intended for the many cores to operate on many small blocks of local cache data, such as hash solutions or polygon fills that are copied to a large raster when done, rather than operating singularly on the whole raster, as the “raster display scan engine” and the blitters do.
https://www.rastergrid.com/blog/gpu-tech/2021/01/understanding-gpu-caches/
“multiple GPU cores, as the number of work items a program is executed on in a typical GPU workload is even larger than what can fit on a single GPU core (e.g. a full-screen pass over a full HD image results in over 2 million thread invocations)”
Subsystems Inside Typical X86 GPU ICs
The core of the GPU begins with the Raster Display Scan Engines: exceptionally fast synchronous counters that create video memory addresses, read pixel data, and fill the display FIFOs, as the root of the periodic frame refresh function. Output and input rasters have the highest priority for memory functions. This dates back to CRT displays, which require perfectly repetitive timing. HDMI and DisplayPort also require exact timing, or the link will logically disconnect, hence the required high memory priority.
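The address sequence that such a scan engine produces can be sketched in software. This is only an illustration of the counting scheme, assuming a linear frame buffer; the function name `raster_addresses` and the `base`, `pitch`, and `bytes_per_pixel` parameters are hypothetical and do not describe any particular GPU.

```python
def raster_addresses(base, width, height, bytes_per_pixel, pitch=None):
    """Yield the byte address of every pixel in scan order.

    Mimics a line counter and a pixel counter driving a memory address bus.
    All parameter names are illustrative assumptions.
    """
    if pitch is None:
        pitch = width * bytes_per_pixel      # bytes from one line start to the next
    for y in range(height):                  # vertical (line) counter
        line_start = base + y * pitch
        for x in range(width):               # horizontal (pixel) counter
            yield line_start + x * bytes_per_pixel


# A tiny 4 x 2 raster of 16-bit pixels starting at address 0x1000.
addrs = list(raster_addresses(base=0x1000, width=4, height=2, bytes_per_pixel=2))
print([hex(a) for a in addrs])
```

In hardware the two loops are free-running counters clocked by the pixel clock, which is why the read pattern is perfectly periodic.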
Significant Manufacturers of X86 GPU ICs & Arm MALI-GPU ICs in Rigid Markets
The big three X86 GPU suppliers today are Nvidia, Intel, and AMD (whose GPU business was ATI in the 1990s).
Intel is one of the world’s most prolific producer-manufacturers of GPUs (built into its X86 processor companion chips). The following links give more general information about GPUs.
https://www.intel-dot-com/content/www/us/en/products/docs/processors/what-is-a-gpu.html
The next link may be an Intel Inc. reprint (it is not obvious whether the author firm is Intel or “hellotech”).
https://www.hellotech-dot-com/blog/whats-a-gpu-what-gpu-do-you-have and https://www.howtogeek-dot-com/702766/what-is-a-gpu-graphics-processing-units-explained/
Wikipedia is more detailed, with information on raster, pixel, and memory. It also describes motion compensation, the blitter, bitmap manipulation, line draw, and area fill functions.
A number of websites publish lists of GPU IC manufacturers, both fabless and fab-owning firms. Some of those lists include firms that sell only IP sections of GPUs, not whole ICs ready for manufacture that function as a GPU, either as a PCIe bus peripheral or as a PCIe bus host.
Below is a list of manufacturers of X86 peripheral video controller GPU chips (as of November 2021).
Note that some of the X86 GPU makers also sell licensed ARM processors with MALI GPUs that are a mix of MALI and original video IP circuits. However, they are still best known for their flagship X86 PCIe peripheral GPUs, which command the highest prices in the gaming and coin mining end-user markets.
- AMD (X86 bus) https://www.amd.com/en
- Silicon Motion Inc (X86). Silicon Motion Inc has two links here, as it is often left off published GPU lists. Below are a marketing page and a PDF for a PCIe X86 bus GPU IC that can be obtained by large or small firms and universities.
https://www.siliconmotion.com/product/Graphics-Display-SoCs.html
https://www.siliconmotion.com/download/3PT/a/SM768_WP_4K_High_Definition_EN_201910.pdf
- Intel (X86 bus). Intel is coming back (so it says, as of December 2021) as an independent maker of X86 peripheral GPU chips and cards. At present Intel makes many millions of GPUs; however, they are built into Intel-only processor companion chips.
See drawing below: Typical X86 and ARM GPU Product in Year 2021
This Blockchain-GPU white paper project was unable to obtain any Intel video graphics card with Intel GPU ICs through December 2021. Some third-party magazines appear to have information about Intel GPU cards in 2021.
https://arstechnica.com/gadgets/2021/08/intels-arc-gpus-will-compete-with-geforce-and-radeon-in-early-2022/
http://www.yjfy.com/collection/collection-0020.htm
- Arm (not X86 bus). ARM is the exception: a GPU supplier that should be named but that does not manufacture IC chips, whether fabless or fab-owning. Its IP for complete, essentially ready-to-fabricate processor-GPU chips is manufactured and sold by numerous private and publicly traded fabless and fab-owning firms under the licensed product name “ARM”.
PCIe host-bus ARM-GPU makers are listed below.
https://www.arm.com/products/silicon-ip-multimedia
Qualcomm, Marvell, NXP, Renesas, Atmel, Broadcom, Samsung, Apple (apparently for internal sales only), Texas Instruments, Microchip, Freescale, STMicroelectronics, and more are just some of the many firms turning ARM IP into production chips as exact IP copies. Many apply a mix of upgrades and cost-reduction modifications to ARM processors that have the “MALI” GPU IP (intellectual property) embedded in the processor chip. Rarely, however, do they make significant changes, such as a MALI-only configuration on a PCIe peripheral bus, or external raster timing or clock control.
In some cases the MALI GPU takes up more silicon area and burns more watts; however, most ARM-MALI chip variants are marketed as processors with embedded video. Prices of the high-end ARM-MALI chips can be as high as those of some X86 processors, and they are often components in the most expensive cell phones.
If we have missed your firm, or have data that is not fully correct, our apologies; please let us know so that we can update this paper. This list mostly refers to IC chips that are in-house designs, not re-branded or licensed versions from the short list of GPU makers above. PowerVR is not listed as a GPU IC chip maker, because physical parts sold under the PowerVR name cannot be found. PowerVR is IP in the Intel GPU, and may also appear as sub-portions of other firms’ GPUs, since GPUs contain an assortment of accelerator circuits (see the Typical GPU details drawing).
Current GPU IC Markets, & New Opportunities in Blockchain Research
In early December 2021, the U.S. Federal Trade Commission (FTC) sued to block Nvidia’s purchase of ARM Inc. That the GPU market could move toward such further rigidity, top-heaviness, and full monopolization says much about how politics is controlling markets. It is unfortunate to have to mention this in a technical white paper. However, these effects are stifling R&D, and this paper covers attempts at some technical work-arounds.
The only moderately open markets for processors and GPUs are licensable cores from ARM Inc and the new open-standard RISC-V (still in its infancy where high-feature-set GPUs are concerned).
This white paper recommends that a new fabless firm market a high-end GPU (possibly a MALI GPU core, a PowerVR core, or the Silicon Motion core), either as a peripheral X86 PCIe bus IC or as an ARM processor that can be set up as a PCIe host or peripheral, but always with an 802.3ap Ethernet port that is ready to function once a tiny Linux is loaded. Chips containing this new GPU core need raster clock and pixel clock input control for all heads, plus raster sync signals. These pins are easily added as optional functions on the GPIO pins, and they create functions for many industries, such as:
- Database searching and processing
- Improved blockchain mining, where more GPU cards can be ganged in more scalable groups as mining block sizes grow
- Higher-end video and more scalable video games and flight simulators, that can gang GPU rasters for more split screen or overlays or large detailed sprites.
- Ability to process macro-blocks of data for moving objects, originating from camera or Lidar data, where processed data can feed from raster to raster, for real-time vehicle driving with improved safety.
Multiple Versions of GPU Pixel Comparison & Macro-Block Circuits Application Methods
Complex modern COTS (commercial off-the-shelf) LSI digital ICs, with their many millions of transistors, multiple buses and ports, memory management of DRAM, register SRAM, and local caches, and address counters, intrinsically create many use cases.
Herein we review how to better synchronize and phase lock multiple GPUs, of both X86 peripheral-bus types and ARM-MALI-GPU host types, into temporarily locked circuits that process blockchain data faster. The aim is not to compute hashes from nonces, but rather to use rasters and Boolean logic to create hardware accelerator engines for blockchain search.
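As a software stand-in for Boolean logic applied across two rasters, the sketch below XORs two frames of 16-bit words and reports where they agree. It assumes both frames are already aligned word for word; `compare_frames` and the sample data are hypothetical names and values chosen for illustration, not part of any GPU interface.

```python
def compare_frames(frame_a, frame_b):
    """XOR two equal-length frames of 16-bit words.

    Returns the indexes at which the words are identical (XOR result of zero).
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must be the same size")
    return [i for i, (a, b) in enumerate(zip(frame_a, frame_b))
            if (a ^ b) & 0xFFFF == 0]


history = [0x1A2B, 0x0000, 0xFFFF, 0x1234]   # stored history words (made-up values)
live = [0x1A2B, 0x0001, 0xFFFF, 0x4321]      # incoming real-time words (made-up values)
matches = compare_frames(history, live)
print(matches)
```

A raster engine would perform the same per-word operation at pixel clock rate, with no per-word instruction overhead.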
Three Use Cases, Multiple Phase Locked or Phase Shifted GPU Data Review
The first use is blockchain fraud investigation, based on studies of the full data of large sets of historical transactions, nonce-and-hash combinations, or transaction hashes, going back as far as the hard disk space and comparison acceleration hardware allow. Alternatively, the same can be done on at least random blocks of the full data of historical transactions. This may be a solution worth testing as a review of transaction history to find historical fraud. The links below cover blockchain fraud issues.
https://www.maltego.com/blog/bitcoin-forensics-with-maltego-and-ciphertrace-blockchain-intelligence/
https://www.raconteur.net/technology/blockchain/blockchain-fraud-prevention/
https://www.makeuseof.com/ways-fraud-possible-on-blockchain/
A second use of this Disk-Space and Data-Compare Acceleration-Hardware (DSDCAH) is to examine nonce and hash data for changes or differences that would indicate a nefarious party has altered historical data. Further, the historical statistics can be compared to new real-time transaction patterns, or even to ongoing coin mining patterns, for fraud prevention.
A third use is research on mining and transaction attributes. Here the disk space and comparison acceleration hardware takes transaction data blocks, nonces, or blocks of nonces and creates not hashes, which are sensitive to the alteration of even one data bit among many, but rather Characteristic-Attribute-Macro-Blocks (CAMB) that are similar in nature to video compression blocks. Much of the same acceleration hardware can then be used to make data-characteristic indexes of these macro-blocks, and of groups of blocks in ever larger sets.
If these CAMB indexes are stored and then compared to real-time patterns, suspicious real-time activity can potentially be blocked for review.
1st Case, Fraud Investigation, by Searching Transaction &/or Mining Histories
This case uses the Disk-Space and Data-Compare Acceleration-Hardware (DSDCAH) to examine headers, data blocks, nonces, and hashes for changes or differences between one distributed data store location and others. All known data about the blockchains that can be gathered is used, including times, dates, IP-address geo-locations, suspected use of proxy IP addresses, and the types of items purchased with transactions.
This data search then builds a database of suspect mining users or financial transactions, by finding clusters of commonly occurring data parameters or, conversely, parameters that are extremely uncommon. The resulting History-Study-Result-Database (HSRD) is much smaller than the original data. Small does not mean the size of a nonce, just a smaller collection of the data sets deemed of most interest. For now, assume the HSRD is a 1920×1080 video frame of 16-bit pixels.
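One way to picture an HSRD held as a video frame is to pack fixed-size records end to end into a buffer of 1920 × 1080 sixteen-bit words. The sketch below is an assumption about layout only: the record width, the zero padding, and the name `pack_records` are illustrative and are not specified by the text.

```python
from array import array

FRAME_W, FRAME_H = 1920, 1080
WORDS_PER_FRAME = FRAME_W * FRAME_H          # 2,073,600 sixteen-bit pixels


def pack_records(records, words_per_record):
    """Pack fixed-size records (lists of 16-bit ints) into one frame.

    Records are laid end to end in scan order; unused pixels stay zero.
    """
    capacity = WORDS_PER_FRAME // words_per_record
    if len(records) > capacity:
        raise ValueError("too many records for one frame")
    frame = array("H", [0]) * WORDS_PER_FRAME   # unsigned 16-bit words
    for n, rec in enumerate(records):
        if len(rec) != words_per_record:
            raise ValueError("record has the wrong size")
        start = n * words_per_record
        frame[start:start + words_per_record] = array("H", rec)
    return frame


# Two made-up four-word records.
frame = pack_records([[1, 2, 3, 4], [5, 6, 7, 8]], words_per_record=4)
print(len(frame), list(frame[:8]))
```

Fixed record sizes keep every field at a predictable pixel position, which is what lets a raster comparison line records up without parsing them.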
2nd Case, Compare History-Result-Database (HSRD) to Real-time Blockchain
This second method compares the History-Study-Result-Database (HSRD) to the real-time blockchain data stream. The GPU’s HSRD can be a mix of data such as blockchain headers, data blocks, nonces, and hashes, examined for changes or differences between one distributed data store location and others.
The data is compared to ongoing real-time transactions, coin mining, or transaction patterns, for fraud prevention.
This also requires both temporal frame locking and controlled frame shifting, so as to search for matches and differences across all data combinations. A typical single small miner on a network with a 40 Mbit/sec download speed, if running full blast for incoming data, presents approximately 4 Mbytes/sec of data that may need to be searched against a possible HD frame of 1920 × 1080 × (16-bit pixels) = (A) 4,147,200 bytes, and against an example Bitcoin block maximum size (year 2017) of (B) 4 Mbytes.
https://bitcoinmagazine.com/guides/what-is-the-bitcoin-block-size-limit
….Bitcoin blocks now have a theoretical maximum size of 4 megabytes and a more realistic maximum size of 2 megabytes. …The exact size depends on the types of transactions included.
The two data items coincidentally line up fairly well, as a 4 Mbyte to 4 Mbyte comparison of data as 16-bit words, requiring on the order of 4M frames of data when the frame is shifted one pixel per comparison. If the GPU raster heads run at the common frame rate of 60 Hz, this translates to about 18.5 hours (roughly 66,700 seconds) for a single GPU head to compare all data words against all data words in one second of a real-time miner’s or transactor’s operations. An 18.5 hour search for 1 second of real-time transactor data flow is an impossibly bad ratio for finding real-time fraud.
The two ways to speed this up are to add more GPU heads, or to reduce the History-Study-Result-Database (HSRD) to a smaller video frame. This is where the hard work done in use case 1 pays off, by making a smaller database of suspect comparison data. GPU rasters can easily be set up for small rasters. However, if the raster is too small, common COTS displays will not lock onto the small, fast frame signals, and the researcher will get a black screen or “no connection”.
So stay with the popular HD frame of 1920×1080. Suppose the HSRD set is reduced to a mere 1920 16-bit word pixels, and this is loaded into the raster shifted by one pixel on each load, so that the 1920 pixels repeat across the 1080 lines (rows) of the display. With the frame compare system now able to compare the smaller 1920-word HSRD set roughly 1080 times faster against the real-time data flow, the full search is reduced to about 62 seconds.
However, with 31 phase-controlled dual-output-head GPU cards, of X86 or ARM-MALI types, whose phase offsets are staggered, the search is in effect processed by 62 video heads simultaneously. The result is about 1 second of search time to process 1 second of miner or debit-card blockchain transaction data.
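The timing estimate can be checked with a few lines of arithmetic. The sketch assumes 4,000,000 one-pixel shifts to cover the data, one shift per frame in the full-frame case, 1080 shifts per frame in the reduced case, and a 60 Hz frame rate on every head. The constants are this sketch's reading of the text, and the printed values follow only from those assumptions.

```python
import math

FRAME_RATE_HZ = 60
FULL_OFFSETS = 4_000_000     # assumed number of one-pixel shifts to cover the data
ROWS_PER_FRAME = 1080        # reduced case: each row carries one shift of the HSRD


def search_seconds(offsets, shifts_per_frame, heads=1):
    """Seconds to present every shift once, at FRAME_RATE_HZ per head."""
    frames = math.ceil(offsets / shifts_per_frame)
    return frames / (FRAME_RATE_HZ * heads)


frame_bytes = 1920 * 1080 * 2                            # 16-bit pixels
full_frame = search_seconds(FULL_OFFSETS, 1)             # one shift per frame
reduced = search_seconds(FULL_OFFSETS, ROWS_PER_FRAME)   # 1080 shifts per frame
heads_needed = math.ceil(reduced)                        # heads for about 1 second
cards_needed = math.ceil(heads_needed / 2)               # dual-head cards

print(frame_bytes)
print(round(full_frame / 3600, 1), "hours")
print(round(reduced, 1), "seconds")
print(heads_needed, "heads,", cards_needed, "dual-head cards")
```

Changing `FULL_OFFSETS` or `ROWS_PER_FRAME` shows how strongly the head count depends on how far the HSRD can be reduced in use case 1.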
Conservative HD frame resolutions at 60 Hz frame rates are used, as these are applicable to common ARM-MALI units. At about 1.5×2 inches per ARM card, and with 802.3ap connections, the cards themselves could be powered by PoE (Power over Ethernet), making for a research or field-use machine that is very easy to maintain.
3rd case, Mining & Transaction Attribute MacroBlock Research
Characteristic-Attribute-Macro-Blocks (CAMB) are similar in nature to video compression blocks, and much of the same acceleration hardware can be tuned to these larger data sets. As it is, a video-compressed DVD movie is mostly macro-blocks, typically 6.2 gigabytes of data at standard resolution.
But unlike conventional movie video compression, it is suggested here that the macro-blocks of a movie all be compared against the macro-blocks of every other movie’s video frames, so as to create a large database of macro-blocks. Such a database for the world of movies would be large, but the payoff would be lower network bandwidth for sending movies, since so many macro-blocks get re-used. The average DVD player can easily have a gigabyte of SDRAM, so progressive storage as the movie plays would not be a problem.
Likewise, billions of macroblocks compressing the attributes of bitcoin mining history can be made. To recap: attributes, not hashes. A hash is an exact, fingerprint-like number that identifies a large block of data. An attribute macro-block, taking as an example a video image chunk of 16 × 16 pixels (256 pixels uncompressed), can be shrunk in a lossy manner to attributes of color, brightness, and the increase or decrease of color or brightness as a cosine function across the width and height of the macroblock. The aim is to squeeze that macro-block attribute identifier down to a lossy 8 bytes or less, rather than the original 256 bytes.
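A minimal sketch of such a lossy descriptor follows, assuming one byte per pixel and using eight low-frequency terms of a 2-D discrete cosine transform as the "attributes". The choice of coefficients, the quantization step, and the names `dct_coeff` and `attribute_bytes` are illustrative assumptions, not a scheme defined in this paper.

```python
import math

N = 16   # macro-block edge length in pixels
# Eight low-frequency (u, v) terms: average level plus slow horizontal/vertical change.
LOW_FREQS = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (1, 2), (2, 1)]


def dct_coeff(block, u, v):
    """Orthonormal 2-D DCT-II coefficient (u, v) of an N x N block."""
    cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
    total = 0.0
    for y in range(N):
        for x in range(N):
            total += (block[y][x]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
    return cu * cv * total


def attribute_bytes(block, step=32):
    """Lossy 8-byte descriptor: eight coefficients, coarsely quantized."""
    out = bytearray()
    for u, v in LOW_FREQS:
        q = round(dct_coeff(block, u, v) / step)
        out.append(max(-128, min(127, q)) & 0xFF)   # clamp to one signed byte
    return bytes(out)


flat = [[100] * N for _ in range(N)]    # uniform block, made-up data
nudged = [row[:] for row in flat]
nudged[5][7] = 101                      # a one-step change in a single pixel
print(attribute_bytes(flat) == attribute_bytes(nudged))
```

Unlike a hash, the descriptor is unchanged by the single-pixel alteration, which is the property that makes attribute blocks usable for similarity search.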
BlockChain Transaction Coin Mining History Data structures
More data can be searched faster and more cheaply than with comparable classic methods, such as SQL and similar databases of variable and highly configurable mixed records.
Instead, the fixed frame sizes of GPU rasters, and multiple phase locked and temporally phase shifted rasters, can be exploited for more varied hardware acceleration.
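A phase-shifted raster search can be pictured as a frame in which every row holds the same pattern at a different pixel offset, so that one frame time tests as many offsets as there are rows. The sketch below is a software analogy under that assumption; `build_shifted_raster`, `best_shift`, and the eight-word pattern are hypothetical.

```python
def build_shifted_raster(pattern, rows):
    """Row r holds the pattern rotated right by r pixels."""
    w = len(pattern)
    return [[pattern[(x - r) % w] for x in range(w)] for r in range(rows)]


def best_shift(raster, live_line):
    """Return (row, match_count) for the row that agrees with live_line most often."""
    scores = [sum(1 for a, b in zip(row, live_line) if a == b) for row in raster]
    row = max(range(len(scores)), key=scores.__getitem__)
    return row, scores[row]


pattern = [7, 1, 4, 9, 2, 8, 3, 6]        # made-up history words
raster = build_shifted_raster(pattern, rows=8)
live = pattern[-3:] + pattern[:-3]        # the same words arriving 3 pixels late
print(best_shift(raster, live))
```

With real hardware the per-row comparison would happen at pixel clock rate, and the row index of the best match gives the phase offset between the stored and live data.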
Pre-Formatting Financial Transactions, Dates, IP-Addr, Nonces, & Hash History Data
GPU frames, also known as video-pixel-raster-frames, used for this data work can be sized to match common, popular video displays. This is useful not just for engineers testing the system for basic function, but also for noticing any visual cues that stand out and suggest new research methods on the data. It also allows the use of COTS GPU ICs and displays that can serve other purposes at a facility at different times.
This reduces costs, and improves the case for distributed, multi-purpose, re-purposable hardware units in small buildings where heat is needed in cold climates and electricity is relatively cheap.
Drawing: “Fixed Record Size Search, Multiple Records Make Video Frame”
- The data can include geographic zones of IP addresses, IDs of miners, and transactor entities
- Transaction details: date-time, location, firms, products transacted for
GPU hardware acceleration would potentially provide a way to look at this much history data.
Blockchain Accelerator GPU-FPGA Research (BAGR)
Mixed circuits of FPGAs, COTS X86-peripheral GPUs, and ARM-host MALI GPUs are, for now, work-arounds for accomplishing GPU blockchain data processing research.
A working prototype that performs some processing tasks, observable with software or hardware, will typically arrive earlier with FPGAs. This is because FPGA suppliers such as Lattice, Intel-Altera, and Xilinx are generally very cooperative with researchers and product developers. A problem with FPGAs is that, when moving large data rasters and running Boolean logic on the data streams, it could require a $1000 FPGA to compete with a $100 GPU. This scientific/commercial hardware Blockchain Accelerator GPU-FPGA Research (BAGR) can still be done, but producing thousands of units for researchers worldwide would be costly.
The large dollar difference stems from the fact that custom-designed GPUs can achieve dramatically higher clock speeds in the core synchronous counters that generate memory addresses. In addition, the output address bus in the GPU has a superb, semi-manual chip layout for reaching the SDRAM address pins with the least delay.
If the same FPGA-only (no GPU) equipment is put into daily use to earn back its own cost through work, expect higher power consumption per Boolean expression clocked through to a result than for the same function on FPGA-COTS-GPU combination boards.
COTS GPUs can be used for the same scientific/commercial hardware Blockchain Accelerator GPU-FPGA Research (BAGR) as the FPGAs. However, it can be very tricky to obtain cooperation from “the big three” X86 GPU IC suppliers: Intel, AMD, and Nvidia. Silicon Motion is the only company supplying X86 GPUs that are reasonably small; however, its production GPUs as of 2021 lack the needed control pins, namely direct pixel clock inputs and vertical sync output as TTL or LVDS, for scientific or commercial GPU research.
Another method of improving GPU performance in hardware BAGR is to use the SLI, NVLink, and Crossfire interconnects (physically at the top of the card). For example, the new Nvidia NVLink, and older COTS GPU cards with SLI (Scalable Link Interface) or AMD Crossfire connectors, which can still be easily found and purchased, re-open the door to GPU research projects for small independent parties. However, this is still not an easy development path, though it is feasible. Generally only the board partners of Nvidia, Intel, and AMD have access to the schematics of the NVLink, SLI, and Crossfire connectors. As of this paper’s publication, Intel has made little known about multi-card hardware configurations.
It can easily be argued that a near monopoly is in play among the big three USA-based corporate X86 GPU suppliers. Change may be on the way with the new GPU firms starting up in China, where competition and free markets are more active.
Software development kits for gaming on multi-card COTS GPUs can in some cases be used for some of this development work. A major issue is getting access to the raster data flows going in and out of the GPU rasters over hard-logic synchronous buses, rather than over asynchronous PCI Express ports.
More can be done at this time with the high-end ARM-MALI COTS GPUs, although again most still lack the control pins needed for Scientific/Commercial BlockChain GPU Hardware Accelerator Research (BHGAR). However, this is expected to change in 2022, with new suppliers from China coming on line. Interestingly, these new suppliers’ parts will probably have transistor sizes 2X to 3X larger than those of the USA-based firms, but the suppliers are probably more open to new partnerships, and thus to more new end-user and research products.
For now, for university, commercial, and government research on GPU-based board-level hardware, some of the limitations caused by lack of cooperation from GPU suppliers can be overcome with a mix of (A) X86 Nvidia cross-connected GPU cards and (B) AMD X86 cross-connected GPU cards. X86 PCI Express peripheral card devices can obviously transfer data to and from the motherboard faster than ARM-host card devices.
Phase Lock/Shifting of GPU’s Temporal Frame Timing ( X86 & ARM-MALI)
However, the intensive data processing happens in the GPUs, either head-to-head within a multi-head GPU, or from a head in GPU-x to a head in GPU-y via direct wired connections, with reduced packetizing overhead and with matching temporal relationships between data frames. In some cases this is directly parallel video frame-pixel data, and it can even match the timing of popular displays, for technician or engineer monitoring and testing.
The matching or purposeful phase shifting of the video data frames (used here as blockchain data frames) allows vast amounts of data to be compared at GPU data throughput rates (the best per financial cost, wattage, or board space), with fewer or no power-hungry, costly large pixel FIFOs. Such large FIFOs would mostly be unable to run as fast as the GPU rasters, and would further slow the research set-up.
Phase locking and phase shifting can be done with the free Mimax Inc methods of pixel clock subtraction.
See the patent PDF download links for US6262695, “Method and apparatus for phase-locking a plurality of display devices and multi-level driver for use therewith”.
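One plausible reading of pixel clock subtraction is that withholding a single pixel clock pulse from one head makes its raster counters fall one pixel behind a head that received every pulse. The simulation below illustrates only that idea; it is not taken from the patent text, and `run_head` and its parameters are hypothetical.

```python
def run_head(total_clocks, width, height, swallowed=()):
    """Return the (line, pixel) position of a raster head after total_clocks.

    Clock indexes listed in `swallowed` are withheld from this head,
    so its position counter does not advance on those clocks.
    """
    swallowed = set(swallowed)
    count = 0
    for clk in range(total_clocks):
        if clk not in swallowed:
            count = (count + 1) % (width * height)   # wrap at end of frame
    return divmod(count, width)


W, H = 1920, 1080
head_a = run_head(5000, W, H)                     # receives every clock
head_b = run_head(5000, W, H, swallowed=[100])    # one clock pulse removed
print(head_a, head_b)
```

Removing k pulses shifts the head by k pixels, so any phase offset between two heads can be reached in single-pixel steps.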
X86 Nvidia & AMD Cross-Connected Multiple GPU Cards on PCIe bus
Multiple X86 GPU cards, whether with the reviewed NVLink or SLI connectors (which appear more synchronous) or with Crossfire connectors (which appear more asynchronous) on COTS cards, can be mounted on shared common motherboards at universities, for non-commercial purposes and to benefit public freedom of knowledge and education.
Running the multiple GPU cards in video frame lock, line lock, or pixel phase lock, or with purposeful phase shift, as much as possible, achieves the coherent data control desired for the different types of data compare searches of the blockchains. “Coherent” in this case can be pictured with a metaphor of 500 taxicabs (similar to the frame or macroblock data groups) that need to move from the center of city A to the center of city B, 100 kilometers away. If the taxis have only single-lane roads (similar to a group of GPU rasters) to use and many intersections (similar to memory reads and writes, Boolean computation, and FIFO bottlenecks) to pass through, and if they can all be equally spaced at a constant 50 meters apart and all travel together at exactly 100 km/hour, then the 500 taxis will arrive sooner.
The coherent group may complete the overall data search analysis about 1.3x to 2.0x faster than an “incoherent” group of independently driven taxis (similar to rasters with random temporal associations) with only approximate starting times, approximate distances apart, and approximately the same speeds of travel. When the temporally coherent group of hardware GPU rasters completes a group raster search, the rasters can all sequence together into the next group search. They may need new data blocks to work on at exactly the same time, or staggered, i.e. phase shifted, to make best use of a single data lane to the hard disk.
NVlink Multi-GPU Card Hardware Review for BlockChain Data Research Application
NVLink functions in a much more complex manner than the earlier GPU-card to GPU-card connector system named Scalable Link Interface (SLI).
Although several ICs and transistors are on NVLink PCBs, they appear to be only LED light controls. However, it is suspected that Nvidia’s small microcontroller does more than blink LEDs for the status of the NVLink function, and may also add another layer of copyright messages that move between the cards and interact with the Nvidia video driver software.
The proprietary nature of NVLink further reduces the ability of universities, national laboratories, smaller firms, and open source programmers to create new methods for using the hardware, or to pursue new scientific study with this hardware system. Some details about NVLink’s design have been published by open-information sources such as:
https://en.wikipedia.org/wiki/NVLink
https://en.wikichip.org/wiki/nvidia/nvlink
https://en.wikichip.org/wiki/nvidia/nvlink#Links
https://www.gpumag.com/nvidia-sli-and-compatible-cards/
The attributed authors detail more of the NVLink packet; however, it is doubtful that, without partnership with the GPU manufacturer, it is possible to modify the restricted operating modes of this NVLink multi-card system.
Overall, NVLink appears to be a hybrid of a switched video-bus fabric (aka mesh network), in which accurate timing control in the nanosecond range allows split-screen rasters to be re-assembled and sent directly to the display, and a logical network that affords changing the memory attributes of data types, word/pixel sizes, and destination memory zones or destination video FIFOs.
Without cooperation from Nvidia Inc with universities and developers, it can be difficult to create new methods of blockchain mining, or blockchain history data research tools, that connect hardware at the NVLink ports of GPU cards. The electronics economy of yesteryear, from the 1950s to the 1980s, thrived and advanced rapidly, often due to democratized access to current technologies and equipment for improvement. At present, encryption and overly complex, copyrighted digital signatures close off a good deal of independent R&D and open research.
SLI Multi-GPU Blockchain Search Accelerator Video-Frame Feedback Loop Concepts
The SLI connector is a potential entry/exit point for a video bus (or many other types of streaming data), and a place to focus value-added hardware acceleration. Sync signals give the researcher easy data access to pixels, scan lines, and partial or full video frames.
SLI is the earlier Nvidia product for cross-connecting GPU cards to gang the final output for more resolution or frame rate. Nvidia publications indicate that SLI was acquired from 3dfx Inc in the buyout of the firm. One of the critical factors in cross-connecting GPU cards is nanosecond-accurate timing of the switch fabric video bus that moves pixel data from card to card.
The SLI pixel data pipeline bus moves one card’s video raster output stream to the next card, allowing for the end result of faster frame rate displays or higher resolution displays.
The SLI connector functions more like a high speed switch fabric of LVDS pixel data. Either a slave GPU card computes the pixels of every other frame, or the final frame is split into an upper and a lower set of lines that are assigned to two different, but typically identical, GPU cards. “Slave” in this case means that the slave GPU card’s pixel clock and phase-shifted vertical sync are copied from the master GPU card.
Open-source sites discuss that the SLI system can experience some “visual artifacts” if timing is not perfect, such as when the slave or master card does not finish rendering its logical portion of the next frame in time sync with the other card. This is not to be confused with the raster pixel data, or line pixel data, from the slave not being in time sync with the master card to nanosecond accuracy.
https://en.linkfang.org/wiki/Scalable_Link_Interface
https://en-academic.com/dic.nsf/enwiki/545196
https://www.gpumag.com/nvidia-sli-and-compatible-cards/
https://linustechtips.com/topic/365570-sli-scan-line-interleave-scalable-link-interface-what-happened/
However, there appear to be SLI GPU card modes, usable in research settings, that can enable new uses for data searching on changing pixel data.
A possible SLI hardware acceleration application, data-byte difference detection, is easier to visualize with moving/changing pixel images than with blockchain data. See the next drawing: X86 Dual Head GPU with SLI bridge connector used for Video Feedback Loop, with XOR comparison on raster outputs.
See the patent PDF download links and white paper for more details on Pclk subtraction and gated clock methods for phase locking.
US8139072 Network hardware graphics adapter compression
US8441493 Network hardware graphics adapter compression
https://mimax.com/optimizing-embedded-hardware-using-interleave-memory-phase-locking/
https://www.nxp.com/docs/en/nxp/data-sheets/IMXRT1060CEC.pdf
The XOR boolean difference-data method, similar to the first-stage pixel difference test of video compression, can be used for blockchain data search at GPU data flow rates, for the lowest cost, lowest wattage, and highest speed search over billions of bytes.
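In software terms, the XOR difference test can be sketched as below. This is a minimal illustration of the principle only, assuming two equal-length byte buffers that stand in for two raster outputs; the function names are made up for the example, and the hardware version would perform the same comparison on the pixel streams at pixel-clock rate.

```python
def xor_difference(frame_a: bytes, frame_b: bytes) -> bytes:
    """Return the byte-wise XOR of two equal-length buffers.

    A zero byte in the result means the two inputs agree at that position;
    any non-zero byte marks a difference.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("buffers must be the same length")
    return bytes(a ^ b for a, b in zip(frame_a, frame_b))


def changed_offsets(frame_a: bytes, frame_b: bytes) -> list[int]:
    """List the byte offsets where the two buffers differ."""
    diff = xor_difference(frame_a, frame_b)
    return [offset for offset, value in enumerate(diff) if value != 0]


if __name__ == "__main__":
    previous = bytes([0x00, 0x01, 0x02, 0x03])
    current = bytes([0x00, 0x03, 0x02, 0x03])
    print(xor_difference(previous, current).hex())
    print(changed_offsets(previous, current))
```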
Control over the timing of the video frame feedback loop can be accomplished in the manner described in a video compression patent from MiMax Inc. LVDS signals are typically the preferred method to move 100MHz pixel streams over distances of 15 cm.
https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=LVDS
https://www.ti.com/lit/an/slla325c/slla325c.pdf
“How to Bridge HDMI/DVI to LVDS/OLDI”
AMD Crossfire adds another mesh of PCI-express lanes.
SLI is a hard-wired synchronous pixel bus, where data appears more like the uncompressed LVDS video stream from a laptop to its flip-up panel display, or like a TMDS-DVI video cable data stream that matches display timing. Crossfire, by contrast, is a logical asynchronous bus carried over PCI-express. As such, this paper will not get into Crossfire for now, as it is essentially the same as adding more differential-pair PCI-express lanes to the main motherboard connector of the card. See the links below for more serial bus design fundamentals.
https://www.embedded.com/using-pci-express-as-a-fabric-for-interconnect-clustering/
“Using PCI Express as a fabric for interconnect clustering”
Phase Lock/Shifted Multiple GPU’s Search Acceleration on Blockchain Data
Blockchain was a marvelous development of high-reliability distributed processing for financial transactions and coin mining, and it is very difficult, but not impossible, to defraud. Only intensive data processing with the best methods can review so much data history to prevent fraud and systematically study how well Blockchain works.
Typically, fraud is not committed via a theoretical mathematical method of cracking a blockchain, but rather via the more common methods of stolen IDs and of infecting the computers of innocent users of a system, which are then used for theft. The most sinister thefts are those that are never discovered, or are discovered far too late to correct.
What GPUs oddly still lack is pre-made or published methods for phase locking GPU rasters into groups that work together, with the best timing of data flows, to search vast amounts of disk-stored or real-time blockchain data blocks for matches and differences, or to create attribute index blocks (similar to video compression macro-blocks). This paper reviews some of these methods as applied to the review of Blockchain transaction history and coin mining history data.
Other white papers, showing further examples of microprocessors and video raster engines that are synchronized and phase locked for improved data processing, are linked below.
https://mimax.com/optimizing-embedded-hardware-using-interleave-memory-phase-locking/
https://mimax.com/automotive-display-synchronization-with-phase-locking/
Multi-Arm-MALI-GPU Blockchain Data Search Hardware Accelerator Concepts
Multiple ARM-MALI GPUs are more easily available ICs, and by default they boot up as a PCI-express host. FPGA circuits contain the (free) clock subtractor phase locker. The ability of ARM ICs to use a PCI-express bus in systems of host-to-host communication has been around for some time.
IEEE 802.3ap Ethernet Backplane is one of the most common methods to achieve this interconnectivity quickly in the research lab, or even in a portable fielded end-user product.
Multiple ARM-MALI GPUs typically won’t boot by default as an I/O pin PCI-express peripheral. The 10Gbit Ethernet solution can be used as an alternative. Also, 1Mbit Ethernet will typically suffice, as the most active MALI ports are LVDS video-in and LVDS video-out. In many designs the classic TTL 18-bit 666 RGB bus with syncs makes it easy to connect video raster port to video raster port. However, if more than a 10 cm connection distance is needed, the TTL wires will top out at about 120MHz clock rates.
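As a rough worked figure, the raw payload rate of such a parallel pixel bus is simply the bits moved per clock times the clock rate. The sketch below uses the 18-bit width and the approximate 120MHz ceiling quoted above; the function name is made up for the example, and sync and blanking overhead are ignored, so this is an upper bound rather than a measured throughput.

```python
def parallel_bus_bytes_per_second(bits_per_clock: int, clock_hz: float) -> float:
    """Raw payload rate of a parallel bus that moves bits_per_clock bits each clock."""
    if bits_per_clock <= 0 or clock_hz <= 0:
        raise ValueError("bits_per_clock and clock_hz must be positive")
    return bits_per_clock * clock_hz / 8


if __name__ == "__main__":
    # 18-bit 666 RGB bus at the roughly 120 MHz ceiling mentioned in the text.
    rate = parallel_bus_bytes_per_second(18, 120e6)
    print(f"{rate / 1e6:.0f} MB/s raw, before blanking and sync overhead")
```

Under these assumptions the bus carries about 270 MB/s, or 2.16 Gbit/s.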
LVDS consumes less power, even though wire counts per bit are doubled, due to the small swing in voltage between a logical 1 and 0. However, it is still best to have dedicated LVDS channels for the sync signals used in the phase locking process.
The next drawing shows 4 ARM-MALI units in a MiMax arrangement, with rasters either all in exact phase or phase shifted, processing for search finds. Not many ARM, PIC, 8051, 6502, or RISC-V processors with a GPU are offered with PCLK-in.
PCLK-in and video syncs-out were common in the GPUs of previous years. This GPU use case is a market mostly unserved. It is probable that the Detroit-based US automakers that have announced their own fabless IC designs will adopt more of these pinout features. It is possible to phase lock and phase shift piston engines in the same manner as video rasters. These GPU wires/nets are already in the chips; it is mostly just a matter of assigning GPIO (general purpose I/O) pins for this optional use.
Pclk and sync connections to the FPGA pins complete the phase locking process. Note that not all microprocessors with a GPU have Pclk (pixel clock) inputs, or even V-sync outputs. Some ARM-MALI ICs at least have the “gated clock” command, which can respond to GPIO pins to gain some minimal (rough) video raster semi-phase locking.
Below are links and reference text for 10Gbit backplane connections.
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.638.2752&rep=rep1&type=pdf
“InfiniBand: The “De Facto” Future Standard for System and Local Area Networks or Just a Scalable Replacement for PCI Buses?”
“InfiniBand was designed to solve a set of server I/O problems, with extended support for low-latency inter-processor communication also included….
…It is likely that InfiniBand will be deployed primarily within the processing subsystem initially;
but, depending on market forces, it could possibly be used within the storage subsystem as well.”
https://web.cse.ohio-state.edu/~panda.2/788/papers/1h_pro10gbe_lr_sa_wp.pdf
(Intel Inc) 10 Gigabit Ethernet Technology Overview
“10 Gigabit Ethernet is the natural evolution of the well-established IEEE 802.3* standard in speed and distance.”
https://en.wikipedia.org/wiki/Host_(network)
https://www.anandtech.com/show/9851/hostindependent-pcie-compute-where-were-going-we-dont-need-nodes
https://pcisig.com/sites/default/files/files/PCI_Express_Basics_Background.pdf
https://ieeexplore.ieee.org/document/4039913
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.445.7969&rep=rep1&type=pdf
https://www.molex.com/mx_upload/family/gbx_itrac_backplane_connector_system/10gbase-kr.html
https://www.embedded.com/using-pci-express-as-a-fabric-for-interconnect-clustering/
See patent PDF download links. US6262695 Method and apparatus for phase-locking a plurality of display devices and multi-level driver for use therewith
This also translates to the best ability to compare data quickly for the amount of heat, wattage, and circuit board space.
Last, and possibly most important: these IC “chips” are manufactured in very high quantities, often at multiple silicon fabs, and they are able to compare vast amounts of data for matches and differences and to create macro-blocks for video compression. This makes them the most efficient in financial cost per data byte processed.
Multi-ARM-MALI-GPU’s Can Achieve Galvanic Ground Isolation Card to Card
Compared to the PCI-express bus, 802.3ap backplanes are known for ruggedness and for repairability by replacement of daughterboard units, and they are not as sensitive to static electricity as the PCI-e bus is. Additionally, 802.3ap backplane PHY data connections can maintain galvanic ground isolation from card to card, furthering the ruggedness. If such units also use galvanically isolated power supplies on the cards, it is possible to achieve complete galvanic isolation from ARM-MALI GPU card to ARM-MALI GPU card. The last step of galvanic isolation is the pixel frame outputs from the multiple ARM-MALI GPU card units to the FPGA. By using TMDS rather than LVDS, it is possible to use Ethernet PHY transformers, as TMDS is a balanced system of binary ones and zeros; there are also optical connectors.
GPU Blockchain MacroBlock (MB) Data Search Hardware Accelerator Concepts
First, select some or all of the Blockchain header to be part of the chosen fixed-size video frame. Later this can be expanded to the ever-changing blockchains. Frame sizes are set with a number of GPU programming methods in video drivers; in MS-Windows, a VxD can get access to the GPU control registers. The GPU’s streaming raster ports (also set up with the GPU registers) are linked with FPGAs added to the hardware as acceleration circuits, in place of CPU-based software.
For example, at the node level of coin-mining, the Bitcoin node is exposed to many peer-to-peer IP addresses and can be a high value target for fraud. For the purposes of blockchain metadata collection and analysis a node should be of higher than average quality computer hardware.
Bitcoin uses an 80-byte block header, which contains a 4-byte nonce. If the IP address that the miner is talking to is IPv4 (4 bytes), or a whole group of miners that may be 1800 bytes, then adding the date and time makes an uncompressed characteristic-attribute-macro-block (CAMB) which, with a custom index of 151 bytes, creates the 1024-byte example shown as:
Video compression methods of MacroBlock (MB) creation can be used to track vast numbers of transaction headers, IP addresses, nonces, etc. Rather than hashing the data, semi-subjective CAMB compression could be set up, as in this proposed GPU solution of macroblock creation. This can be lab-tested in research environments with standard displays if the overall data frames match standard resolutions. Below is an example of an XGA 1024 x 768, 16-bits-per-pixel GPU setup that can do this.
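As a rough sizing check for an XGA frame used this way, the arithmetic can be worked through in a few lines. The sketch assumes the 1024-byte CAMB size from the example above and assumes that blocks are packed back to back with no padding; the function names are made up for the illustration.

```python
def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Number of data bytes carried by one video frame."""
    return width * height * bits_per_pixel // 8


def cambs_per_frame(width: int, height: int, bits_per_pixel: int,
                    camb_bytes: int = 1024) -> int:
    """How many whole CAMBs fit in one frame, assuming back-to-back packing."""
    return frame_bytes(width, height, bits_per_pixel) // camb_bytes


if __name__ == "__main__":
    total = frame_bytes(1024, 768, 16)
    line = frame_bytes(1024, 1, 16)
    print(total, "bytes per XGA frame")
    print(line, "bytes per scan line")
    print(cambs_per_frame(1024, 768, 16), "CAMBs per frame")
```

Under these assumptions one XGA frame holds 1,572,864 bytes, each scan line holds exactly two 1024-byte CAMBs, and a full frame holds 1536 of them.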
Whether from stored blockchain history data or from real-time data, MBs are processed in search of attribute patterns in the data characteristics of these subsections of blockchain data. New macro-blocks of blockchain CAMB can then be stored for further analysis. This can be applied not just to Bitcoin, Ethereum, or common debit/charge card transactions, but also to the newer cryptocurrency Monero (XMR), whose design builds in a much greater degree of transaction anonymity, and which can potentially be statistically analyzed.
An old axiom still holds true: where encrypted data is mixed with unencrypted data, even if only IP addresses, a potential pathway is created for statistical analysis to lead to an improved understanding of the encrypted data. This can be performed by creating a very large set of Characteristic-Attribute-Macro-Blocks (CAMBs) from blockchain metadata; that set can then be compared to a real-time CAMB data stream, searching for the rare CAMB match or near-match attribute.
As the bit depth of the characteristics can be adjusted in real time, it is possible to make a search more or less sensitive.
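One way to picture adjustable bit depth is to mask off the low-order bits of each characteristic before comparing. The sketch below is an illustration under stated assumptions, not the paper's hardware method: it treats every byte of a block as one characteristic, keeps only the `keep_bits` most significant bits, and uses function names invented for the example.

```python
def quantize(block: bytes, keep_bits: int) -> bytes:
    """Keep only the keep_bits most significant bits of every byte."""
    if not 0 <= keep_bits <= 8:
        raise ValueError("keep_bits must be between 0 and 8")
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(value & mask for value in block)


def matches_at_depth(block_a: bytes, block_b: bytes, keep_bits: int) -> bool:
    """True when the two blocks agree after reducing them to keep_bits per byte."""
    if len(block_a) != len(block_b):
        return False
    return quantize(block_a, keep_bits) == quantize(block_b, keep_bits)


if __name__ == "__main__":
    stored = bytes([0x12, 0x34])
    live = bytes([0x13, 0x37])
    for depth in (8, 4):
        print(depth, matches_at_depth(stored, live, depth))
```

At full depth (8 bits) the two sample blocks differ; at a depth of 4 bits they compare as equal, which is the "less sensitive" end of the search.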
Additionally, as phase lock shifting of multiple rasters is done one data macro-block raster at a time, the equivalent of scrolling or panning, all blocks can be compared to other blocks at hardware-level speeds. Adding data-characteristic indexes to the completed macro-blocks allows for later management of the long-term storage and re-use of Characteristic-Attribute-Macro-Blocks (CAMBs), for larger and more effective searching of fraudulent or suspect new transactions.
By contrast, disk-space and comparison acceleration hardware that uses hashes can only perform fraud searches for exact, bit-for-bit matches. The process described here looks for similar data events that are not exact matches.
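The contrast between hash matching and similarity matching can be shown with a short sketch. It assumes SHA-256 as a stand-in for "a hash" and bit-level Hamming distance as a stand-in for "similar"; both choices, and the function names, are assumptions made for the illustration.

```python
import hashlib


def same_hash(block_a: bytes, block_b: bytes) -> bool:
    """Exact-match test: digests agree only when the blocks are identical."""
    return hashlib.sha256(block_a).digest() == hashlib.sha256(block_b).digest()


def hamming_distance(block_a: bytes, block_b: bytes) -> int:
    """Number of differing bits between two equal-length blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must be the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(block_a, block_b))


if __name__ == "__main__":
    stored = bytes(1024)               # a 1024-byte block of zeros
    live = bytes([1]) + bytes(1023)    # the same block with a single bit set
    print(same_hash(stored, live))
    print(hamming_distance(stored, live))
```

The hash comparison only reports that the blocks differ, while the bit-difference count shows that they differ in a single bit out of 8192, which is the kind of near match a hash-only search cannot see.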
Characteristic-Attribute-Macro-Blocks (CAMBs) are similar in nature to video compression blocks, so much of the same acceleration hardware can be used to make data-characteristic indexes of these macro-blocks, and of groups of blocks in sets of ever larger groups.
Recap: If these groups of CAMBs and their associated indexes are stored and then compared to real-time patterns, suspicious real-time activity can potentially be blocked for review at real-time speeds, due to the use of GPU raster search methods.
Some Applicable Patents and Free IP:
There are numerous patents in the GPU, Bitcoin mining, Ethereum mining, and Blockchain transaction industries. Firms of patent attorneys are also springing up for the sole purpose of assisting with the licensing of the many applicable patents across a mix of related subjects: Blockchain, video rasters, video compression, and the new addition of GPUs used for automated vehicle driving.
This paper lists just a few that the Mimax team can help with. The Mimax IP for phase locking periodic digital engines down to the nanosecond is free. These engines can be used in multiple areas, from video rasters (as in GPUs), to multiple computer processors (CPUs), to piston engines. Please see these additional white papers.
Most of the methods described in this paper require only the phase locking technology from Mimax, and that is free.
Patents list:
https://mimax.com/ (scroll to bottom of page for download links)
Additional White Papers for:
https://mimax.com/automotive-display-synchronization-with-phase-locking/
https://mimax.com/optimizing-embedded-hardware-using-interleave-memory-phase-locking/
Conclusion
GPU and graphics technology is mostly proprietary, and is becoming more proprietary.
A core problem, not much discussed in the open source area of ideas on the wide subject of Blockchain hardware acceleration, is information held private by proprietary entities. Specifically: what is the ratio of the cost of circuits, disks, electricity, and latency to the solution of a digital coin win, or to successful (value-added) transaction data history research, for the various hardware acceleration methods? Most small miners using GPU- and ASIC-based systems barely make any kind of profit.
The largest profit makers appear to be those who sell the hardware acceleration tools. It may be time for more independence and openness in blockchain hardware acceleration technology, in Verilog and VHDL RTL code, and in the application of COTS IC interconnect methods.
More white papers are needed, from many levels of society, on open ideas that are shared openly rather than patented. This also allows more small, new fabless IC entities to come into business. The education system is also helped, especially university engineering departments, which need senior and graduate student projects with potential product applications. In China, Russia, India, and a number of other countries in Asia, product reverse engineering at the universities is the norm. It should be considered for Western universities also.
More enthusiastic smaller blockchain hardware owners may also begin independent research of data patterns occurring in the peer-to-peer blockchain communities. Overall more lab research should be pursued to create more variants of GPU/FPGA projects that can potentially add new methods to fraud prevention in all blockchain processes.
Both the use of multiple phase locked GPUs to speed up searching on vast data storage libraries, and the proposed macroblock characteristic attribute method, may be a way to open up a deeper view into all blockchain transaction processes, in real time and for history searches. This may be especially helpful in research on the new, anonymous “Monero” digital currency transactions.
Index of Internet links
https://www.leidenlawblog.nl/articles/trademarks-in-works-of-art-are-allowed
https://www.rastergrid.com/blog/gpu-tech/2021/01/understanding-gpu-caches/
https://www.intel-dot-com/content/www/us/en/products/docs/processors/what-is-a-gpu.html
https://www.hellotech-dot-com/blog/whats-a-gpu-what-gpu-do-you-have
https://www.howtogeek-dot-com/702766/what-is-a-gpu-graphics-processing-units-explained/
https://en.wikipedia.org/wiki/Graphics_processing_unit
https://www.siliconmotion.com/product/Graphics-Display-SoCs.html
https://www.siliconmotion.com/download/3PT/a/SM768_WP_4K_High_Definition_EN_201910.pdf
http://www.yjfy.com/collection/collection-0020.htm
https://www.arm.com/products/silicon-ip-multimedia
https://www.maltego.com/blog/bitcoin-forensics-with-maltego-and-ciphertrace-blockchain-intelligence/
https://www.raconteur.net/technology/blockchain/blockchain-fraud-prevention/
https://www.makeuseof.com/ways-fraud-possible-on-blockchain/
https://bitcoinmagazine.com/guides/what-is-the-bitcoin-block-size-limit
https://en.wikipedia.org/wiki/NVLink
https://en.wikichip.org/wiki/nvidia/nvlink
https://en.wikichip.org/wiki/nvidia/nvlink#Links
https://www.gpumag.com/nvidia-sli-and-compatible-cards/
https://en.linkfang.org/wiki/Scalable_Link_Interface
https://en-academic.com/dic.nsf/enwiki/545196
https://mimax.com/ See patent PDF download links (scroll to bottom of MiMax home page for direct links)
US8139072 Network hardware graphics adapter compression
US8441493 Network hardware graphics adapter compression
https://mimax.com/optimizing-embedded-hardware-using-interleave-memory-phase-locking/
https://www.nxp.com/docs/en/nxp/data-sheets/IMXRT1060CEC.pdf
https://ieeexplore.ieee.org/document/6046234
https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=LVDS
https://www.ti.com/lit/an/slla325c/slla325c.pdf
https://www.embedded.com/using-pci-express-as-a-fabric-for-interconnect-clustering/
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.638.2752&rep=rep1&type=pdf
https://web.cse.ohio-state.edu/~panda.2/788/papers/1h_pro10gbe_lr_sa_wp.pdf
https://en.wikipedia.org/wiki/Host_(network)
https://www.anandtech.com/show/9851/hostindependent-pcie-compute-where-were-going-we-dont-need-nodes
https://pcisig.com/sites/default/files/files/PCI_Express_Basics_Background.pdf
https://ieeexplore.ieee.org/document/4039913
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.445.7969&rep=rep1&type=pdf
https://www.molex.com/mx_upload/family/gbx_itrac_backplane_connector_system/10gbase-kr.html
MiMax team patents:
- US6262695 Method and apparatus for phase-locking a plurality of display devices and multi-level driver for use therewith
- US8139072 Network hardware graphics adapter compression
- US8441493 Network hardware graphics adapter compression
- US10499072 Macros cell display compression multi-head raster GPU
- US Patent Applied For No. 63/125,310 Object Macrocells in Frame Locked Rasters for Real-Time Driving, Positive Train Control and General Purpose Anti-Collision Provisional Patent Filed
Patent direct links https://mimax.com/
https://patents.google.com/patent/US6262695B1/en?oq=US6262695
https://patents.google.com/patent/US8139072B2/en?oq=8139072
https://patents.google.com/patent/US8441493B2/en?oq=US8441493
https://patents.google.com/patent/US10499072B2/en?oq=US10499072
https://mimax.com/download/us-prov-app-object-macrocell-frame-lock-rasters-real-time-driving-postive-train-control-anti-collision-2020-pdf/
White Papers at MiMax Inc https://mimax.com/
https://mimax.com/automotive-display-synchronization-with-phase-locking/
https://mimax.com/optimizing-embedded-hardware-using-interleave-memory-phase-locking/